Jobs#

Jobs are a way to schedule one-off or recurring pipelines runs in Orchest.

A job can run multiple iterations of the same Pipeline over time or small variations by using different parameters as inputs. For example, you could create a job which uses the same ETL pipeline but extracts data from a different data source for each Pipeline run.

List of jobs for a given Orchest project — The list of jobs for a given Orchest project.#

Jobs take a snapshot of your project directory when they are created. Each of the job’s pipeline runs copy the project directory snapshot and execute the files without changing the original snapshot. This means jobs run consistently throughout their entire lifetime.

Different Pipeline runs that are part of the same job are completely isolated from a scheduling perspective and do not affect each others’ state.

Note

💡 Write data and large artifacts to the /data directory. This helps to save disk space by not including them in the project directory snapshot. Alternatively, you can add the artifacts to .gitignore as the ignored patterns are not copied to the snapshot.

Tip

👉 Check out our short video tutorials:

Running a job in Orchest#

To create and run a job in Orchest, follow these instructions:

Click on Jobs in navigation bar.
Click the + new job button to create a new job.
Configure the job.
Press run job.

To inspect the result of your job; click on the job you just created, choose a specific Pipeline run (the one you want to inspect) and click on VIEW. The Pipeline is now opened in read-only mode giving you the opportunity to check the logs or to open the HTML version of you notebooks.

Note

💡 Upon job creation, Orchest (under the hood) takes a snapshot of the required environments. This way you can freely iterate on and update your existing environments without worrying about breaking your existing jobs.

Parametrizing pipelines and steps#

Jobs run a specific Pipeline for a given set of parameters. If you define multiple values for the same parameter, then the job will run the Pipeline once for every combination of parameter values. You can think of job parameters as a grid search, i.e. looping over all combinations of values for different parameters.

Note

💡 Unlike environment variables, you can define Pipeline and step level parameters with the same name without one (automatically) overwriting the other, you can access both values.

Specify job parameters with a file#

You can easily run a Pipeline for multiple parameter configurations by creating a job parameters file.

If you place the file in the same folder as your Pipeline file the job parameters file will automatically be detected when creating a job.

For a Pipeline called main.orchest the job parameters file should be named main.parameters.json, and be put in the same folder as the pipeline file (both in the project directory).

You can also select a file manually when creating a job.

The JSON file should be formatted as below. Note that wrapping the values in a list is required, even if you’re assigning just one parameter value to a key. It is allowed to omit keys you don’t want to specify.

{
  "pipeline_parameters": {
    "some_key": ["a", "list", "of", "values"]
  },
  "62a62810-336c-44c4-af6a-35228e8f2028": {
    "some_key": [1, 2, 3],
    "another_key": [1]
  }
}

You can find the step UUIDs in the pipeline file (e.g. main.orchest), pipelines are regular JSON files.

Running a parametrized job#

The procedure to run a parametrized job is very similar to running a job without any parameters. Once you have followed any of the procedures above to parametrize your Pipeline:

Make sure you have defined some parameters or you will only be able to schedule the Pipeline as is.
Click on Jobs in the left menu pane.
Click the + new job button to configure your job.
Choose a Job name and the Pipeline you want to run the job for.
Your default set of parameters are pre-loaded. By clicking on the values a JSON editor opens, allowing you to add additional values you would like the Pipeline to run for.
If you would like to schedule the job to run at a specific time have a look at Scheduling. In case you don’t want your job to run every combination of your parameter values, you can deselect them through the Pipeline runs option.
Press Run job.