How Orchest works¶
Orchest is a fully containerized application and its runtime can be managed through the orchest shell script. In the script you can see that the Docker socket /var/run/docker.sock is mounted, which Orchest requires in order to dynamically spawn Docker containers when running pipelines.
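For illustration, mounting the Docker socket is what allows a process running inside a container to spawn sibling containers on the host. A minimal sketch using the Docker SDK for Python (this is not Orchest's actual code, and the hello-world image is just a placeholder):

import docker

# from_env() connects to the Docker daemon, here through /var/run/docker.sock.
client = docker.from_env()

# Spawn a sibling container on the host, similar to what Orchest does for pipeline steps.
container = client.containers.run("hello-world", detach=True)
print(container.id)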
Global configurations are stored at ~/.config/orchest/config.json; for the possible configuration values, see configuration.
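Since the configuration is plain JSON, you can easily inspect it yourself (a minimal sketch; the available keys depend on your Orchest version):

import json
from pathlib import Path

# Read the global Orchest configuration file.
config_path = Path.home() / ".config" / "orchest" / "config.json"
with config_path.open() as f:
    config = json.load(f)

print(config)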
Orchest is powered by your filesystem; there is no hidden magic. Upon launching, Orchest mounts the content of the orchest/userdir/ directory, where orchest/ is the install directory from GitHub, in the Docker containers. This gives you access to your scripts from within Orchest, but also allows you to structure and edit the files with any other editor, such as VS Code!
Caution
The userdir/ directory not only contains your files and scripts, it also contains the state (inside the userdir/.orchest/ directory) that Orchest needs to run. Touching this state can result in, for example, losing job entries, causing them to no longer show up in the UI.
The mental model in Orchest is centered around Projects. Within each project you can create multiple pipelines through the Orchest UI, and every pipeline consists of pipeline steps that point to your scripts. Let’s take a look at the following directory structure of a project:
myproject
├── .orchest
│   ├── pipelines/
│   └── environments/
├── pipeline.orchest
├── prep.ipynb
└── training.py
Note
Again, Orchest creates a .orchest/ directory to store state. The .orchest/pipelines/ directory stores the data passed between steps (per pipeline, in data/) when disk-based data passing is used instead of the default memory data passing; see data passing. Per pipeline (inside .orchest/pipelines/) there is also a logs/ directory containing the STDOUT of the scripts, which can be inspected through the Orchest UI.
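Data passing itself happens through the Orchest SDK inside your scripts. A hedged sketch of what the two steps from the example project could look like (see the data passing section for the exact API and semantics):

# prep.ipynb: output data so that the next step can pick it up.
import orchest

features = [1, 2, 3]
orchest.output(features, name="features")

# training.py: retrieve the data outputted by the previous step.
import orchest

inputs = orchest.get_inputs()
features = inputs["features"]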
Tip
You should not put large files inside your project; instead, use data sources or write to the special /data directory (which is the mounted userdir/data/ directory that is shared between projects). Jobs create snapshots of the project directory (for reproducibility reasons) and would therefore copy all the data.
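For example, a step that produces a large artifact could write it to /data rather than into the project itself (a sketch; the subdirectory and file names are hypothetical):

from pathlib import Path

# /data is the mounted userdir/data/ directory, shared between projects.
out_dir = Path("/data/myproject")
out_dir.mkdir(parents=True, exist_ok=True)

# Write the large artifact outside the project so job snapshots stay small.
(out_dir / "dataset.csv").write_text("x,y\n1,2\n")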
The pipeline definition file pipeline.orchest above defines the structure of the pipeline. For example:

[Figure: the pipeline editor showing the prep.ipynb step connected to the training.py step]
As you can see, the pipeline steps point to the corresponding files: prep.ipynb and training.py. These files are run inside their own isolated environments (as defined in .orchest/environments/) using containerization. In order to install additional packages or to easily change the Docker image, see environments.
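Since the pipeline definition is a plain file inside the project, you can also inspect it directly. A hedged sketch, assuming pipeline.orchest is JSON and that each step records a title and a file_path (the exact schema is defined by Orchest and may change between versions):

import json

# Load the pipeline definition from the project directory.
with open("pipeline.orchest") as f:
    definition = json.load(f)

# Print which file each pipeline step points to.
for step in definition.get("steps", {}).values():
    print(step.get("title"), "->", step.get("file_path"))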
Note
We currently support Python, R and Julia.
Concepts¶
At Orchest we believe that Jupyter Notebooks owe their popularity to their interactive nature. It is great to get immediate feedback and actively inspect your results without having to run the entire script.
To facilitate a similar workflow within Orchest, both JupyterLab and interactive pipeline runs directly change your notebook files. Let’s explain this with an example. Assume your pipeline is just a single .ipynb file (run inside its own environment) with the following code:
print("Hello World!")
If you now, without having executed this cell in JupyterLab, go to the pipeline editor, select the step and press Run selected steps, then you will see in JupyterLab that the cell has output "Hello World!", even though you never ran it in JupyterLab.
Note
Even though both interactive pipeline runs and JupyterLab change your files, they do not share the same kernel! They do of course share the same environment.
Tip
Make sure to save your notebooks before running an interactive pipeline run, otherwise JupyterLab will prompt you on the next save with a “File Changed” pop-up asking whether you want to “Overwrite” or “Revert”. “Overwrite” would let you keep your changes, however, it would then overwrite the changes made by the interactive run.