How Orchest works

Orchest is powered by your filesystem. On launch, Orchest mounts a directory called the userdir, whose default location is orchest/orchest/userdir/. Inside this directory it stores the following for each pipeline:

  • Your scripts that make up the pipeline, for example .ipynb files.
  • Step outputs written by the Orchest data passing SDK, stored in the .data directory so that data can be passed between pipeline steps.
  • Logs, stored in .logs, which surface the STDOUT output of your scripts in the pipeline view.
  • An autogenerated pipeline.json file that defines the properties of the pipeline and its steps, including execution order, names, images, etc. Orchest needs this pipeline definition file to work (see the sketch after this list).
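
As an illustration, the pipeline definition can be inspected like any other JSON file. This is a minimal sketch, not an Orchest API; the "name" and "steps" keys are assumptions about the schema, which is managed by Orchest and may differ between versions:

import json

# Load the autogenerated pipeline definition. The "name" and "steps" keys
# used here are assumptions for illustration; the exact schema is managed
# by Orchest.
with open("pipeline.json") as f:
    pipeline = json.load(f)

print("Pipeline:", pipeline.get("name"))
print("Steps:", len(pipeline.get("steps", {})))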

Orchest runs as a collection of Docker containers and stores only a single global configuration file. This config is located at ~/.config/orchest/config.json on Unix-based systems and at %UserProfile%\.orchest\config.json on Windows.
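
As a minimal sketch using only the Python standard library (not an Orchest command), the platform-dependent config path can be resolved as follows:

import os
import platform
from pathlib import Path

# Resolve the global Orchest config location for the current platform.
if platform.system() == "Windows":
    config_path = Path(os.environ["UserProfile"]) / ".orchest" / "config.json"
else:
    config_path = Path.home() / ".config" / "orchest" / "config.json"

print(config_path)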

Installing additional packages

Orchest runs all your pipeline step scripts (.ipynb, .py, .R, .sh) in containers. The default images are based on the Jupyter Docker Stacks and come with a number of pre-installed packages.

We plan on supporting custom images and/or container commits, to avoid having to reinstall packages each time a pipeline step is run.

Installing additional Python packages

To install an additional package, execute the install command inside the script itself, before the package is first used.

For Jupyter notebooks you can run the following code in a cell:

!conda install -y <package name>

or, for pip packages, run:

!pip install <package name>

Or directly from within Python (e.g. in plain .py scripts), using pip's internal API:

# Note: pip._internal is not a public API and may change between pip versions.
from pip._internal import main as pip

pip(['install', '--user', '<package name>'])
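
A more robust alternative is to invoke pip through the interpreter that runs the script. This is a minimal sketch using only the standard library, not an Orchest-specific helper:

import subprocess
import sys

# Run pip via the current Python interpreter instead of depending on
# pip's internal Python API.
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--user', '<package name>'])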

Installing additional R packages

R packages can be installed with the regular command; when the script runs non-interactively you may need to pass a CRAN mirror through the repos argument:

install.packages("<package name>", repos = "https://cloud.r-project.org")