Projects#

A project is the main container for organizing related pipelines, jobs, environments and code in Orchest.

A project is based on a git repository. For example, a project might be organized as follows:

.
├── .git/
├── .orchest
│   ├── environments/
│   └── pipelines/
├── california_housing.orchest
├── collect-results.ipynb
└── get-data.py

Projects also contain jobs; however, these are not stored in the project filesystem.

You can access project files in your code running inside environments using relative paths. For absolute paths, all files of a project are mounted to the /project-dir directory.
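As a minimal sketch of the two addressing styles, the Python snippet below builds the absolute path of a project file from its project-relative path. The helper name `project_path` is hypothetical and not part of Orchest; only the `/project-dir` mount point comes from the text above, and the file name is taken from the example layout.

```python
from pathlib import PurePosixPath

# Mount point of the project files, as stated above.
PROJECT_DIR = PurePosixPath("/project-dir")


def project_path(relative: str) -> PurePosixPath:
    """Return the absolute path of a file given its project-relative path.

    Hypothetical helper for illustration; not an Orchest API.
    """
    rel = PurePosixPath(relative)
    if rel.is_absolute():
        raise ValueError(f"expected a project-relative path, got {relative!r}")
    return PROJECT_DIR / rel


print(project_path("get-data.py"))  # prints /project-dir/get-data.py
```

A relative path such as `open("get-data.py")` resolves against the working directory of the running process, so the absolute form may be the safer choice when that directory is not known.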

Getting started with projects in Orchest#

You can get started with Projects by:

  • Creating a new Project

  • Importing an existing Project

  • Importing Orchest-curated or community-contributed examples through the Projects page

Tip

👉 See quickstart tutorial.

Importing a project#

To import an existing Project into Orchest, open the Project dropdown menu and click the import button.

Importing a project in Orchest.

Tip

👉 See video tutorial: importing a project.

Project versioning#

A Project’s .orchest directory should be versioned since it defines the Environments the Project uses. This enables the Project to run on every machine.

The /data directory can be used to store data locally that is accessible by all Pipelines across all Projects, even by Jobs.
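A minimal sketch of sharing a file through this directory, in Python. The helpers `shared_path` and `write_shared` and the file name `results.csv` are hypothetical; only the `/data` location comes from the text above. The root is a parameter so the snippet can also be tried outside Orchest.

```python
from pathlib import Path


def shared_path(name: str, data_dir: str = "/data") -> Path:
    """Return the path of a shared file under the data directory.

    Hypothetical helper for illustration; not an Orchest API.
    """
    return Path(data_dir) / name


def write_shared(name: str, text: str, data_dir: str = "/data") -> Path:
    """Write text to a file under the data directory and return its path."""
    target = shared_path(name, data_dir)
    # Create intermediate folders if the name contains subdirectories.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

One pipeline could call `write_shared("results.csv", "id,score\n1,0.9\n")`, and a pipeline in another Project could then read the same file with `shared_path("results.csv").read_text()`.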

Secrets, on the other hand, should be set with environment variables to avoid them being versioned.
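Reading such a secret from code might look like the Python sketch below. The helper `read_secret` and the variable name `API_TOKEN` are hypothetical; the only assumption about Orchest is that the variable is present in the process environment when the code runs.

```python
import os
from typing import Mapping


def read_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, failing loudly if unset.

    Hypothetical helper for illustration; not an Orchest API.
    """
    value = environ.get(name)
    if not value:
        # Fail early instead of passing an empty credential downstream.
        raise RuntimeError(f"environment variable {name!r} is not set")
    return value
```

A step would then call `token = read_secret("API_TOKEN")`, which keeps the value out of the files that git tracks.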

Using git inside Orchest projects#

Tip

👉 See video tutorial: versioning using git in Orchest.

You can use git inside Orchest with the pre-installed jupyterlab-git extension. Get started by adding your user.name and user.email in configure JupyterLab. For example:

git config --global user.name "John Doe"
git config --global user.email "john@example.org"

Use the following commands to add a private SSH key to your terminal session in JupyterLab:

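# In every new shell: restrict the key's permissions and load it into the agent.
echo "chmod 400 /data/id_rsa" >> ~/.bashrc
echo "ssh-add /data/id_rsa 2>/dev/null" >> ~/.bashrc
# Start an ssh-agent for the session if none is running yet.
echo "if [ -z \$SSH_AGENT_PID ]; then exec ssh-agent bash -c 'shellspawner; bash'; fi" >> ~/.bashrc
# Use the key for github.com and record GitHub's RSA host key in known_hosts.
mkdir -p ~/.ssh
printf "%s\n" "Host github.com" " IdentityFile /data/id_rsa" >> ~/.ssh/config
ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts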

Ensure the id_rsa private key file is uploaded to the data/ folder through the pipeline file manager.

Warning

🚨 Adding a private key file to the /data folder exposes it to everyone using your Orchest instance.

You can then version your project with git through either:

  • JupyterLab terminal.

  • JupyterLab git extension UI.