Installation#

This page contains the installation instructions for self-hosting the open source version of Orchest.

The easiest way to use Orchest is through our free hosted version. Get up and running with a fully configured Orchest instance in less than 2 minutes.

Note

Orchest is in beta.

Prerequisites#

To install Orchest you will need a running Kubernetes (k8s) cluster. You can either pick a managed service from one of the certified cloud platforms or create a cluster yourself. For single-node deployments, we recommend using at the very least 2 CPU and 8GB of RAM (see CPU contention). Do note that only the following container runtimes are supported: Docker and containerd.

Pick your deployment environment and Kubernetes distribution and follow the installation steps below. In case you have custom requirements, be sure to first check out the custom requirements section.

Installing Orchest#

We recommend installing Orchest on a clean cluster to prevent clashes with existing cluster-level resources, even though installing Orchest on an existing cluster is fully supported. At this time we only support running Orchest on Linux (x86_64), either through minikube on a Linux bare-metal machine/VM, or on a Kubernetes cluster.

Managing your Orchest installation#

Your Orchest installation can be fully managed through the orchest-cli. Check out the available commands in the Orchest CLI reference.

Note

Your Kubernetes cluster has to be up in order for the orchest-cli to be able to interact with it.
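For reference, a typical management workflow looks like the sketch below; the orchest-cli itself is distributed through PyPI and the individual commands are documented in the Orchest CLI reference:

pip install --upgrade orchest-cli

# Install Orchest into the cluster your kubeconfig currently points to.
orchest install

# Inspect the status of the Orchest Cluster.
orchest status

# Stop and restart Orchest, e.g. to temporarily free up resources.
orchest stop
orchest start

# Update Orchest to the latest released version.
orchest update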

Self-hosting a multi-node deployment#

If you want to self-host and manage a multi-node Orchest installation, there are a few things to consider, mostly pertaining to storage. Most likely, you'll need to get acquainted with the Orchest CRD to be aware of what can be customized, e.g. backing a PVC with EFS.
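Assuming the Orchest Controller and its CRDs are already installed, the schema can be inspected straight from the cluster:

# Print the field documentation of the OrchestCluster spec (requires
# the CRD to publish an OpenAPI schema).
kubectl explain orchestcluster.spec --recursive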

The Docker registry#

The Orchest deployment includes a Docker registry to store the images of built environments. This single-writer storage will grow as you add more environments to your project, so you'll need to choose a storage option that can handle the increased size or that can be resized later.
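Whether resizing later is possible depends on the backing StorageClass. A sketch (the PVC name is a placeholder; find the actual name by listing the PVCs in the Orchest namespace):

# Hypothetical example: expand the registry PVC. This only works if the
# backing StorageClass has allowVolumeExpansion: true.
kubectl patch pvc <registry-pvc> -n orchest \
  -p '{"spec": {"resources": {"requests": {"storage": "50Gi"}}}}'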

The user directory#

Orchest stores most of its data in the user directory, which is used by various services, pipeline steps, and other internal processes. In a multi-node setup, the userdir PVC needs to be backed by storage that can handle multiple writers across nodes, such as NFS or EFS. Alternatively, you could use a distributed file system and a host path for the volume.
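To verify that the provisioned storage can support this, check the access modes of the volumes; a multi-node userdir needs ReadWriteMany (RWX):

# List the PVCs in the Orchest namespace together with their access
# modes and storage classes.
kubectl get pvc -n orchest \
  -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes,CLASS:.spec.storageClassName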

Control plane and worker nodes#

A multi-node cluster allows you to separate what could be considered the Orchest control plane from the actual work to be done, like running user pipelines. This can improve stability, performance, and costs by making sure the two take place on different nodes of the cluster. The orchest-cli install command has some hidden flags that allow you to do this based on node labels. This isn't exactly a feature we consider published, but more of an internal function that we mention here to ease the life of users with more advanced requirements, so you'll have to take a look at the orchest-cli code in the Orchest repo. Overall, this particular feature can be considered stable.
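As a purely hypothetical sketch (the label names below are illustrative; the real flag and label names have to be looked up in the orchest-cli source), the separation could be prepared by labeling nodes:

# Label the nodes that should host the Orchest control plane and the
# nodes that should run user workloads (illustrative label names).
kubectl label node control-node-1 orchest-control-plane=true
kubectl label node worker-node-1 orchest-worker=true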

When the control plane and worker nodes are separated, the userdir is split into an orchest state PVC and the userdir itself. If the control plane runs on a single node only, the orchest state PVC can be a simple volume.

Custom requirements#

If you have custom requirements (or preferences) for deploying Orchest on your Kubernetes cluster, then one of the following subsections might be helpful:

Setting up an FQDN#

If you would rather reach Orchest using a Fully Qualified Domain Name (FQDN), e.g. by simply going to http://localorchest.io in your browser, instead of using the cluster’s IP directly, you can install Orchest using:

orchest install --fqdn="localorchest.io"

For local Kubernetes clusters such as minikube, you can now make Orchest reachable through the FQDN by adding an entry to your /etc/hosts:

# Set up the default Fully Qualified Domain Name (FQDN) in your
# /etc/hosts so that you can reach Orchest locally.
echo -e "$(minikube ip)\tlocalorchest.io" | sudo tee -a /etc/hosts

If you are instead routing traffic through minikube tunnel, point the FQDN to localhost:

# When routing traffic through `minikube tunnel`, Orchest is reachable
# on localhost instead of the minikube IP.
echo -e "127.0.0.1\tlocalorchest.io" | sudo tee -a /etc/hosts

In that case, don't forget to also run sudo minikube tunnel.

Installing Orchest without Argo Workflows#

If you already have Argo Workflows installed globally (i.e. not namespaced) on your Kubernetes cluster, then you need to explicitly tell Orchest not to install it again:

orchest install --no-argo

Since Argo Workflows creates cluster-level resources, installing it again would lead to clashes, or to both Argo Workflows deployments managing the same Custom Resource Objects (most likely you don't want either of those things to happen).

Now that you are using an Argo Workflows set-up that is not managed by the Orchest Controller, you need to make sure that the right set of permissions is configured for Orchest to work as expected. Check out the permissions that the Orchest Controller sets for Argo here. In addition, Orchest makes use of Argo's Container Set in a single-node setting (i.e. you have singleNode: true in the OrchestCluster CR Object), which requires the use of the Emissary Executor.
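As a sketch, you can check which executor an existing Argo set-up is configured with (the namespace below is a placeholder for wherever Argo is installed; recent Argo versions default to the Emissary Executor):

# Look up the executor in Argo's workflow-controller configuration; no
# output typically means the default executor is used.
kubectl get configmap workflow-controller-configmap -n <argo-namespace> \
  -o yaml | grep containerRuntimeExecutor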

Installing Orchest without Nginx Ingress Controller#

If you already have the Nginx Ingress Controller deployed on your Kubernetes cluster, then you need to tell Orchest not to install it again:

orchest install --no-nginx

Note

Installation of the Nginx Ingress Controller requires different procedures on EKS and GKE clusters.

Installing Orchest using kubectl#

The code snippet below will install Orchest in the orchest namespace. In case you want to install in another namespace, you can use tools like yq to change the specified namespace in orchest-controller.yaml and example-orchestcluster.yaml (a sketch of this follows the snippet).

# Get the latest available Orchest version at https://github.com/orchest/orchest/releases
# Example:
export VERSION="v2023.01.8"

# Create the namespace to install Orchest in
kubectl create ns orchest

# Deploy the Orchest Operator
kubectl apply \
  -f "https://github.com/orchest/orchest/releases/download/${VERSION}/orchest-controller.yaml"

# Apply an OrchestCluster Custom Resource
# NOTE: You can also first download the example manifest so that you
# can tweak it to your liking. For example, preventing Orchest from
# also deploying the Nginx controller (because you have already
# configured ingress on your cluster) through the
# `controller.orchest.io/deploy-ingress` annotation.
kubectl apply \
  -f "https://github.com/orchest/orchest/releases/download/${VERSION}/example-orchestcluster.yaml"

In case you want to configure the Orchest Cluster, you can patch the created OrchestCluster.
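For example, a sketch of toggling the singleNode option mentioned above through a merge patch (the cluster name is a placeholder, and the exact field path should be verified against the example manifest):

# Find the name of the OrchestCluster object.
kubectl get orchestclusters -n orchest

# Sketch: enable single-node mode (replace <cluster-name> with the name
# found above; verify the field path in example-orchestcluster.yaml).
kubectl patch orchestcluster <cluster-name> -n orchest --type merge \
  -p '{"spec": {"singleNode": true}}'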

Setting up a reverse proxy#

When installing Orchest on remote machines, such as AWS EC2 instances, you will need to set up a reverse proxy that routes traffic to the application appropriately. Here is an example of how to do it on an Ubuntu-based EC2 machine using nginx:

sudo apt-get install -y nginx

# Make Orchest accessible on the instance through localorchest.io
minikube ip | xargs printf "%s localorchest.io\n" | sudo tee -a /etc/hosts

# Set up a reverse proxy that listens on port 80 of the host
# and routes traffic to Orchest
# Use a quoted heredoc delimiter so the shell does not expand the nginx
# variables below, and use tee because plain redirection would not run
# with root permissions.
cat << 'EOF' | sudo tee /etc/nginx/sites-available/localorchest.io
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    server_name orchest;

    location / {
        proxy_pass http://localorchest.io;

        # For project or file manager uploads.
        client_max_body_size 0;

        # WebSocket support.
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 86400;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/localorchest.io /etc/nginx/sites-enabled/
# Remove default_server.
sudo truncate -s 0 /etc/nginx/sites-available/default
sudo service nginx restart
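You can then verify from your local machine that traffic is routed to Orchest through the proxy (the address below is a placeholder for the instance's public IP or DNS name):

# Expect a response from Orchest rather than the default nginx page.
curl -I http://<ec2-public-ip>/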

Scarce (CPU) resources - tweak DNS settings#

This section applies mostly to single-node deployments, as otherwise you can configure your Kubernetes cluster to scale with respect to the current load, or separate your control plane nodes from your worker nodes.

During times of CPU resource contention, the CoreDNS pod could start failing its readinessProbe, leading kube-proxy to update the iptables rules to stop routing traffic to the pod (see the k8s docs), for which it uses the REJECT target. This means that DNS queries will start failing immediately, without the configured resolver timeout being respected (in Orchest we use a timeout of 10 seconds with 5 attempts). In order to respect the timeout instead of failing immediately, you can tweak the readinessProbe, or simply remove it by editing the manifest of the coredns deployment:

kubectl edit -n kube-system deploy coredns
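If you prefer a non-interactive approach, the probe can also be removed with a JSON patch; a sketch, assuming coredns is the first container in the deployment (which it is in a default kubeadm set-up):

kubectl patch deployment coredns -n kube-system --type json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'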

Note

👀 For minikube users we take care of this automatically, in which case even the warning below doesn't apply.

Warning

Configuration changes of CoreDNS will be lost when executing kubeadm upgrade apply – see Kubernetes docs. Thus you will have to reapply your changes whenever you run kubeadm upgrade apply.

Why? The CoreDNS manifests are hardcoded in kubeadm, so if kubeadm init phase addon coredns is ever invoked, your changes to the CoreDNS configuration are lost.

What can I do about it? If you are using kubeadm directly, then you could skip the kubeadm addon phase and deploy the respective addons yourself. Or you could just reapply your CoreDNS manifest changes each time.

Closing notes#

Authentication is disabled by default after installation. Check out the Orchest settings to learn how to enable it.