Troubleshooting =============== When running against issues it can be helpful to increase the verbosity of Orchest and changing the log level of all Orchest's containers. You can do this using: .. code-block:: sh orchest patch --log-level=DEBUG Some other kubectl commands that can be useful when debugging Orchest: .. code-block:: sh # Inspect the logs of a particular service kubectl logs -n orchest -f deployment/orchest-api # Attach a shell in a particular service kubectl exec -n orchest -it deployment/orchest-api bash Minikube is unresponsive on Docker Desktop ------------------------------------------ Sometimes Minikube becomes unresponsive on Docker Desktop. Some tell-tale signs of this are: - You're getting exit code ``137`` when building Orchest images (``scripts/build_containers.sh``) - k9s fails to connect to minikube's context and ``docker logs minikube`` hangs, even though minikube is started - Minikube commands hang or are really slow This happens because too little memory is being allocated to Docker Engine. To fix it: open Docker Desktop and go to `Settings > Advanced` and then increase the "Memory" slider (`See this GitHub issue for reference `_). Inspecting the ``orchest-api`` API schema ----------------------------------------- To develop against the API it can be useful to have a look at the swagger documentation. This can be done by portforwarding the ``orchest-api`` and visiting the `/api` endpoint. .. code-block:: sh # You will be able to visit `localhost:8000/api` kubectl port-forward -n orchest deployment/orchest-api 8000:80 Inspecting the ``orchest-database`` ---------------------------------------------- .. code-block:: sh kubectl port-forward -n orchest deployment/orchest-database 5432:5432 # You could accomplish the same by ``exec``ing into the database pod, # this can be much more handy since commands history will be # preserved through restarts, etc. psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_api psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_webserver psql -h 127.0.0.1 -p 5432 -U postgres -d auth_server Breaking schema changes ----------------------- **What it looks like** The client can't be accessed (the webserver is not up) or the client can be accessed but a lot of functionality seems to not be working, e.g. creating an environment. **How to solve** .. code-block:: bash kubectl port-forward -n orchest deployment/orchest-database 5432:5432 psql -h 127.0.0.1 -p 5432 -U postgres # Once in psql, drop the db of interest. drop database orchest_api; # or orchest-webserver, auth-server # Exit psql and restart Orchest bash orchest restart .. note:: An alternative approach is to reinstall Orchest. ``bash orchest uninstall`` followed by `bash orchest install``. **Context** Some branches might contain a schema migration that applies changes to the database in a way that is not compatible with ``dev`` or any other branch. By moving back to those branches, the database has a schema that is not compatible with what's in the code. **Verify** Check the webserver and the api logs. It will be easy to spot because the service won't produce other logs but the ones related to incompatible schema changes and database issues. Error: Multiple head revisions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **What it looks like** You see an error along the lines of ``Error: Multiple head revisions are present for given argument 'head'`` inside one of the services interacting with the DB, e.g. the ``orchest-api``. **How to solve** Using the ``orchest-api`` as an example here. .. code-block:: bash bash scripts/migration_manager.sh orchest-api merge heads It may be that the above doesn't work, because the ``orchest-api`` never reaches a running state. In that case we need to: .. code-block:: bash # Change the deployment so that it does a sleep instead of invoke # the cmd of the container. kubectl -n orchest edit deploy orchest-api # command: ["sleep"] # args: ["1000"] # Now run the migration script inside the orchest-api container python migration_manager.py db merge heads # Next we need to copy the file out of the container and into # the migration revisions directly inside the orchest-api kubectl cp \ "orchest/${pod_name}:/orchest/services/orchest-api/app/migrations/versions" \ "services/orchest-api/app/migrations/versions" # Rebuild the orchest-api container on the node scripts/build_container.sh -i orchest-api -t "v2022.04.0" -o "v2022.04.0" # Edit the orchest-api deployment again to make sure to not # run the sleep command anymore. kubectl -n orchest edit deploy orchest-api **Context** Alembic creates revision files to do migrations. When two different branches have done schema migrations then the head will diverge, similar to git now having two different branches which point to different commits. Once these branches get merged, the alembic revision heads need to be merged as well. Dev mode not working -------------------- * Make sure you started the cluster with the Orchest repository mounted, see :ref:`here `. * If you have changed some dependencies (i.e. requirements.txt files) you have to rebuild the image and kill the pod to get it redeployed. Can't log-in to authentication enabled instance ----------------------------------------------- Open ``k9s`` and open a shell (``s`` shortcut) on the ``orchest-database`` pod. .. code-block:: bash # Log into the DB psql -U postgres -d orchest_api UPDATE settings SET value = '{"value": false}' WHERE name='AUTH_ENABLED'; Next you need to ``orchest restart`` on your host for the changes to take affect. Or kill the appropriate pods so that they restart. Missing environment variables in pods ------------------------------------- **What it looks like** The pods of the ``orchest-api``, ``orchest-webserver``, ``auth-server`` or ``celery-worker`` are having issues related to missing environment variables. E.g. they can't start because a given environment variable is not defined or is wrongly defined. **Context** Some parts of Orchest, like the ``orchest-controller``, are never stopped, regardless of issuing a ``orchest restart`` or ``orchest stop``. This makes it so that, despite rebuilding all images (i.e. including the controller) on a given branch and restarting Orchest, the new controller image is not actually deployed. This leads to an inconsistency between what's running in the cluster. **How to solve** - make sure you have built the controller image, ``eval $(minikube -p minikube docker-env)`` then ``bash scripts/build_container.sh -i orchest-controller -o $TAG -t $TAG``. - stop Orchest, ``orchest stop``. - cause a redeployment of the controller image by killing the controller pod, ``kubectl delete pod -n orchest -l app.kubernetes.io/name=orchest-controller``, or scale down and back up the controller deployment, or any other preferred solution. - start Orchest, ``orchest start``.