Dispatching Jobs for Research Workflows
Saturn Cloud Jobs are resources that run a set of code, and they can be started in one of four ways:
- By pressing the “start button” within Saturn Cloud
- By running on a preset schedule
- Via an HTTP POST request to Saturn Cloud for programmatic running
- Via the Saturn Cloud CLI (which sends the same HTTP POST mentioned above)
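The HTTP trigger makes it possible to kick off a job from any system that can make web requests. A minimal sketch with curl, assuming your installation exposes a start endpoint of the form /api/jobs/<job-id>/start and that you have a Saturn Cloud API token in SATURN_TOKEN (check your installation's API documentation for the exact URL and auth header):
# Hypothetical sketch - the exact URL and header format depend on your
# Saturn Cloud installation; consult its API documentation.
curl -X POST \
  -H "Authorization: token ${SATURN_TOKEN}" \
  "https://<your-saturn-cloud-url>/api/jobs/<job-id>/start"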
Most people think about jobs in the context of productionizing data science - for example ETL jobs or model re-training jobs. However, jobs are also useful in interactive research:
- Users may opt to do research and development on smaller, cheaper machines, and from there dispatch jobs to more powerful ones.
- Users may opt to dispatch a job and then shut down their development machine while the job runs overnight or over the weekend.
- Users may opt to dispatch parallel jobs in order to run parameter scans, or to run many experiments and simulations.
This article discusses workflows around dispatching jobs in support of interactive research.
Note:
This article makes use of the Saturn Cloud CLI, which you can install with pip install saturn-client
Create the new job
Saturn Cloud recipes are the recommended approach for working with Jobs for research. A basic job recipe looks like this:
type: job
spec:
  name: hello-world
  description: ''
  image: community/saturncloud/saturn-python:2023.09.01
  instance_type: large
  environment_variables: {}
  working_directory: /home/jovyan/workspace
  start_script: ''
  git_repositories: []
  secrets: []
  shared_folders: []
  start_dind: false
  command: echo "hello world"
  scale: 1
  use_spot_instance: false
  schedule: null
This recipe dispatches a job that executes echo "hello world" using the saturn-python:2023.09.01 image. (The image is community/saturncloud/saturn-python:2023.09.01 because we are on the community instance of Saturn Cloud.) Save the above to recipe.yaml and then you can submit the job as follows:
$ sc apply recipe.yaml --start
This will create the Job in Saturn Cloud, and start it. After this you can view the job and the logs from the Saturn Cloud UI, but you can also work with it from the command line (more on this later).
Note:
In most cases, you will not be writing recipes from scratch - though of course you always can.
Creating jobs by cloning other resources
It is generally easier to clone an existing workspace than to create a Job recipe from scratch - usually you already have a workspace where you can run your job code interactively. We always recommend running code interactively before trying to deploy it as a job.
You can clone a workspace as a job with the following command:
$ sc clone workspace ops-devel job my-job --command "echo 'hello-world'"
It is often useful to write that recipe to a file, so that you can modify it:
$ sc get job my-job > /tmp/recipe.yaml
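A common modification is to change the instance type or the command before re-submitting. A sketch of the kind of edit you might make inside /tmp/recipe.yaml (the instance_type value and train.py script here are assumptions; substitute whatever your installation and repository actually provide):
spec:
  name: my-job
  instance_type: xlarge        # hypothetical size; use one available in your installation
  command: python train.py     # hypothetical entry point from your repository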
Afterwards, you can submit the job via:
$ sc start job my-job
If you have modified the recipe and you would like to apply it:
$ sc apply /tmp/recipe.yaml --start
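These same commands cover the parameter-scan use case from the start of this article: clone your workspace once per parameter and start each resulting job. A minimal sketch, assuming a workspace named ops-devel and a hypothetical train.py that accepts a --seed argument:
# Dispatch one job per seed value (train.py and its --seed flag are
# assumptions - substitute your own script and parameters).
for seed in 1 2 3; do
  sc clone workspace ops-devel job scan-seed-$seed --command "python train.py --seed $seed"
  sc start job scan-seed-$seed
done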
Source code and data used by jobs
Most Saturn Cloud resources get their code from Git repositories you have configured in Saturn Cloud, and load data from networked resources such as S3, shared folders, or databases like Snowflake and Redshift.
For research, it can be convenient to be able to synchronize files from your development environment to the job. This may be because you have code changes that you aren’t ready to push to Git, or because you have data files locally that don’t exist on networked storage.
This workflow supports synchronizing arbitrary files with your job.
$ sc apply /tmp/recipe.yaml --sync /home/jovyan/workspace/my-repo --sync /home/jovyan/my-data
This command will archive /home/jovyan/workspace/my-repo and /home/jovyan/my-data into tar.gz files, upload them to internally hosted networked storage (SaturnFS), and generate start script commands in your job that download and extract the files to the appropriate locations. For example, after applying the above command, the resulting recipe includes this additional block:
spec:
  start_script: >
    ### BEGIN SATURN_CLIENT GENERATED CODE
    saturnfs cp sfs://internal/hugo/ops-devel-run/home/jovyan/workspace/my-repo/data.tar.gz /tmp/data.tar.gz
    mkdir -p /home/jovyan/workspace/my-repo
    tar -xvzf /tmp/data.tar.gz -C /home/jovyan/workspace/my-repo
    saturnfs cp sfs://internal/hugo/ops-devel-run/home/jovyan/my-data/data.tar.gz /tmp/data.tar.gz
    mkdir -p /home/jovyan/my-data/
    tar -xvzf /tmp/data.tar.gz -C /home/jovyan/my-data/
    ### END SATURN_CLIENT GENERATED CODE
Job output
Saturn Cloud jobs are dispatched to new machines, and these machines are automatically torn down after the job completes. The only output Saturn Cloud captures is job logs; any other output files your job produces should be saved to a network location.
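One convenient pattern is to make uploading results the final step of the job's command, so nothing is lost when the machine is torn down. A sketch that reuses SaturnFS for this, assuming a hypothetical run_experiment.py that writes results.csv, that saturnfs cp can upload a local file to an sfs:// path, and an sfs:// destination following the pattern shown in the generated block above:
# Hypothetical job command: run the experiment, then push the output to
# networked storage before the ephemeral machine is torn down.
python run_experiment.py --output results.csv
saturnfs cp results.csv sfs://internal/hugo/experiment-results/results.csv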
Job status and logs
The following command shows the current status of the job.
$ sc list job hello-world
owner name resource_type status instance_type scale id
----------------------------------------------------------------------------------------------------------------------
internal/hugo hello-world job pending large 1 a387e542a27a4e689f16a0fac48901de
The following command lists all invocations (pods) of this job.
$ sc pods job hello-world
pod_name status source start_time end_time
----------------------------------------------------------------------------------------------------------------------------------------------------
id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-nz-0-hplcq pending live 2024-04-01T15:29:36+00:00
id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-wc-0-4ptfv completed historical 2024-04-01T01:29:46+00:00 2024-04-01T01:32:31+00:00
id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-hd-0-w5qx4 completed historical 2024-04-01T01:15:48+00:00 2024-04-01T01:18:34+00:00
You can then request the logs for each pod. Note that Saturn Cloud captures both live and historical logs. Live logs are stored on the machine where the job is running and disappear when that machine is torn down. Historical logs are an archive of the live logs, but there may be a delay of a few minutes before logs reach the historical log store. As a result, the CLI lets you specify which source to read logs from; if you omit the source, the client attempts to pick the best one.
$ sc logs job hello-world id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-nz-0-hplcq
$ sc logs job hello-world id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-nz-0-hplcq --source live
$ sc logs job hello-world id-hugo-hello-world-a387e542a27a4e689f16a0fac48901de-nz-0-hplcq --source historical
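If you dispatch many jobs, it can be handy to script log retrieval. A small convenience sketch that pulls the name of the most recent pod out of the table above (assuming the layout shown, with two header lines and the newest pod listed first) and then fetches its logs:
# Parse the pod table: skip the two header lines, take the first row,
# and read the pod name from the first column. This scrapes the CLI's
# tabular output, so treat it as a convenience sketch rather than a
# supported interface.
POD=$(sc pods job hello-world | tail -n +3 | head -n 1 | awk '{print $1}')
sc logs job hello-world "$POD"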