Prefect Cloud Flows
Prefect Cloud is a hosted, high-availability, fault-tolerant service that handles all the orchestration responsibilities for running data pipelines. It gives you complete oversight of your workflows and makes it easy to manage them. It provides cloud convenience with on-prem security. It follows a hybrid model: you design, test, and build a workflow as code that is orchestrated in the cloud but executed on your private infrastructure. In this case, the “private infrastructure” is the set of resources running on Saturn Cloud. Once the workflow is registered with Prefect Cloud, a codeless version of the workflow is sent to the cloud, which provides a fully managed orchestration service for your workflow on Saturn Cloud.
Saturn Cloud connects to Prefect Cloud via a special Prefect Cloud Flow resource type. This resource is started and stopped by Prefect Cloud. Unlike other resource types, you do not interact with it directly. Instead, you manage a Prefect Cloud account, which then makes the appropriate Saturn Cloud calls. This allows you to use Prefect Cloud as the central place to manage your data science pipelines.
Note that Prefect Cloud is distinct from Prefect Core, the open-source Python library for managing data pipelines. Prefect Core can be run within a single Jupyter Server resource like any other Python library, whereas Prefect Cloud allows you to orchestrate across multiple resources in a cloud-hosted pipeline. Generally, if you have a complex pipeline to manage, it’s better to use the Prefect Cloud service than to try to manage the whole system within a single Saturn Cloud resource.
Prefect Cloud Components
Below are the different components of Prefect Cloud and how they connect to Saturn Cloud.
Flows
A flow in Prefect Cloud is a container for multiple tasks which understands the relationship between those tasks. When a flow changes, a new “flow version” is created. Example changes include:
- some tasks have been added or removed
- the dependencies between tasks have changed
- the flow is using a different execution mechanism (like a Dask cluster instead of a local Python process)
The Prefect Cloud UI keeps track of all these versions, and knows to link all versions of the same flow into one “flow group”.
Each time a flow is executed, a Prefect Cloud Flow resource in Saturn Cloud is spun up to do the computations. When it is complete the resource is shut down.
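The versioning behavior described above can be pictured with a small toy model. This is only an illustrative sketch using the Python standard library, not how Prefect Cloud is implemented: the FlowRegistry class, the fingerprint helper, and the dictionary layout of a flow definition are all hypothetical names invented for this example.

```python
import hashlib
import json


def fingerprint(flow_definition):
    """Stable hash of a flow's structure (tasks, dependencies, executor)."""
    canonical = json.dumps(flow_definition, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FlowRegistry:
    """Toy stand-in for the version bookkeeping (hypothetical, not a Prefect API)."""

    def __init__(self):
        # flow name -> list of fingerprints, one per version (the "flow group")
        self.flow_groups = {}

    def register(self, name, flow_definition):
        """Record the flow and return its current version number."""
        versions = self.flow_groups.setdefault(name, [])
        digest = fingerprint(flow_definition)
        # Only a changed definition produces a new version.
        if not versions or versions[-1] != digest:
            versions.append(digest)
        return len(versions)


registry = FlowRegistry()
etl_v1 = {
    "tasks": ["extract", "load"],
    "edges": [["extract", "load"]],
    "executor": "local",
}
# Same tasks and dependencies, but a different execution mechanism.
etl_v2 = dict(etl_v1, executor="dask")

print(registry.register("etl", etl_v1))  # first registration
print(registry.register("etl", etl_v1))  # unchanged definition
print(registry.register("etl", etl_v2))  # executor changed
```

All three registrations land in the same flow group because they share the name "etl"; only the last one, where the executor changed, adds a second version.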
Agents
A Prefect Agent is a small always-on service responsible for running flows and reporting their logs and statuses back to Prefect Cloud. Prefect Agents are always “pull-based”: they are configured to periodically make requests to Prefect Cloud to determine if new work is scheduled to run. When Prefect Cloud responds with work, the agent inspects the details of the flow and then kicks off a flow run.
When using Saturn Cloud, Prefect Agents run continuously in your Saturn Cloud account. You can manage them in the Prefect Agents tab of the Saturn Cloud app.
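The pull-based pattern can be sketched with a few lines of standard-library Python. This is a simplified illustration, not the real Prefect Agent: FakeOrchestrator, poll_once, and the status strings are hypothetical stand-ins, and a real agent would sleep between polls and run indefinitely.

```python
from collections import deque


class FakeOrchestrator:
    """Stand-in for Prefect Cloud: holds scheduled flow runs until asked."""

    def __init__(self):
        self._scheduled = deque()
        self.statuses = {}

    def schedule(self, flow_name):
        self._scheduled.append(flow_name)

    def get_scheduled_run(self):
        # Returns None when there is nothing to do.
        return self._scheduled.popleft() if self._scheduled else None

    def report(self, flow_name, status):
        self.statuses[flow_name] = status


def poll_once(orchestrator, run_flow):
    """One polling cycle: ask for work, run it, report the outcome."""
    flow_name = orchestrator.get_scheduled_run()
    if flow_name is None:
        return False
    try:
        run_flow(flow_name)
        orchestrator.report(flow_name, "Success")
    except Exception:
        orchestrator.report(flow_name, "Failed")
    return True


cloud = FakeOrchestrator()
cloud.schedule("nightly-etl")

# Three cycles are enough to show that the agent, not the orchestrator,
# initiates every request. Cycles two and three find no work.
for _ in range(3):
    poll_once(cloud, run_flow=lambda name: None)

print(cloud.statuses)
```

Because the agent initiates every request, the orchestrator never needs inbound network access to the infrastructure where flows run, which fits the hybrid model described at the top of this page.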
Saturn Cloud + Prefect Cloud Architecture
Using Saturn Cloud and Prefect Cloud together looks like this:
- Using credentials from your Prefect Cloud account, you create an Agent running in Saturn Cloud.
- You create a Saturn Cloud Jupyter server resource which defines all the dependencies your code needs.
- In a Jupyter server with all those dependencies set up, you write flow code in Python using the prefect library.
- In your Python code, you use the prefect-saturn library to “register” your flow with Saturn Cloud, and the prefect library to register it with Prefect Cloud. Your flow will be automatically labeled to match with Prefect agents running in your Saturn cluster. prefect-saturn adds the following features to your flow by default:
  - storage: Webhook
  - run config: KubernetesRun
  - executor: LocalExecutor, using a prefect_saturn.SaturnCluster
  - labels: saturn-cloud, webhook-flow-storage, <YOUR_CLUSTER_DOMAIN>
- When Prefect Cloud tells your Prefect Agent in Saturn to run the flow, Saturn Cloud creates a kubernetes job to run the flow.
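To make the label matching in the steps above concrete, here is a minimal sketch. It assumes the rule that an agent may pick up a flow only when every label on the flow is also present on the agent, which is our understanding of label matching in Prefect 1.x and should be checked against the Prefect documentation. The agent_can_run helper and the example cluster domain are hypothetical.

```python
def agent_can_run(flow_labels, agent_labels):
    """True when every label on the flow is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical value standing in for <YOUR_CLUSTER_DOMAIN>.
cluster_domain = "example-cluster.invalid"

# Labels that prefect-saturn adds to a flow by default, per the list above.
flow_labels = {"saturn-cloud", "webhook-flow-storage", cluster_domain}

# An agent running in the same Saturn cluster carries matching labels.
saturn_agent_labels = {"saturn-cloud", "webhook-flow-storage", cluster_domain}

# An unrelated agent elsewhere does not.
other_agent_labels = {"some-other-team"}

print(agent_can_run(flow_labels, saturn_agent_labels))
print(agent_can_run(flow_labels, other_agent_labels))
```

Including the cluster domain among the labels keeps a flow registered from one Saturn installation from being picked up by agents running in another.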
Using this integration, you’ll write code with the prefect library which talks to Saturn Cloud and Prefect Cloud. Their responsibilities are as follows:
- prefect library
  - describe the work to be done in a flow
  - tell Prefect Cloud about the flow, including when to run it (on a schedule? on demand?)
  - store that flow somewhere so it can be retrieved and run later
- Saturn Cloud
  - provide a hosted Jupyter Lab experience where you can author and test flows, and a library for easily deploying them (prefect-saturn)
  - run an Agent that checks Prefect Cloud for new work
  - when Prefect Cloud says “yes run something”, retrieve flows from storage and run them
  - automatically start up a flow execution environment (a single node or a distributed Dask cluster) to run your flow, with the following guarantees:
    - is the size you asked for
    - has a GPU your code can take advantage of (if you requested one)
    - has the exact same environment as the Jupyter notebook where you wrote your code
    - has all of the code for your project (like other libraries you wrote)
    - has all of the credentials and secrets you’ve added (like AWS credentials or SSH keys)
  - display logs in the Saturn Cloud UI
  - send logs and task statuses back to Prefect Cloud, so you have all the information you need to react if anything breaks
- Prefect Cloud
  - keep track of all the flows you’ve registered
  - when it’s time to run those flows (either on demand or based on a schedule), tell Agents to run them
  - display a history of all flow runs, including success / failure of individual tasks and logs from all tasks
  - allow you to kick off a flow on-demand using a CLI, Python library, or clicking buttons in the UI
Detailed steps for using Prefect Cloud and Saturn Cloud
Set Up a Prefect Cloud Account
First, create an account with Prefect Cloud:
- Sign up at https://www.prefect.io/cloud/.
- Once logged in, create a project.
- Following the Prefect documentation, create a User API Key and a Service Account API Key. Store these for later.
User API Key: allows a user to register new flows with Prefect Cloud. To generate a User API Key, go to Account Settings > API Keys within the Prefect Cloud UI and click “Create an API Key”.
Service Account API Key: allows an agent to communicate with Prefect Cloud. It must be created by an admin. To create service accounts and their associated API keys, go to Team > Service Accounts.
Create a Prefect Cloud Agent in Saturn Cloud
Prefect Cloud “agents” are always-on processes that poll Prefect Cloud and ask “want me to run anything? want me to run anything?”. In Saturn Cloud, you can create these agents with a few clicks and let Saturn handle the infrastructure.
- Log in to the Saturn UI as an admin user.
- Navigate to the “Secrets” page and add a Prefect Cloud Service Account API Key.
  - Name: Choose a unique identifier for this secret. The name should contain only lowercase letters, numbers, and dashes, such as prefect-runner-token.
  - Value: the Service Account API Key you created during setup.
- Navigate to the “Prefect Agents” page. Create a new agent.
  - Name: Each Prefect Agent must have a unique name.
  - Prefect Runner Token: Select from the dropdown the name you gave the secret on the “Secrets” page.
- Start the Prefect Agent by clicking the play button.
After a few seconds, your agent will be ready! Click on the Agent’s status to see the logs for this agent.
In the Prefect Cloud UI, you should see a new KubernetesAgent up and running!
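The naming rule for secrets given in the steps above (only lowercase letters, numbers, and dashes) is easy to check before you type a name into the UI. The snippet below is a small sketch of that rule as a regular expression. The is_valid_secret_name helper is hypothetical, and Saturn Cloud may enforce additional constraints, such as a maximum length, that are not covered here.

```python
import re

# One or more lowercase letters, digits, or dashes, and nothing else.
SECRET_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_secret_name(name):
    """Check a candidate secret name against the rule stated in this guide."""
    return SECRET_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["prefect-runner-token", "Prefect_Runner_Token", ""]:
    print(repr(candidate), is_valid_secret_name(candidate))
```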
Create and Register a Flow
Now that you’ve created an account in Prefect Cloud and set up an agent in Saturn Cloud to run the work there, it’s time to create a flow!
- Return to the Saturn UI.
- Navigate to the “Secrets” page and add a Prefect Cloud User API Key.
  - Name: Choose a unique identifier for this secret. The name should contain only lowercase letters, numbers, and dashes, such as prefect-user-token.
  - Value: the User API Key you created during setup.
- Navigate to the “Resources” page and create a new Jupyter Server with the following specs.
  - Name: Name of the resource.
  - Image: Choose an image that meets the requirements of your workflow.
  - Workspace Settings
    - Hardware, Disk Space, Shutoff After: keep the defaults
  - Environment Variables
    - PREFECT_CLOUD_PROJECT_NAME='set this to the name of your project, which you created in Prefect Cloud'
  - Start script
    - pip install --upgrade prefect-saturn
- Once the resource is created, start it by clicking the play button.
- Once that server is ready, click “JupyterLab” to launch JupyterLab.
- In JupyterLab, open a new notebook and start working. If you have added a git repository to the resource, you can instead open your code from the repository folder.
- You can see some sample workflows and information on how to register this flow in the Saturn Cloud examples.
Once you’ve registered a flow, it will create a new Saturn Cloud resource specifically for running the flow. If you go to the Resources page of Saturn Cloud you should see a new resource created.
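As a rough illustration of what registration looks like in a notebook, the sketch below follows the pattern shown in the prefect-saturn README. It assumes Prefect 1.x and prefect-saturn are installed, that your Prefect Cloud credentials are already available in the Jupyter server, and that PREFECT_CLOUD_PROJECT_NAME is set as described above. The flow name "sample-flow", the task, and the two helper functions are placeholders invented for this example; see the Saturn Cloud examples for complete, maintained workflows.

```python
import os


def project_name_from_env(env=None):
    """Read the Prefect Cloud project name set on the Jupyter server."""
    env = os.environ if env is None else env
    name = env.get("PREFECT_CLOUD_PROJECT_NAME", "").strip()
    if not name:
        raise RuntimeError("PREFECT_CLOUD_PROJECT_NAME is not set")
    return name


def build_and_register_flow():
    """Build a one-task flow and register it with Saturn and Prefect Cloud."""
    # Imported here so the helper above can be used without these packages.
    import prefect
    from prefect import Flow, task
    from prefect_saturn import PrefectCloudIntegration

    @task
    def hello_task():
        logger = prefect.context.get("logger")
        logger.info("hello prefect-saturn")

    flow = Flow("sample-flow", tasks=[hello_task])

    project_name = project_name_from_env()
    integration = PrefectCloudIntegration(prefect_cloud_project_name=project_name)

    # Register with Saturn Cloud first; this returns the flow with
    # Saturn-specific settings attached.
    flow = integration.register_flow_with_saturn(flow)

    # Then register with Prefect Cloud.
    flow.register(project_name=project_name, labels=["saturn-cloud"])
    return flow
```

Calling build_and_register_flow() from a notebook cell performs both registrations. Nothing is registered until the function is called, so the cell defining it is safe to run on its own.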
Inspect Flow Runs
Now that your flow has been created and registered with both Saturn Cloud and Prefect Cloud, you can track its progress in the Prefect Cloud UI.
- In the Prefect Cloud UI, go to Flows --> name of your flow. Click Schematic to see the structure of the pipeline.
- Click Logs to see logs for this flow run.
  - From this page, you can search the logs, sort them by level, and download them for further analysis.
- In the Saturn Cloud UI, navigate to the “Prefect” resource associated with this work. This will bring you to a table of the Prefect flows. Click on the flow’s name in that table. This will take you to the flow’s details page, where you can see a list of flow runs. Click the icon under “logs” in the flow run table to view logs from a flow run. This view allows you to see some logs that won’t be visible in Prefect Cloud, including any output generated by your resource’s start script.
- In the Saturn Cloud UI, navigate back to the Prefect Agents page. Click the running status for the agent you previously set up. You should see new log messages confirming that the agent has received a flow to run.
Clean Up
If you have scheduled your flow to run at regular intervals and want to clean it up, follow the instructions below.
In Prefect Cloud
- Navigate to Flows. Delete the newly created flow.
In Saturn Cloud
- Logged in as the user who created the flow, navigate to the Prefect resource and delete it as well as the Jupyter server used to create the flows.
- Logged in as the user you used to create a Prefect agent, navigate to the Prefect Agents page. Click the delete button to stop and delete the Prefect agent.
Learn and Experiment!
To learn more about prefect-saturn, see https://github.com/saturncloud/prefect-saturn.
To see examples of creating a workflow and running on Prefect Cloud check out the Saturn Cloud Examples.
The Prefect Cloud feature is available for Enterprise users only. If you have any questions about Saturn Enterprise, or any other questions, send us an email at support@saturncloud.io.