The Civis Platform lets you chain tasks together to create a Workflow, so you can run multiple tasks in sequence as a unified structure and build an end-to-end data analytics pipeline.
This page introduces Platform Workflow concepts and walks through how to build, execute, and schedule a workflow in the UI.
For more advanced topics, see Advanced Workflows, our workflows-public repository with examples of YAML code, and our API documentation.
Getting Started
Terms to know
- Workflow: a collection of related tasks intended to be run with specific dependencies on each other.
- Execution: a single run of a workflow.
- Task: a unit of work inside a workflow.
- Action: the job that a task runs at a particular point in the workflow. An action can refer to an existing Civis job ID, contain the parameters required to create a new job, or run a system task.
- YAML: the language used to define the workflow. You can write the workflow in YAML directly, or build it in the graphical interface.
- Node: the representation of a task in the graph view.
- Link: the representation of a connection between tasks in the graph view.
- On Success: the child task runs only if the parent task completes successfully.
- On Complete: the child task runs when the parent task finishes, regardless of whether it succeeded or failed.
- On Error: the child task runs only if the parent task fails.
- Requires: the current task runs only after the listed tasks complete (reverse workflows only).
- Parent: a task connected above the current task whose outcome triggers the current task to run.
- Child: a task connected below the current task that runs after the current task.
- Root: a task with no parent tasks.
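The trigger terms above correspond to Mistral task keywords. The sketch below assumes Mistral's v2 workflow language (`on-success`, `on-complete`, `on-error`); the workflow name, task names, and job IDs are placeholders. It shows how parents, children, and roots relate in YAML:

```yaml
version: '2.0'

workflow:                      # workflow name (placeholder)
  tasks:
    load_data:                 # root: no other task triggers it
      action: civis.run_job
      input:
        job_id: 111            # placeholder job ID
      on-success:
        - build_model          # child that runs only if load_data succeeds
      on-error:
        - notify_failure       # child that runs only if load_data fails

    build_model:               # parent: load_data
      action: civis.run_job
      input:
        job_id: 222
      on-complete:
        - clean_up             # runs whether build_model succeeds or fails

    notify_failure:
      action: civis.run_job
      input:
        job_id: 333

    clean_up:
      action: civis.run_job
      input:
        job_id: 444
```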
Mistral
Civis Workflows are built on the open source Mistral workflow engine. If you aren't familiar with Mistral, don't worry: Workflows also has a graphical user interface where you can easily chain jobs together.
Civis supports nearly all of the Mistral YAML-based workflow language, with a few exceptions:
- Civis Workflows, like many workflow engines, do not support cycles (e.g., Task 1 -> Task 2 -> Task 1).
- Only Civis actions are supported. See Advanced Workflows for a full list of supported actions.
- Each Civis workflow can run only a single Mistral workflow. This means that tasks may use only the action keyword, not the workflow keyword.
- Support for reverse workflows is currently under development.
- While Mistral supports both YAQL and Jinja2 for referencing context variables, Civis supports only YAQL.
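As a sketch of these constraints (the input name `source_job_id` is hypothetical, and the job ID is a placeholder), each task below uses the `action` keyword rather than `workflow`, the links form no cycle, and the context variable is referenced with YAQL's `<% %>` delimiters rather than Jinja2's `{{ }}`:

```yaml
version: '2.0'

workflow:
  input:
    - source_job_id            # hypothetical workflow input
  tasks:
    run_source:
      action: civis.run_job    # tasks must use `action`, not `workflow`
      input:
        job_id: <% $.source_job_id %>      # YAQL reference (supported)
        # job_id: "{{ _.source_job_id }}"  # Jinja2 form (not supported in Civis)
      on-success:
        - report               # no link back to run_source, so no cycle

    report:
      action: civis.run_job
      input:
        job_id: 555            # placeholder job ID
```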
Visit the workflows-public GitHub repository for more information on YAML and example workflow configurations.
Creating a new workflow via the UI
- Go to Tools.
- Select Workflow from the drop-down options. The new workflow is populated with a template that can be edited from the graph view or the YAML pane.
When the workflow is first created, it has two empty tasks. The left-hand side of the page displays the graph, and the right-hand side has three tabs: Info, YAML, and Parameters. This document covers the Info tab; see the Advanced Workflows and Parameters documentation for more information on the other tabs.
Graph view
The graph view is the visual representation of the workflow, and contains several components.
Node (Box): Each node is a visual representation of a task within your workflow, labeled by the task’s name. If a task is selected, it will be highlighted in blue.
Link (Line): Each line shows how the tasks connect. In this simple example, task 1 will run first, then task 2. The tasks run vertically from top to bottom.
On Complete and On Success tasks are connected with a solid line. On Error tasks are connected with a dotted line.
Building Your Workflow
In graph view, you can add new tasks to your workflow by clicking the + sign in the upper right of the graph. If a node is selected, choose the connection type (On Success, On Complete, or On Error); the new task node is then created automatically and linked below the selected node. If no node is selected, a new node is created at the top.
You can copy a node using the copy button at the upper right (below the plus). The copy button will only be available if a node is selected. When used, the selected node and underlying YAML will be copied as a new root node, and will be named “copy_of_X”.
Note that only the node itself is copied; parent/child tasks and any connections are not copied.
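For illustration, assuming a workflow where `task_1` triggers `task_2` (placeholder job IDs), copying `task_1` would add a task along these lines to the YAML pane. The exact YAML Civis writes may differ; the point is that the copy keeps the action and inputs but has no trigger lists, and no task points to it, so it is a root:

```yaml
version: '2.0'

workflow:
  tasks:
    task_1:
      action: civis.run_job
      input:
        job_id: 111            # placeholder job ID
      on-success:
        - task_2               # this link is not copied

    task_2:
      action: civis.run_job
      input:
        job_id: 222

    copy_of_task_1:            # new root: same action and input, no links
      action: civis.run_job
      input:
        job_id: 111
```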
You can delete a node using the delete button (trash can). The delete button is only available when a node is selected. When used, it removes the selected node and its underlying YAML, along with its connections.
Deletion will not cause new connections to be formed or merged, and will not delete child tasks. For example, if task 1 -> task 2 -> task 3, and task 2 is deleted, tasks 1 and 3 will both be treated as root tasks after deletion.
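Continuing that example with placeholder job IDs, and assuming the original links were On Success, the YAML after deleting task 2 would look roughly like this. Both remaining tasks are roots, and no link from task 1 to task 3 is created:

```yaml
version: '2.0'

workflow:
  tasks:
    task_1:                    # its on-success entry for task_2 was removed
      action: civis.run_job
      input:
        job_id: 111            # placeholder job ID

    # task_2 and its YAML were deleted here

    task_3:                    # nothing triggers it now, so it is a root
      action: civis.run_job
      input:
        job_id: 333
```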
You can also edit the workflow from the graph by changing the links.
In this example, task_2 is a root node, and we want to connect it as a parent of task_3. To do this, hover over the bottom of task_2 until two blue dots appear, then click and drag the new link to task_3 and release.
You can delete unwanted links by clicking on the x icon in the center of the link.
Nodes themselves cannot be moved, but the graph will reorient as necessary to display the new workflow.
Editing the graph view updates the code within the YAML pane, which you can also edit directly. (See the Advanced Workflows or YAML documentation for more information.)
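For example, dragging a link from task_2 to task_3 corresponds to adding task_3 to a trigger list on task_2. This sketch assumes the new link is created as an On Success trigger; the job IDs are placeholders:

```yaml
version: '2.0'

workflow:
  tasks:
    task_2:                    # previously a root with no trigger lists
      action: civis.run_job
      input:
        job_id: 222            # placeholder job ID
      on-success:              # added by dragging the link (link type assumed)
        - task_3

    task_3:                    # now a child of task_2
      action: civis.run_job
      input:
        job_id: 333
```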
The graph view lets you edit the nodes and relationships, but you also need to add the appropriate job, task name, and other key details using the Info tab. When a task is selected, its name is displayed and can be edited directly. Task names can’t include spaces, but they can differ from the job name.
The rest of the tab depends on the selected action. The default is ‘civis.run_job’, the simplest option, which runs a single Civis job. You can search for an existing job by typing its name (or ID) into the text box.
To view the underlying job from the Info tab, click the eye icon to the right of the job name box.
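In the YAML pane, the task name is the key under `tasks`, and the job you pick in the Info tab corresponds to the `job_id` input of `civis.run_job`. A minimal sketch, with a hypothetical task name and a placeholder job ID:

```yaml
version: '2.0'

workflow:
  tasks:
    refresh_voter_file:        # hypothetical task name: no spaces, need not match the job's name
      action: civis.run_job    # default action: run a single Civis job
      input:
        job_id: 12345          # placeholder; the ID of the job selected in the Info tab
```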
Other supported action options include container and custom scripts. Please see the Advanced Workflows documentation for more information.
Regardless of which action is selected, the last section covers task triggers, which control how the tasks interact and the order in which they run.
Clicking any of the dropdown buttons opens a multi-select list that you can use to add tasks to, or remove them from, the trigger list.
Note that triggers *only* affect tasks below the selected task. For example, with task 1 > task 2, the relationship can be removed only from task 1's triggers, not from task 2's.
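In YAML terms, the trigger lists belong to the upper task. In the sketch below (placeholder names and IDs), the link between task_1 and task_2 exists only as an entry in task_1's `on-success` list, so it can be removed only from task_1; task_2 has no field that points back to its parent:

```yaml
version: '2.0'

workflow:
  tasks:
    task_1:
      action: civis.run_job
      input:
        job_id: 111            # placeholder job ID
      on-success:
        - task_2               # removing task_2 here removes the link
      on-error:
        - task_3               # each trigger type is a separate list

    task_2:                    # no reference back to task_1
      action: civis.run_job
      input:
        job_id: 222

    task_3:
      action: civis.run_job
      input:
        job_id: 333
```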
In the Info tab and the other tabs, you can also drag the center bar to make the pane wider or narrower and see more information.
Executing Your Workflow
When your workflow is complete and populated with the jobs you want, you can run the workflow by hitting the “Execute All” button at the top right of the page. This will run the jobs in the sequence defined by the workflow.
Partial Executions
Partial workflows can also be run. To select tasks, first click a task in the graph view; with a task selected, mousing over other tasks displays checkboxes. Click the checkboxes of the additional tasks you want to include. Then click the down arrow next to Execute All to see the execution options: “Execute Selected Task(s)” runs only the selected tasks, and “Execute from Selected Task” runs the workflow starting from the selected task and continues normally from there.
Any task that is not included in a partial Workflow execution will have its action type replaced by std.noop. The original task will not execute, but the task will be treated as successful by the execution. This means that if a selected task depends on the outputs of excluded tasks, it may not behave as expected.
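Conceptually, running only task_2 in a task_1 > task_2 workflow behaves as if the definition had been rewritten as below. This illustrates the behavior described above; it is not necessarily the exact definition Civis generates, and the names and job ID are placeholders:

```yaml
version: '2.0'

workflow:
  tasks:
    task_1:                    # excluded from the partial execution
      action: std.noop         # does nothing and is treated as successful
      on-success:
        - task_2               # so task_2 is still triggered

    task_2:                    # selected task: runs its real action
      action: civis.run_job    # any outputs it expected from task_1 will be missing
      input:
        job_id: 222            # placeholder job ID
```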
Once the workflow has begun executing, the view will change from the default Build View to the Execution View.
During the execution, the status will show that the execution is running with a spinner icon.
When the execution is complete, the graph view will show the status of the workflow as a whole, the status of each individual task, and the duration it took to run the workflow.
Retry Failed Tasks
If any tasks did not complete successfully, the “Retry Failed Tasks” button becomes active in the top right corner so you can try the failed tasks again. This is only useful if you expect the tasks to succeed on retry without changing the workflow definition. For example, a job in your workflow may have referred to a database schema that didn't exist yet; you could manually create the schema and then retry the workflow. You can also make changes to a job referred to by a task, and those changes will be reflected in the retry.
Understanding your Execution
When a task is selected, the info tab will show critical details on the execution, including outputs, if any, and logs.
The task name is also a link that will take the user to the underlying job object that was used for the task.
If no task is selected, the tab shows the same information as in the Build View.
You can view previous executions using the History View. The History View for a workflow can be accessed by clicking “History” next to the execute button. The icon next to the button indicates the status of the most recent execution.
Within the history view, ID, created time/date, duration, and state are displayed. The ID will act as a link to take the user back to the execution page for that specific run.
You can view the error message for a failed task by clicking the task that failed and then clicking the ‘Info’ caret at the top left of the graph view, just below ‘Duration’.
Here is an example of how an error message appears when you click the ‘Info’ caret:
Scheduling a workflow
To schedule your workflow to run automatically, click the three dots at the top right to open the menu, then select the Automate button to open the automation pane.
In the automation pane, select how often you would like the workflow to execute.
NOTE: If a previous execution of a workflow is still running when the next scheduled run begins, the new run will fail. The entire workflow must finish executing before a new run can start.
You may want to allow only one workflow execution at a time, so that a newer execution doesn't overlap with one that is still running and fail.
Within the Automation pane, you can use the “Concurrent Executions” toggle to force one execution at a time. By default, concurrent executions are allowed.