dbt is a popular tool for building data transformation pipelines in SQL. Civis Platform enables you to run dbt pipelines via dbt Scripts, either by themselves or as part of larger Workflows.
dbt Scripts
If you want to run dbt by itself, you can create a dbt Script.
Step 1: Run Arguments
To run dbt, first connect your script to your Git repository and select which dbt command you’d like to run. If you’d like to pass in any additional arguments, like node selection, you can add these to the Command Line Arguments field. There is usually no need to change the dbt version in use.
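For example, to build a single model and everything downstream of it, you might enter node-selection arguments like the following in the Command Line Arguments field (the model name here is illustrative):
--select orders+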
Note that Platform does not yet support all dbt commands. If there's a command you'd like to use that we don't support yet, please let us know by reaching out to Support@CivisAnalytics.com!
Step 2: Profiles
Choose whether you'd like to use a profiles.yml file from your Git repository or have Platform autogenerate one for you.
Step 3: Database Variables
Select which database and credential to use and the schema where dbt will create the output tables. This must be a schema you have create permission on.
Step 4: dbt Variables
These variables tell Platform where to look for your dbt configuration.
If your dbt_project.yml file lives in a sub-directory of your repo, specify that directory path in the “dbt Project Directory” field.
When not auto-generating a profile, you may also set the Profiles Directory and Target for dbt to use. You do not need to set a profiles directory if your profiles.yml lives in the root of your repository or in the ~/.dbt folder. If target is unspecified, it will default to the default target defined in your profiles.yml.
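For example, if your profiles.yml lives in a config folder at the root of your repository, you could set the Profiles Directory to config and set Target to civis_redshift to use that output from the example profile shown later in this article (both names are illustrative).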
Step 5: Parameters (Optional)
Like in any Platform Script, you may add additional parameters for use in your dbt script using the “Set Parameters” side pane.
You may reference these parameters as environment variables in your dbt project:
"{{ env_var('MY_PARAM') }}"
dbt & Workflows
In this section, we’ll show an example Workflow based on dbt’s jaffle_shop example. If you want to try this out and see dbt running in Platform, you can 1) create a new Workflow object in Platform, 2) copy-paste the example Workflow definition below into the YAML tab, and 3) start an execution after specifying IDs for your database and database credentials.
In the example Workflow below, we have just a single task that runs dbt, for simplicity. A more realistic Workflow might have tasks for imports or exports, tasks for data processing in Python, tasks to publish to a Report, etc.
The script uses the generate_profiles setting to automatically create a profiles.yml file that tells dbt how to connect to your database and what schema the output tables should be created in. For production pipelines, you may want to use a profiles.yml file stored in your GitHub repository instead.
The script runs the dbt build command to seed the database, build tables, and run tests.
version: '2.0'
Workflow:
  input:
    # Find the ID of your database here: https://api.civisanalytics.com/databases
    - database_id:
    # Find your default database credential ID here: https://platform.civisanalytics.com/spa/#/credentials
    - database_cred_id:
    # Desired output schema. Must have create permission on it.
    - output_schema: scratch
  tasks:
    run_dbt:
      action: civis.scripts.dbt
      input:
        name: Run dbt
        repo_http_uri: github.com/civisanalytics/platform-code-examples
        repo_ref: main
        target_database:
          remote_host_id: <% int($.database_id) %>
          credential_id: <% int($.database_cred_id) %>
        dbt_project:
          schema: <% $.output_schema %>
          project_dir: dbt
        generate_profiles: true
        dbt_command: "build"
dbt Docs
A useful feature of dbt is that it can produce interactive documentation to help you understand your data pipeline. Platform automatically generates this documentation for you and uploads it to a report for easy viewing and sharing.
To view this documentation, click the “dbt Docs Report” button next to the Run History. This will open the dbt docs report in a new tab.
Constructing a Profile
If you’d like to use a custom profiles.yml file, you can still take advantage of Platform’s database and credential support.
Example profiles.yml
Here is an example profiles.yml file showing how to connect to each different database type. You are unlikely to need both targets; choose whichever is applicable to you.
my-project-name:
  target: civis_redshift # sets the default target
  outputs:
    # Works for redshift or postgres
    civis_redshift:
      type: "{{ env_var('DATABASE_TYPE') }}"
      host: "{{ env_var('DATABASE_HOST') }}"
      port: "{{ env_var('DATABASE_PORT') | as_number }}"
      dbname: "{{ env_var('DATABASE_NAME') }}"
      user: "{{ env_var('DATABASE_USERNAME') }}"
      password: "{{ env_var('DBT_ENV_SECRET_DATABASE_PASSWORD') }}"
      schema: "{{ env_var('DBT_SCHEMA') }}"
      threads: 4
    civis_bigquery_service_token:
      type: "{{ env_var('DATABASE_TYPE') }}"
      method: "{{ env_var('GCP_AUTH_METHOD') }}"
      project: "{{ env_var('GCP_PROJECT_ID') }}"
      schema: "{{ env_var('DBT_SCHEMA') }}"
      keyfile: "{{ env_var('GCP_SERVICE_ACCOUNT_KEYFILE_JSON') }}"
      threads: 4
dbt recommends that you save your profiles.yml configuration file either to the same directory as your dbt_project.yml file or to a ~/.dbt folder. To read more about dbt profiles, see the dbt documentation on profiles.
Note: dbt expects the port parameter to be an integer, so the environment variable needs to be piped through the as_number jinja function. For more information about jinja functions in dbt, see the dbt jinja functions documentation.
Custom Parameters
You can also define arbitrary parameters for any values that Platform isn't providing and reference them in a similar way. For example, if you want to parameterize the number of threads in use, you could define a THREADS custom parameter and reference it in your profiles file, as shown below. Use the "Set Parameters" pane to set custom parameters.
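For instance, assuming you defined a custom parameter named THREADS, the threads line of a profile output could reference it like this (a sketch, since threads must be an integer it is piped through as_number):
threads: "{{ env_var('THREADS') | as_number }}"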
Debugging
Memory issues
dbt scripts default to 1 GB of memory. If your logs include the message Killed, this may be because the dbt process attempted to use too much memory. In a dbt run, this is often caused by a macro, such as run_query, that attempts to process too many rows. In general, it is more efficient in both time and memory to refactor the affected model to process fewer rows locally, for example by using a LIMIT or GROUP BY clause. If optimizing your models isn't an immediate option, you can also adjust the script memory setting under the Settings dropdown or via the API.