Continuous integration jobs in dbt Cloud
You can set up continuous integration (CI) jobs to run when someone opens a new pull request (PR) in your dbt Git repository. By running and testing only modified models, dbt Cloud ensures these jobs are as efficient and resource-conscious as possible on your data platform.
Set up CI jobs
dbt Labs recommends that you create your CI job in a dedicated dbt Cloud deployment environment that's connected to a staging database. A separate environment dedicated to CI provides better isolation between your temporary CI schema builds and your production data builds. Additionally, teams sometimes need their CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch as part of your release process, a separate environment lets you set a custom branch, so the CI job in that dedicated environment is triggered only when PRs are made to the specified custom branch. To learn more, refer to Get started with CI tests.
Prerequisites
- You have a dbt Cloud account.
- CI features:
- For both the concurrent CI checks and smart cancellation of stale builds features, your dbt Cloud account must be on the Team or Enterprise plan.
- The SQL linting feature is currently available in beta to a limited group of users and is gradually being rolled out. If you're in the beta, the Linting option is available for use.
- Advanced CI features:
- For the compare changes feature, your dbt Cloud account must be on the Enterprise plan and have enabled Advanced CI features. Please ask your dbt Cloud administrator to enable this feature for you. After enablement, the dbt compare option becomes available in the CI job settings.
- Set up a connection with your Git provider. This integration lets dbt Cloud trigger and run jobs on your behalf.
- If you're using a native GitLab integration, you need a paid or self-hosted account that includes support for GitLab webhooks and project access tokens. If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab.
To make CI job creation easier, many options on the CI job page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them.
- On your deployment environment page, click Create job > Continuous integration job to create a new CI job.
- Options in the Job settings section:
- Job name — Specify the name for this CI job.
- Description — Provide a description about the CI job.
- Environment — By default, this will be set to the environment you created the CI job from. Use the dropdown to change the default setting.
- Options in the Git trigger section:
- Triggered by pull requests — By default, it’s enabled. Every time a developer opens up a pull request or pushes a commit to an existing pull request, this job will get triggered to run.
- Run on draft pull request — Enable this option if you want to also trigger the job to run every time a developer opens up a draft pull request or pushes a commit to that draft pull request.
- Options in the Execution settings section:
- Commands — By default, this includes the dbt build --select state:modified+ command. This informs dbt Cloud to build only new or changed models and their downstream dependents. Importantly, state comparison can only happen when there is a deferred environment selected to compare state to. Click Add command to add more commands that you want to be invoked when this job runs.
- Linting (beta) — Enable this option for dbt to lint the SQL files in your project as the first step in dbt run. If this check runs into an error, dbt can either Stop running on error or Continue running on error.
- dbt compare (Enterprise) — Enable this option to compare the last applied state of the production environment (if one exists) with the latest changes from the pull request, and identify what those differences are. To enable record-level comparison and primary key analysis, you must add a primary key constraint or uniqueness test. Otherwise, you'll receive a "Primary key missing" error message in dbt Cloud.
To review the comparison report, navigate to the Compare tab in the job run's details. A summary of the report is also available from the pull request in your Git provider (see the CI report example).
- Compare changes against an environment (Deferral) — By default, it’s set to the Production environment if you created one. This option allows dbt Cloud to check the state of the code in the PR against the code running in the deferred environment, so as to only check the modified code, instead of building the full table or the entire DAG.
Info: Older versions of dbt Cloud only allow you to defer to a specific job instead of an environment. Deferral to a job compares state against the project code that was run in the deferred job's last successful run. Deferral to an environment is more efficient as dbt Cloud will compare against the project representation (which is stored in the manifest.json) of the last successful deploy job run that executed in the deferred environment. By considering all deploy jobs that run in the deferred environment, dbt Cloud gets a more accurate, up-to-date representation of the project state.
- Run timeout — Cancel the CI job if the run time exceeds the timeout value. You can use this option to help ensure that a CI check doesn't consume too much of your warehouse resources. If you enable the dbt compare option, the timeout value defaults to 3600 (one hour) to prevent long-running comparisons.
- (Optional) Options in the Advanced settings section:
- Environment variables — Define environment variables to customize the behavior of your project when this CI job runs. You can specify that a CI job is running in a Staging or CI environment by setting an environment variable and modifying your project code to behave differently, depending on the context. It's common for teams to process only a subset of data for CI runs, using environment variables to branch logic in their dbt project code.
- Target name — Define the target name. Similar to Environment Variables, this option lets you customize the behavior of the project. You can use this option to specify that a CI job is running in a Staging or CI environment by setting the target name and modifying your project code to behave differently, depending on the context.
- dbt version — By default, it’s set to inherit the dbt version from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior.
- Threads — By default, it’s set to 4 threads. Increase the thread count to increase model execution concurrency.
- Generate docs on run — Enable this if you want to generate project docs when this job runs. This is disabled by default since testing doc generation on every CI check is not a recommended practice.
- Run source freshness — Enable this option to invoke the dbt source freshness command before running this CI job. Refer to Source freshness for more details.
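The default command in the Execution settings, dbt build --select state:modified+, builds new or changed models plus their downstream dependents by comparing the PR against the deferred environment. The following is a minimal conceptual sketch of that selection logic, not dbt's actual implementation. The function name, the checksum dictionaries, and the model names are all hypothetical and exist only for illustration.

```python
from collections import deque


def select_modified_plus(deferred_checksums, pr_checksums, children):
    """Return nodes that are new or changed in the PR, plus everything downstream.

    deferred_checksums / pr_checksums: hypothetical {node_name: checksum} maps
    standing in for the deferred environment's state and the PR's state.
    children: hypothetical {node_name: [direct downstream node names]} map.
    """
    # A node counts as modified if it is absent from the deferred state (new)
    # or its checksum differs from the deferred state (changed).
    modified = {
        node
        for node, checksum in pr_checksums.items()
        if deferred_checksums.get(node) != checksum
    }

    # Breadth-first walk to add every downstream dependent (the trailing "+").
    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical project: stg_orders -> orders -> revenue -> new_report,
# and stg_customers -> customers.
deferred = {
    "stg_orders": "a1",
    "orders": "b1",
    "revenue": "c1",
    "stg_customers": "d1",
    "customers": "e1",
}
# The PR changes "orders" and adds "new_report".
pr = dict(deferred, orders="b2", new_report="f1")
children = {
    "stg_orders": ["orders"],
    "orders": ["revenue"],
    "revenue": ["new_report"],
    "stg_customers": ["customers"],
}

print(sorted(select_modified_plus(deferred, pr, children)))
# prints ['new_report', 'orders', 'revenue']
```

In this sketch, stg_orders, stg_customers, and customers are left out of the build because they are unchanged and are not downstream of a change. This is also why a deferred environment must be selected: without it there is nothing to compare against.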
Example of CI check in pull request
The following is an example of a CI check in a GitHub pull request. The green checkmark means the dbt build and tests were successful. Clicking on the dbt Cloud section takes you to the relevant CI run in dbt Cloud.
Example of CI report in pull request preview
The following is an example of a CI report in a GitHub pull request, which is shown when the dbt compare option is enabled for the CI job. It displays a high-level summary of the models that changed from the pull request.
Trigger a CI job with the API
If you're not using dbt Cloud’s native Git integration with GitHub, GitLab, or Azure DevOps, you can use the Administrative API to trigger a CI job to run. However, dbt Cloud will not automatically delete the temporary schema for you. This is because automatic deletion relies on incoming webhooks from Git providers, which are only available through the native integrations.
Prerequisites
- You have a dbt Cloud account.
- For the Concurrent CI checks and Smart cancellation of stale builds features, your dbt Cloud account must be on the Team or Enterprise plan.
- Set up a CI job with the Create Job API endpoint using "job_type": ci or from the dbt Cloud UI.
- Call the Trigger Job Run API endpoint to trigger the CI job. You must include both of these fields in the payload:
- Provide the pull request (PR) ID using one of these fields:
- github_pull_request_id
- gitlab_merge_request_id
- azure_devops_pull_request_id
- non_native_pull_request_id (for example, Bitbucket)
- Provide the git_sha or git_branch to target the correct commit or branch to run the job against.
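As a rough illustration of the payload requirements above, the following sketch assembles a Trigger Job Run request without sending it. Only the PR ID field names, git_sha, and git_branch come from this page. The helper names, the cause field, the URL path, the Token authorization scheme, and all IDs are assumptions for illustration, so check them against the Administrative API reference for your account before relying on them.

```python
import json
import urllib.request

# PR ID fields listed on this page; exactly one is used per request in this sketch.
PR_ID_FIELDS = (
    "github_pull_request_id",
    "gitlab_merge_request_id",
    "azure_devops_pull_request_id",
    "non_native_pull_request_id",
)


def build_ci_trigger_payload(cause, pr_field, pr_id, git_sha=None, git_branch=None):
    """Build a payload with one PR ID field and a git_sha and/or git_branch."""
    if pr_field not in PR_ID_FIELDS:
        raise ValueError(f"pr_field must be one of {PR_ID_FIELDS}")
    if git_sha is None and git_branch is None:
        raise ValueError("provide git_sha or git_branch")

    # "cause" is an assumed field describing why the run was triggered.
    payload = {"cause": cause, pr_field: pr_id}
    if git_sha is not None:
        payload["git_sha"] = git_sha
    if git_branch is not None:
        payload["git_branch"] = git_branch
    return payload


def build_trigger_request(base_url, account_id, job_id, api_token, payload):
    """Prepare (but do not send) the POST request. URL path and auth scheme are assumptions."""
    url = f"{base_url}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values: account 1, job 2, PR 42 on a non-native Git provider.
payload = build_ci_trigger_payload(
    "Triggered by external CI",
    "non_native_pull_request_id",
    42,
    git_sha="abc123",
)
request = build_trigger_request("https://cloud.getdbt.com", 1, 2, "<api-token>", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Passing the prepared request to urllib.request.urlopen would send it. Because this path does not use a native Git integration, the temporary schema created by the run still has to be cleaned up separately.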
Semantic validations in CI (Team, Enterprise)
Automatically test your semantic nodes (metrics, semantic models, and saved queries) during code reviews by adding warehouse validation checks in your CI job, helping ensure that code changes made to dbt models don't break these metrics.
To do this, add the command dbt sl validate --select state:modified+ in the CI job. This ensures the validation of modified semantic nodes and their downstream dependencies.
Benefits
- Testing semantic nodes in a CI job supports deferral and selection of semantic nodes.
- It allows you to catch issues early in the development process and deliver high-quality data to your end users.
- Semantic validation executes an explain query in the data warehouse for semantic nodes to ensure the generated SQL will execute.
- For semantic nodes and models that aren't downstream of modified models, dbt Cloud defers to the production models.
Set up semantic validations in your CI job
To learn how to set this up, refer to the following steps:
- Navigate to the Job settings page and click Edit.
- Add the dbt sl validate --select state:modified+ command under Commands in the Execution settings section. The command uses state selection and deferral to run validation on any semantic nodes downstream of model changes. To reduce job times, we recommend only running CI on modified semantic models.
- Click Save to save your changes.
There are additional commands and use cases described in the next section, such as validating all semantic nodes, validating specific semantic nodes, and so on.
Use cases
Use or combine different selectors or commands to validate semantic nodes in your CI job. Semantic validations in CI supports the following use cases: