Developed by Greg Clunies @ Surfline.
To add this to your repo, copy the contents of `sqlfluff_lint_dbt_models.yml` in this folder into a file named `.github/workflows/sqlfluff_lint_dbt_models.yml`.

See the `dbt_artifacts` dbt package for an example of this workflow in action. An example of the annotations can be seen here.
- Lints any added or modified models in `/models` (see the sketch after this list).
- Uses `conda` to set up a virtual environment and manage `python`, `dbt`, and `sqlfluff` dependencies. An example `environment.yml` can be found in this folder. You can modify this workflow to handle your dependencies as you wish (e.g., `pip`, `virtualenv`, `poetry`, etc.).
- Uses `templater = dbt` - this requires a dummy `profiles.yml` and a connection to your data warehouse from the workflow.
- Handles connecting to the warehouse via VPN if your data warehouse requires it (optional). If your warehouse doesn't require a VPN connection, you can delete the `Install OpenVPN` and `Connect to VPN` steps from the workflow.
- Annotates failures on the PR, on the line where they occur.
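To illustrate the first point, a step that lints only changed models could look roughly like the sketch below. This is a hypothetical sketch, not the workflow's actual implementation - the step name and diff logic are assumptions; `sqlfluff_lint_dbt_models.yml` is authoritative.

```yaml
# Hypothetical sketch of the lint step - see sqlfluff_lint_dbt_models.yml for the real one
- name: Lint added or modified dbt models
  run: |
    # Assumes the base branch is available locally (e.g., actions/checkout with fetch-depth: 0)
    CHANGED_MODELS=$(git diff --name-only --diff-filter=AM "origin/${{ github.base_ref }}" -- models/ | grep '\.sql$' || true)
    if [ -n "$CHANGED_MODELS" ]; then
      sqlfluff lint $CHANGED_MODELS
    else
      echo "No added or modified models to lint."
    fi
```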
NOTE: This workflow has been tested on Redshift. Config for Snowflake is included in this repo, but has not been tested. For a very similar workflow that has been tested on Snowflake, see the tails.com workflow.
We use the `.sqlfluff` found here.
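For context, a `.sqlfluff` that uses the dbt templater might look like the following sketch (the linked file is authoritative; the dialect and paths here are assumptions):

```ini
[sqlfluff]
templater = dbt
dialect = redshift

[sqlfluff:templater:dbt]
# Assumed locations - point profiles_dir at the dummy profiles.yml described below
project_dir = ./
profiles_dir = ./ci_cd/
```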
When `sqlfluff` uses `templater = dbt`, it is actually using the dbt compiler to compile your SQL before `sqlfluff` lints it. When the dbt compiler, uh ... compiles ... the SQL in your models containing macros like `dbt_utils.star()`, it needs to connect to and query the warehouse to get information about the tables and columns referenced in the macro.

But for `dbt` to connect to a data warehouse, it needs a `profiles.yml`. It's a bad idea to commit your real `profiles.yml` to your repo since it contains sensitive DB credentials, so instead create a dummy `profiles.yml` in a `./ci_cd/` folder in your dbt repo. For Redshift (a Snowflake version is sketched further below), this will look like:
```yaml
# -- Redshift dummy - save this at `./ci_cd/profiles.yml` --
config:
  send_anonymous_usage_stats: False
  use_colors: True

<profile-from-dbt-project.yml>:  # <--- Replace with profile value in your dbt_project.yml!
  target: sqlfluff
  outputs:
    sqlfluff:
      type: redshift
      host: "{{ env_var('PROFILES_YML_HOST') }}"  # <-- Set these environment variables in GitHub Secrets
      port: "{{ env_var('PROFILES_YML_PORT') | int }}"
      user: "{{ env_var('PROFILES_YML_USER') }}"
      pass: "{{ env_var('PROFILES_YML_PASS') }}"
      dbname: "{{ env_var('PROFILES_YML_DBNAME') }}"
      schema: "{{ env_var('PROFILES_YML_SCHEMA') }}"
      threads: 1
```
The credentials set in the dummy `profiles.yml` use dbt's `env_var` function to read in environment variables that you need to set in your repo using GitHub Secrets. These environment variables are then injected into the virtual machine running the GitHub workflow like so:
```yaml
# Set environment variables used throughout the GitHub workflow
env:
  DBT_PROFILES_DIR: /home/runner/work/${{ github.event.repository.name }}/${{ github.event.repository.name }}/ci_cd

  # SPECIFY database connection credentials as env vars below.
  # Env var values are fetched from GitHub Secrets.
  # HIGHLY recommended you use a unique set of connection credentials for this workflow alone.
  # IF USING REDSHIFT, workflow will use these in the dummy profiles.yml (else, ignored)
  PROFILES_YML_HOST: ${{ secrets.PROFILES_YML_HOST }}
  PROFILES_YML_PORT: ${{ secrets.PROFILES_YML_PORT }}
  PROFILES_YML_USER: ${{ secrets.PROFILES_YML_USER }}
  PROFILES_YML_PASS: ${{ secrets.PROFILES_YML_PASS }}
  PROFILES_YML_DBNAME: ${{ secrets.PROFILES_YML_DBNAME }}
  PROFILES_YML_SCHEMA: ${{ secrets.PROFILES_YML_SCHEMA }}
```
The exact secrets you need to configure in your GitHub repo are listed below.

Redshift:
- `PROFILES_YML_HOST`
- `PROFILES_YML_PORT`
- `PROFILES_YML_USER`
- `PROFILES_YML_PASS`
- `PROFILES_YML_DBNAME`
- `PROFILES_YML_SCHEMA`

Snowflake:
- `PROFILES_YML_SNOWFLAKE_ACCOUNT`
- `PROFILES_YML_SNOWFLAKE_USER`
- `PROFILES_YML_SNOWFLAKE_PASSWORD`
- `PROFILES_YML_SNOWFLAKE_ROLE`
- `PROFILES_YML_SNOWFLAKE_DATABASE`
- `PROFILES_YML_SNOWFLAKE_WAREHOUSE`
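Since the Snowflake config in this repo is untested (per the note above), here is a sketch of what the Snowflake dummy `profiles.yml` might look like, built from the secret names listed above and dbt's documented Snowflake profile fields. The `schema` value is an assumption, since no schema secret is listed:

```yaml
# -- Snowflake dummy (sketch) - save this at `./ci_cd/profiles.yml` --
config:
  send_anonymous_usage_stats: False
  use_colors: True

<profile-from-dbt-project.yml>:  # <--- Replace with profile value in your dbt_project.yml!
  target: sqlfluff
  outputs:
    sqlfluff:
      type: snowflake
      account: "{{ env_var('PROFILES_YML_SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('PROFILES_YML_SNOWFLAKE_USER') }}"
      password: "{{ env_var('PROFILES_YML_SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('PROFILES_YML_SNOWFLAKE_ROLE') }}"
      database: "{{ env_var('PROFILES_YML_SNOWFLAKE_DATABASE') }}"
      warehouse: "{{ env_var('PROFILES_YML_SNOWFLAKE_WAREHOUSE') }}"
      schema: sqlfluff  # assumption - no schema secret is listed above; adjust as needed
      threads: 1
```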
Now that the DB credentials are set in the dummy `profiles.yml`, when `sqlfluff` uses `templater = dbt`, it can connect to the warehouse!
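If you want to sanity-check that connection, dbt's `debug` command tests the profile's connection. A hypothetical extra step (not part of the original workflow) could look like:

```yaml
# Hypothetical step - a connection sanity check, not in the original workflow
- name: Test warehouse connection
  run: dbt debug --profiles-dir ./ci_cd
```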
If your warehouse requires connections to be made via a VPN, you will need the following credentials (ask your IT team about creating these):

- username (strongly recommend creating a user just for this GitHub workflow)
- password
- a `vpn_config.ovpn` file

You will set these credentials as GitHub Secrets, which are used in the workflow:

- `OVPN_USERNAME`
- `OVPN_PASSWORD`
- `VPN_FILE` (the contents of your `.ovpn` file)
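For illustration, the `Install OpenVPN` and `Connect to VPN` steps might look roughly like this sketch; the real steps in the workflow file are authoritative, and the file names and flags below are assumptions:

```yaml
# Hypothetical VPN steps - a sketch; see the workflow file for the real implementation
- name: Install OpenVPN
  run: |
    sudo apt-get update
    sudo apt-get install -y openvpn
- name: Connect to VPN
  run: |
    # Recreate the .ovpn config and credentials from GitHub Secrets
    echo "${{ secrets.VPN_FILE }}" > vpn_config.ovpn
    printf '%s\n%s\n' "${{ secrets.OVPN_USERNAME }}" "${{ secrets.OVPN_PASSWORD }}" > vpn_credentials.txt
    sudo openvpn --config vpn_config.ovpn --auth-user-pass vpn_credentials.txt --daemon
```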
Now the workflow can connect to your warehouse, which, again, is required when `sqlfluff` uses `templater = dbt`.
- A similar approach for creating a dummy `profiles.yml` for other databases (e.g., BigQuery, Presto, etc.) should be possible; see the sketch after this list.
- You will have to modify your dummy `profiles.yml` to match the format found in your local `profiles.yml`, typically found at `~/.dbt/profiles.yml`. Or see the dbt docs for connection instructions for other database types.
- Then set the connection credentials as environment variables in your `sqlfluff_lint_dbt_models.yml`.
- Store the environment variable values as GitHub Secrets.
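For example, a BigQuery dummy `profiles.yml` might look like this sketch (untested; the field names follow dbt's documented BigQuery profile, and the secret names are hypothetical):

```yaml
# -- BigQuery dummy (sketch) - save this at `./ci_cd/profiles.yml` --
<profile-from-dbt-project.yml>:  # <--- Replace with profile value in your dbt_project.yml!
  target: sqlfluff
  outputs:
    sqlfluff:
      type: bigquery
      method: service-account
      # Hypothetical secret names - define whatever names you like in GitHub Secrets
      keyfile: "{{ env_var('PROFILES_YML_BIGQUERY_KEYFILE') }}"
      project: "{{ env_var('PROFILES_YML_BIGQUERY_PROJECT') }}"
      dataset: "{{ env_var('PROFILES_YML_BIGQUERY_DATASET') }}"
      threads: 1
```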