✅ Robust Data Quality Checks
dbt-assertions
ensures thorough data quality assessments at the row level,
enhancing the reliability of downstream models.
🔍 Efficient Exception Detection
Granular row-by-row exception detection identifies and flags specific rows that fails assertions, streamlining the resolution process.
🛠️ Customizable Assertions & Easy Integration
Easy-to-use macros assertions()
and assertions_filter()
empower users to
customize without barriers data quality checks within the model YAML definition,
adapting to specific data validation needs.
🚀 An Easy Shift from your Actual Workflows
A generic test generic_assertions()
to perform dbt tests as usual,
testing the package easily without compromising your current workflows.
you can test the package with this generic test easily without having to rebuild you table
- Features
- Content
- Install
- Supported databases
- Dependencies
- Variables
- Basic Example
- Documentation
- Contribution
- Acknowledgments
- Contact
dbt-assertions
currently supports dbt 1.7.x
or higher.
Check dbt package hub for the latest installation instructions, or read the docs for more information on installing packages.
Include in packages.yml
packages:
- package: AxelThevenot/dbt_assertions
version: [">=1.0.0", "<2.0.0"]
# <see https://github.com/AxelThevenot/dbt-assertions/releases/latest> for the latest version tag
Database | assertions | assertions_filter | __unique__ | __not_null__ | generic_assertions |
---|---|---|---|---|---|
BigQuery (default) | ✅ | ✅ | ✅ | ✅ | ✅ |
Snowflake | ✅ | ✅ | ✅ (not nested fields) |
✅ (not nested fields) |
✅ |
DuckDB | ✅ | ✅ | ✅ (not nested fields) |
✅ (not nested fields) |
✅ |
Databricks | ✅ | ✅ | ✅ (not nested fields) |
✅ (not nested fields) |
✅ |
Redshift | ✅ | ✅ (Include/Exclude Fields Not Supported) |
✅ (not nested fields) |
✅ (not nested fields) |
✅ |
Others | ℹ️ | ℹ️ | ℹ️ | ℹ️ | ℹ️ |
- ✅: supported
- ✔️: supported not tested
- ❌: not supported
- ℹ️: Opened to contributions ❤️ (see the integration_tests folder on how to test your adapter)
For latest release, see https://github.com/AxelThevenot/dbt-assertions/releases
This package do not have dependencies.
The following variable can be defined in your dbt_project.yml
file to change the default exceptions
column name used by the package.
vars:
"dbt_assertions:default_column": "exceptions"
Check the basic_example example.
assertions()
macro generates a select expression for row-level assertions.
Arguments:
- column (optional[str]): column to read the assertions from.
This macro parses the schema model YAML to extract row-level assertions; custom assertions, unique, and not-null. It then constructs an array of exceptions for each row based on its assertions results.
By default, it will generate assertions based on
your YAML model definition
reading configuration for a column named exceptions
.
You can call the macro using column
argument to change this default column.
SELECT
*,
{{ dbt_assertions.assertions(column='warnings') }},
FROM {{ ref('my_model') }}
Note: this macro is made to generate assertions based on you query result. It means it must be generated at the end of the query.
WITH
[...] -- Other CTEs
final AS (
SELECT
[...]
FROM {{ ref('my_model') }}
)
-- After query results
SELECT
*,
{{ dbt_assertions.assertions() }},
FROM final
assertions_filter()
macro generates an expression to filter rows based on
assertions results, generated with the assertions()
macro.
Arguments:
- column (optional[str]): Column to read the exceptions from.
- exclude_list (optional[list[str]]): Assertions to exclude in the filter.
- include_list (optional[list[str]]): Assertions to include in the filter.
- reverse (optional[bool]): returns rows without exception when
reverse=false
, and rows with exceptions whenreverse=true
.
By default, each row with exception(s) will be filtered.
SELECT
*
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter() }}
You can change this default behaviour specifying an
optional exclude_list
or include_list
argument (not both).
SELECT
*
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(exclude_list=['assertions_id']) }}
Generates a test to get rows based on exceptions.
It will returns the rows without any exception by default. You can change this default behaviour specifying a exclude_list or include_list (not both).
You must defined beforehand the assertions for the model. More on YAML definition for assertions.
Arguments:
- column (optional[str]): Column to read the exceptions from.
- exclude_list (optional[list[str]]): Assertions to exclude in the filter.
- include_list (optional[list[str]]): Assertions to include in the filter.
- re_assert (optional[bool]): to set to
true
if your assertion field is not calculated in your table.
Configure the generic test in schema.yml with:
model:
name: my_model
tests:
- dbt_assertions.generic_assertions:
[column: <column_name>]
[exclude_list: <list(str_to_filter)>]
[include_list: <list(str_to_filter)>]
[re_assert: true | false]
columns:
...
[]
represents optional parts. Yes everything is optional but let's see it by examples.
In the basic test example
you can easily create your test as follows then run your dbt test
command.
models:
- name: basic_test_example_d_site
tests:
- dbt_assertions.generic_assertions:
column: exceptions
include_list:
- site_id_is_not_null
# `re_assert: true` to use only if your assertion's column
# is not computed and saved in your table.
re_assert: true
columns:
...
The assertions definition must be created under a column definition of your model and respects the following.
assertions:
[__unique__: <unique_expression>]
[__not_null__: __unique__ | <not_null_expression>]
[<custom_assertion_id>:
description: [<string>]
expression: <string>
null_as_exception: [<bool>]]
...
[]
represents optional parts.
Yes, everything is optional but let's see it by examples.
Custom assertions are the basics assertions.
The package is made to support every assertions as long as it is supported in a SELECT statement of your underlying database. So you can do a lot of things.
It is represented as key values. Keys are the ID of the assertions.
Each assertions is defined by at least an expression
which will be rendered
to be evaluated as your test.
description
and null_as_exception
are optional.
assertions:
unique:
description: "Row must be unique."
expression: "1 = COUNT(1) OVER(PARTITION by my_id)"
site_creation_date_is_past:
description: "Site must be created in the past."
expression: "site_creation_date <= CURRENT_DATE()"
site_type_pick_list:
description: "The site type be must in its known picklist."
expression: |
site_type IN (
'store',
'ecommerce',
'drive',
'pickup'
)
null_as_exception
is an optional configuration for your assertion.
Default to false
it is the return result if your expression is evaluated to NULL
.
Default behaviour is set to false
because one assertion must evaluate on thing.
Prefer using the __not_null_
helper instead.
In our previous, if we want to also avoid NULL
site types.
assertions:
...
site_type_pick_list:
description: "The site type be must in its known picklist."
expression: |
site_type IN (
'store',
'ecommerce',
'drive',
'pickup'
)
null_as_exception: true
As guaranteeing uniqueness of rows is a concern in most of the use cases,
the __unique__
helper is here to avoid writing complex and repetitive queries.
assertions:
unique:
description: "Row must be unique."
expression: "1 = COUNT(1) OVER(PARTITION by key_1, key_2)"
The above configuration is the same as writing
assertions:
__unique__:
- key_1
- key_2
You can also verify unique keys for nested/repeated structure. It will generate:
- One assertion for the 0-depth guaranteeing uniqueness across the rows.
- One assertion for each repeated field guaranteeing uniqueness within the row.
The following example will generate the assertions:
unique
: Row must be unique over the unique keys.nested_1__unique
: Items must be unique within nested_1 in the row.nested_1.nested_2__unique
: Items must be unique within nested_1.nested_2 in the row.
assertions:
__unique__:
- key_1
- key_2
- nested_1:
- key_3
- key_4
- nested_2:
- key_5
- key_6
As guaranteeing not null values is also concern in most of the use cases,
the __not_null__
helper is here to avoid writing complex and repetitive queries.
assertions:
key_1__not_null:
description: "key_1 is not null."
expression: "key_1 IS NOT NULL"
key_2__not_null:
description: "key_2 is not null."
expression: "key_2 IS NOT NULL"
The above configuration is the same as writing
assertions:
__unique__:
- key_1
- key_2
__not_null__:
- key_1
- key_2
And as the two helpers are often linked, you can rewrite the assertions as follows, which is also the same.
assertions:
__unique__:
- key_1
- key_2
__not_null__: __unique__
You can also verify unique keys for nested/repeated structure. It will generate:
- One assertion for each column of the 0-depth guaranteeing not null.
- One assertion for each column under the repeated field guaranteeing all the values are not null within the row.
The following example will generate the assertions:
key_1__not_null
: key_1 is not null.key_2__not_null
: key_2 is not null.nested_1.key_3__not_null
: nested_1.key_3 are not null.
assertions:
__not_null__:
- key_1
- key_2
- nested_1:
- key_3
If exceptions
column is not a naming convention you like,
you can still opt for a column name you choose and the macro will
still work with the from_colum
argument.
You can also play with multiple columns.
model:
name: my_model
columns:
...
- errors:
assertions:
__unique__:
- key_1
- key_2
__not_null__: __unique__
- warns:
assertions:
site_creation_date_is_past:
description: "Site must be created in the past."
expression: "site_creation_date <= CURRENT_DATE()"
And in your model query.
WITH final AS
(
SELECT ...
)
SELECT
*,
{{ dbt_assertions.assertions(column='errors') }},
{{ dbt_assertions.assertions(column='warns') }}
FROM {{ ref('my_model') }}
If you want to contribute, please open a Pull Request or an Issue on this repo. Feel free to reach me Linkedin.
Special thank to
- Victor Vaneecloo for the generic test
- Guillaume Blaquiere for its help in terminology
- Daniel Bartley for its help in terminology
- Benoit Perigaud for the Snowflake support
- Ferdy Hulzebos for the Databricks support
- Rich Herman for the Redshift support
If you have any question, please open a new Issue or feel free to reach out to Linkedin
Happy coding with dbt-assertions!