-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add detailed docs for dbt testing #432
Conversation
CUSTOM_TEST_NAMES = { | ||
# Mapping that defines category names that should be reported for tests | ||
# based on their generics | ||
TEST_CATEGORIES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable previously had a confusing name, and I think TEST_CATEGORIES
is clearer.
# Tests without a config.meta.category property will be grouped in | ||
# this default category | ||
DEFAULT_TEST_CATEGORY = "miscellaneous" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't deleted this variable, just moved it below to be closer to the other category-definition constant.
[is coming soon](https://docs.getdbt.com/docs/build/unit-tests). Until unit | ||
tests are natively supported, however, we do not have a way of implementing them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not entirely true -- we have a couple of unit tests so far, although they are implemented ad-hoc and without a clear framework for running them. I think we should leave this quirk undocumented for now, though, and revisit these docs (as well as our tests) once we implement a true unit test suite in #382.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we kind of have a third set of tests outside the unit/data distinction? e.g.
data-architecture/dbt/models/reporting/schema.yml
Lines 5 to 15 in 48c7178
tables: | |
- name: ratio_stats | |
description: '{{ doc("table_ratio_stats") }}' | |
tests: | |
- expression_is_true: | |
name: reporting_ratio_stats_no_nulls | |
expression: | | |
year IS NOT NULL | |
AND triad IS NOT NULL | |
AND geography_type IS NOT NULL | |
AND property_group IS NOT NULL |
That test is on a table that's downstream of transformations, but is not testing the transformation logic itself. So I suppose we'd count it as a unit test, but it doesn't actually match dot's upcoming unit test framework.
In my mind, these sorts of tests are still useful as they can catch logic errors when messing with transformations in models. They basically only get run when changing the view via a branch.
Edit: I am dumb and should have read further. These would be the "Non-QC test" defined below. IMO these might be worth breaking out into their own category, maybe "Model tests" or "Transformation tests". I feel like that's conceptually clearer, but open to ideas!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, there are a few subtypes of "non-QC tests" that are all getting conflated in the current version of the docs. Still, I think we should probably just table this section of the docs for now and revisit it as part of #382 since I expect the way we write this kind of test will change once unit testing becomes available (hopefully this week 🤞🏻 ). Does that sound OK to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good! Can you just add a note in the issue that we need to clarify these distinctions in the docs once unit tests hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done!
- count_is_consistent: | ||
name: default_vw_pin_universe_class_count_is_consistent_by_year | ||
group_column: year | ||
config: | ||
error_if: ">25" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are not new, I've simply moved them from table-level tests to the columns that they are operating on.
@@ -343,7 +343,7 @@ sources: | |||
description: adrno should be <= 5 characters long | |||
- column_length: | |||
name: iasworld_legdat_adrsuf_zip1_taxdist_length_lte_5 | |||
columns: | |||
column_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted a few tests to standardize around the parameter names column_name
and column_names
where appropriate, so a few tests needed to be adjusted to reflect this change.
## Available generic tests | ||
|
||
- [`test_accepted_range`](#test_accepted_range) | ||
- [`test_accepted_values`](#test_accepted_values) | ||
- [`test_column_is_subset_of_external_column`](#test_column_is_subset_of_external_column) | ||
- [`test_column_length`](#test_column_length) | ||
- [`test_columns_match`](#test_columns_match) | ||
- [`test_count_is_consistent`](#test_count_is_consistent) | ||
- [`test_expression_is_true`](#test_expression_is_true) | ||
- [`test_is_null`](#test_is_null) | ||
- [`test_no_extra_whitespace`](#test_no_extra_whitespace) | ||
- [`test_not_accepted_values`](#test_not_accepted_values) | ||
- [`test_not_null`](#test_not_null) | ||
- [`test_relationships`](#test_relationships) | ||
- [`test_res_class_matches_pardat`](#test_res_class_matches_pardat) | ||
- [`test_row_count`](#test_row_count) | ||
- [`test_row_values_match_after_join`](#test_row_values_match_after_join) | ||
- [`test_sequential_values`](#test_sequential_values) | ||
- [`test_unique_combination_of_columns`](#test_unique_combination_of_columns) | ||
- [`test_value_is_present`](#test_value_is_present) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could use some form of auto-documenter to generate this documentation based on the test docstrings, a la pkgdown. However, I think it's not worth spending too much time on that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: This is awesome and a lot of work! Very nice job. My only suggestion is to make it more prominent in the docs, perhaps by linking it under the Table of Contents and within the dbt catalog docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a link to the README in ea8fe36! I'm not totally sure where it would fit in the data catalog docs, do you have a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
demand (blocking): This needs a better emoji before merge lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, what do you think about 🧪?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, done in e3ba436!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @jeancochrane! Nice work. See comments for some optional changes, otherwise this is good to go.
[is coming soon](https://docs.getdbt.com/docs/build/unit-tests). Until unit | ||
tests are natively supported, however, we do not have a way of implementing them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we kind of have a third set of tests outside the unit/data distinction? e.g.
data-architecture/dbt/models/reporting/schema.yml
Lines 5 to 15 in 48c7178
tables: | |
- name: ratio_stats | |
description: '{{ doc("table_ratio_stats") }}' | |
tests: | |
- expression_is_true: | |
name: reporting_ratio_stats_no_nulls | |
expression: | | |
year IS NOT NULL | |
AND triad IS NOT NULL | |
AND geography_type IS NOT NULL | |
AND property_group IS NOT NULL |
That test is on a table that's downstream of transformations, but is not testing the transformation logic itself. So I suppose we'd count it as a unit test, but it doesn't actually match dot's upcoming unit test framework.
In my mind, these sorts of tests are still useful as they can catch logic errors when messing with transformations in models. They basically only get run when changing the view via a branch.
Edit: I am dumb and should have read further. These would be the "Non-QC test" defined below. IMO these might be worth breaking out into their own category, maybe "Model tests" or "Transformation tests". I feel like that's conceptually clearer, but open to ideas!
## Available generic tests | ||
|
||
- [`test_accepted_range`](#test_accepted_range) | ||
- [`test_accepted_values`](#test_accepted_values) | ||
- [`test_column_is_subset_of_external_column`](#test_column_is_subset_of_external_column) | ||
- [`test_column_length`](#test_column_length) | ||
- [`test_columns_match`](#test_columns_match) | ||
- [`test_count_is_consistent`](#test_count_is_consistent) | ||
- [`test_expression_is_true`](#test_expression_is_true) | ||
- [`test_is_null`](#test_is_null) | ||
- [`test_no_extra_whitespace`](#test_no_extra_whitespace) | ||
- [`test_not_accepted_values`](#test_not_accepted_values) | ||
- [`test_not_null`](#test_not_null) | ||
- [`test_relationships`](#test_relationships) | ||
- [`test_res_class_matches_pardat`](#test_res_class_matches_pardat) | ||
- [`test_row_count`](#test_row_count) | ||
- [`test_row_values_match_after_join`](#test_row_values_match_after_join) | ||
- [`test_sequential_values`](#test_sequential_values) | ||
- [`test_unique_combination_of_columns`](#test_unique_combination_of_columns) | ||
- [`test_value_is_present`](#test_value_is_present) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: This is awesome and a lot of work! Very nice job. My only suggestion is to make it more prominent in the docs, perhaps by linking it under the Table of Contents and within the dbt catalog docs.
This PR updates our dbt documentation to included detailed docs on writing and running tests.
Connects #385.