Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add detailed docs for dbt testing #432

Merged
merged 16 commits into from
May 7, 2024

Conversation

jeancochrane
Copy link
Contributor

This PR updates our dbt documentation to included detailed docs on writing and running tests.

Connects #385.

CUSTOM_TEST_NAMES = {
# Mapping that defines category names that should be reported for tests
# based on their generics
TEST_CATEGORIES = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable previously had a confusing name, and I think TEST_CATEGORIES is clearer.

Comment on lines -89 to -91
# Tests without a config.meta.category property will be grouped in
# this default category
DEFAULT_TEST_CATEGORY = "miscellaneous"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't deleted this variable, just moved it below to be closer to the other category-definition constant.

Comment on lines +438 to +439
[is coming soon](https://docs.getdbt.com/docs/build/unit-tests). Until unit
tests are natively supported, however, we do not have a way of implementing them.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not entirely true -- we have a couple of unit tests so far, although they are implemented ad-hoc and without a clear framework for running them. I think we should leave this quirk undocumented for now, though, and revisit these docs (as well as our tests) once we implement a true unit test suite in #382.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we kind of have a third set of tests outside the unit/data distinction? e.g.

tables:
- name: ratio_stats
description: '{{ doc("table_ratio_stats") }}'
tests:
- expression_is_true:
name: reporting_ratio_stats_no_nulls
expression: |
year IS NOT NULL
AND triad IS NOT NULL
AND geography_type IS NOT NULL
AND property_group IS NOT NULL

That test is on a table that's downstream of transformations, but is not testing the transformation logic itself. So I suppose we'd count it as a unit test, but it doesn't actually match dot's upcoming unit test framework.

In my mind, these sorts of tests are still useful as they can catch logic errors when messing with transformations in models. They basically only get run when changing the view via a branch.

Edit: I am dumb and should have read further. These would be the "Non-QC test" defined below. IMO these might be worth breaking out into their own category, maybe "Model tests" or "Transformation tests". I feel like that's conceptually clearer, but open to ideas!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, there are a few subtypes of "non-QC tests" that are all getting conflated in the current version of the docs. Still, I think we should probably just table this section of the docs for now and revisit it as part of #382 since I expect the way we write this kind of test will change once unit testing becomes available (hopefully this week 🤞🏻 ). Does that sound OK to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good! Can you just add a note in the issue that we need to clarify these distinctions in the docs once unit tests hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done!

Comment on lines +85 to +89
- count_is_consistent:
name: default_vw_pin_universe_class_count_is_consistent_by_year
group_column: year
config:
error_if: ">25"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are not new, I've simply moved them from table-level tests to the columns that they are operating on.

@@ -343,7 +343,7 @@ sources:
description: adrno should be <= 5 characters long
- column_length:
name: iasworld_legdat_adrsuf_zip1_taxdist_length_lte_5
columns:
column_names:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted a few tests to standardize around the parameter names column_name and column_names where appropriate, so a few tests needed to be adjusted to reflect this change.

Comment on lines +7 to +26
## Available generic tests

- [`test_accepted_range`](#test_accepted_range)
- [`test_accepted_values`](#test_accepted_values)
- [`test_column_is_subset_of_external_column`](#test_column_is_subset_of_external_column)
- [`test_column_length`](#test_column_length)
- [`test_columns_match`](#test_columns_match)
- [`test_count_is_consistent`](#test_count_is_consistent)
- [`test_expression_is_true`](#test_expression_is_true)
- [`test_is_null`](#test_is_null)
- [`test_no_extra_whitespace`](#test_no_extra_whitespace)
- [`test_not_accepted_values`](#test_not_accepted_values)
- [`test_not_null`](#test_not_null)
- [`test_relationships`](#test_relationships)
- [`test_res_class_matches_pardat`](#test_res_class_matches_pardat)
- [`test_row_count`](#test_row_count)
- [`test_row_values_match_after_join`](#test_row_values_match_after_join)
- [`test_sequential_values`](#test_sequential_values)
- [`test_unique_combination_of_columns`](#test_unique_combination_of_columns)
- [`test_value_is_present`](#test_value_is_present)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could use some form of auto-documenter to generate this documentation based on the test docstrings, a la pkgdown. However, I think it's not worth spending too much time on that right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: This is awesome and a lot of work! Very nice job. My only suggestion is to make it more prominent in the docs, perhaps by linking it under the Table of Contents and within the dbt catalog docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a link to the README in ea8fe36! I'm not totally sure where it would fit in the data catalog docs, do you have a suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

demand (blocking): This needs a better emoji before merge lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, what do you think about 🧪?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, done in e3ba436!

@jeancochrane jeancochrane marked this pull request as ready for review May 6, 2024 22:17
@jeancochrane jeancochrane requested a review from a team as a code owner May 6, 2024 22:17
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @jeancochrane! Nice work. See comments for some optional changes, otherwise this is good to go.

Comment on lines +438 to +439
[is coming soon](https://docs.getdbt.com/docs/build/unit-tests). Until unit
tests are natively supported, however, we do not have a way of implementing them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we kind of have a third set of tests outside the unit/data distinction? e.g.

tables:
- name: ratio_stats
description: '{{ doc("table_ratio_stats") }}'
tests:
- expression_is_true:
name: reporting_ratio_stats_no_nulls
expression: |
year IS NOT NULL
AND triad IS NOT NULL
AND geography_type IS NOT NULL
AND property_group IS NOT NULL

That test is on a table that's downstream of transformations, but is not testing the transformation logic itself. So I suppose we'd count it as a unit test, but it doesn't actually match dot's upcoming unit test framework.

In my mind, these sorts of tests are still useful as they can catch logic errors when messing with transformations in models. They basically only get run when changing the view via a branch.

Edit: I am dumb and should have read further. These would be the "Non-QC test" defined below. IMO these might be worth breaking out into their own category, maybe "Model tests" or "Transformation tests". I feel like that's conceptually clearer, but open to ideas!

Comment on lines +7 to +26
## Available generic tests

- [`test_accepted_range`](#test_accepted_range)
- [`test_accepted_values`](#test_accepted_values)
- [`test_column_is_subset_of_external_column`](#test_column_is_subset_of_external_column)
- [`test_column_length`](#test_column_length)
- [`test_columns_match`](#test_columns_match)
- [`test_count_is_consistent`](#test_count_is_consistent)
- [`test_expression_is_true`](#test_expression_is_true)
- [`test_is_null`](#test_is_null)
- [`test_no_extra_whitespace`](#test_no_extra_whitespace)
- [`test_not_accepted_values`](#test_not_accepted_values)
- [`test_not_null`](#test_not_null)
- [`test_relationships`](#test_relationships)
- [`test_res_class_matches_pardat`](#test_res_class_matches_pardat)
- [`test_row_count`](#test_row_count)
- [`test_row_values_match_after_join`](#test_row_values_match_after_join)
- [`test_sequential_values`](#test_sequential_values)
- [`test_unique_combination_of_columns`](#test_unique_combination_of_columns)
- [`test_value_is_present`](#test_value_is_present)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: This is awesome and a lot of work! Very nice job. My only suggestion is to make it more prominent in the docs, perhaps by linking it under the Table of Contents and within the dbt catalog docs.

@jeancochrane jeancochrane merged commit af87014 into master May 7, 2024
10 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/add-dbt-testing-docs branch May 7, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants