Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken categorization in transform_dbt_test_results.py and remove dbt_utils dependency #474

Merged
merged 2 commits into from
May 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 15 additions & 19 deletions .github/scripts/transform_dbt_test_results.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,16 +108,13 @@
# Mapping that defines category names that should be reported for tests
# based on their generics
TEST_CATEGORIES = {
"macro.athena.test_accepted_range": "incorrect_values",
"macro.dbt_utils.test_accepted_range": "incorrect_values",
"macro.athena.test_accepted_values": "incorrect_values",
"macro.athena.test_not_accepted_values": "incorrect_values",
"macro.dbt_utils.test_not_accepted_values": "incorrect_values",
"macro.athena.test_unique_combination_of_columns": "duplicate_records",
"macro.dbt_utils.test_unique_combination_of_columns": "duplicate_records",
"macro.athena.test_not_null": "missing_values",
"macro.athena.test_is_null": "missing_values",
"macro.athena.test_res_class_matches_pardat": "class_mismatch_or_issue",
"test_accepted_range": "incorrect_values",
"test_accepted_values": "incorrect_values",
"test_not_accepted_values": "incorrect_values",
"test_unique_combination_of_columns": "duplicate_records",
"test_not_null": "missing_values",
"test_is_null": "missing_values",
"test_res_class_matches_pardat": "class_mismatch_or_issue",
Comment on lines +111 to +117
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list remains the same, I've just removed the macro.<project_name>.* prefix from each key and deduplicated the keys that previously had a dbt_utils version.

}
# Fallback for tests whose category we can't determine from either the
# test name, the `meta.category` attribute, or the TEST_CATEGORIES mapping
Expand Down Expand Up @@ -1184,16 +1181,15 @@ def get_category_from_node(node: typing.Dict) -> str:
return meta_category

for dependency_macro in node["depends_on"]["macros"]:
if custom_test_name := TEST_CATEGORIES.get(dependency_macro):
# Macro names in the DAG are fully qualified following the pattern
# `macro.<project_name>.<macro_name>`, so remove the prefix to extract
# just the name of the macro
cleaned_macro_name = dependency_macro.split(".")[-1]
if custom_test_name := TEST_CATEGORIES.get(cleaned_macro_name):
return custom_test_name
# Custom generic tests are always formatted like
# macro.dbt.test_<generic_name>
if dependency_macro.startswith("macro.athena.test_"):
return dependency_macro.split("macro.athena.test_")[-1]
# dbt_utils generic tests are always formatted like
# macro.dbt_utils.test_<generic_name>
if dependency_macro.startswith("macro.dbt_utils.test_"):
return dependency_macro.split("macro.dbt_utils.test_")[-1]
# Custom generic tests are always formatted like test_<generic_name>
if cleaned_macro_name.startswith("test_"):
return cleaned_macro_name.split("test_")[-1]

return DEFAULT_TEST_CATEGORY

Expand Down
4 changes: 0 additions & 4 deletions dbt/package-lock.yml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): If we end up needing other macros from dbt_utils, are we going to just vendor the specific macro or will we re-add the dep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right move is to vendor the macro, unless two specific conditions arise: A macro that is 1) particularly complex and 2) does not require the additional_select_columns parameter that we use for all of our generic tests. In that case, it might make sense to reintroduce the dependency; in the meantime, every macro we've needed from dbt_utils so far has either been simple enough that it warrants vendoring or does require additional_select_columns.

This file was deleted.

3 changes: 0 additions & 3 deletions dbt/packages.yml

This file was deleted.

2 changes: 1 addition & 1 deletion dbt/tests/generic/test_sequential_values.sql
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
additional_select_columns=[]
) %}

{%- set previous_column_name = "previous_" ~ dbt_utils.slugify(column_name) -%}
{%- set previous_column_name = "previous_" ~ slugify(column_name) -%}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our version of slugify is basically identical to the dbt_utils version; the only differences are that we convert backslashes to underscores and we don't prepend underscores if the string starts with a number. Neither of those differences should matter in this case since our column names don't contain backslashes or start with numbers.

For reference, here's our version of the macro:

-- Variation of dbt_utils.slugify macro using strict snake_case
{% macro slugify(string) %}
{#- Lower case the string -#}
{% set string = string | lower %}
{#- Replace spaces, slashes, and hyphens with underscores -#}
{% set string = modules.re.sub("[ /-]+", "_", string) %}
{#- Only take letters, numbers, and hyphens -#}
{% set string = modules.re.sub("[^a-z0-9_]+", "", string) %}
{{ return(string) }}
{% endmacro %}

And here's the dbt utils version:

{% macro slugify(string) %}


{#- Lower case the string -#}
{% set string = string | lower %}
{#- Replace spaces and dashes with underscores -#}
{% set string = modules.re.sub('[ -]+', '_', string) %}
{#- Only take letters, numbers, and underscores -#}
{% set string = modules.re.sub('[^a-z0-9_]+', '', string) %}
{#- Prepends "_" if string begins with a number -#}
{% set string = modules.re.sub('^[0-9]', '_' + string[0], string) %}


{{ return(string) }}


{% endmacro %}


{%- if group_by_columns | length() > 0 -%}
{%- set select_gb_cols = group_by_columns | join(",") -%}
Expand Down
Loading