Support descriptions on tests #3302
model = []

if len(node.refs):
    model = node.refs[0]
elif len(node.sources):
    model = node.sources[0]

if len(model) == 1:
    target_model_name = model[0]
elif len(model) == 2:
    _, target_model_name = model
else:
    raise dbt.exceptions.InternalException(
        f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
    )

ctx['column_name'] = node.column_name
ctx['model'] = target_model_name
Blah, hardcoded values set directly on the context. I tried to avoid this, without success.
This means that `model` and `column_name` are the only additions to the test description rendering context, right? I think that's totally ok for the v1 of this! In the future, it would be neat if we could grab other `args` and include them in the rendering context, but that should 100% not block us from merging this as an amazing first cut.
Ohh I see your comment above. Okay, that makes sense!
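For readers following the thread, here is a minimal, self-contained sketch of how the two context values discussed above could be derived. `FakeTestNode` and `description_context` are hypothetical names used only for illustration and are not dbt's real classes or functions; the sketch also raises a plain `ValueError` instead of dbt's internal exception so that it runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeTestNode:
    """Hypothetical stand-in for a parsed test node; not dbt's real class."""
    column_name: Optional[str] = None
    refs: List[List[str]] = field(default_factory=list)
    sources: List[List[str]] = field(default_factory=list)


def description_context(node: FakeTestNode) -> dict:
    """Build the two-key context used to render a test description."""
    if node.refs:
        target = node.refs[0]
    elif node.sources:
        target = node.sources[0]
    else:
        target = []

    # A ref or source is stored as a list of 1 or 2 arguments; in both
    # cases the last argument is the name we want to expose as `model`.
    if len(target) not in (1, 2):
        raise ValueError(
            f'Refs and sources should always be 1 or 2 arguments - got {len(target)}'
        )
    return {'column_name': node.column_name, 'model': target[-1]}
```

Under these assumptions, a node with `refs=[['orders']]` and a node with `sources=[['raw', 'events']]` would expose `orders` and `events` respectively as `model`.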
# between names and node ids?
patch_keys = set(self.patches.keys())
used_patch_keys = set()

for node in self.nodes.values():
    patch = self.patches.pop(node.name, None)
    patch_lookup_key = node.patch_lookup_key
    patch = self.patches.get(patch_lookup_key, None)
    if not patch:
        continue
This is new/diverging behavior! Previously, a patch could only be applied once, to a single node. The new behavior allows a patch to be applied to multiple nodes, to support applying the same patch to multiple instances of the same generic test.
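The difference between the old and new lookup can be sketched in isolation. The function below is a hypothetical illustration, not the manifest code from this PR: it assumes each node carries a single lookup key and that patches are plain strings, and it tracks which patch keys were used so unmatched patches can still be reported.

```python
from typing import Dict, List, Set, Tuple


def apply_patches(
    node_keys: List[str], patches: Dict[str, str]
) -> Tuple[Dict[int, str], Set[str]]:
    """Apply patches by lookup key; return (applied, unused_patch_keys).

    `node_keys` holds one hypothetical patch lookup key per node, so several
    instances of the same generic test share a key.
    """
    applied: Dict[int, str] = {}
    used: Set[str] = set()
    for index, key in enumerate(node_keys):
        # .get() leaves the patch in place so later nodes can reuse it;
        # .pop() would hand it to the first matching node only.
        patch = patches.get(key)
        if patch is None:
            continue
        applied[index] = patch
        used.add(key)
    return applied, set(patches) - used
```

With `node_keys=['unique', 'unique', 'not_null']` and patches for `unique` and `typo_test`, both `unique` nodes receive the patch and `typo_test` is returned as unused.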
This is wonderful!
Some follow-on issues that should come from here:
- Per the comment here, we should include the test `description` (if available) in the info/stdout logging when a test fails.
- Let's update the builtin generic tests to have nice descriptions! I'd be happy to take a swing at this.
All in all, the work here LGTM.
@@ -46,7 +46,8 @@ def documentable(cls) -> List['NodeType']:
             cls.Source,
             cls.Macro,
             cls.Analysis,
-            cls.Exposure
+            cls.Exposure,
+            cls.Test
Confirmed that both types of tests (generic + bespoke) can use `doc` blocks in their descriptions, nice!
I think we should rethink this. Applying patches to common tests like 'unique' and 'not_null' would be a big performance hit in large projects. The 'process_docs' step, which is already pretty slow, would bloat quite a bit. In addition, it would be very hard to handle with the new partial processing. If somebody changed one of these test patches, we would have to delete every test node that depends on that test, and every node, exposure, and source that generated those tests, i.e. for common tests like 'unique' and 'not_null', most of the project. Probably that wouldn't happen too often, so maybe we could just refuse to partially parse any changes to test patches, but that's not optimal.

I'm wondering if this is a feature where we should do some of the deferred rendering that we've talked about occasionally. Put the unrendered description, etc., on the macro test nodes, like we do for macro_patches, and then only render those bits when a test is executed. This would remove the large parsing-time rendering overhead and would be easy to handle with partial parsing. But we'd have to write new code to render those parts at run time.
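One way to picture the deferred-rendering idea is a holder that keeps the raw description and renders it only on first use. This is a rough sketch under stated assumptions, not a proposal for dbt's actual design: `LazyDescription` is a hypothetical name, and `string.Template` (with `$name` placeholders) stands in for the Jinja renderer so the example stays standard-library only.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, Optional


@dataclass
class LazyDescription:
    """Hypothetical holder that stores raw text and renders on first use."""
    raw: str
    context: Dict[str, str]
    render_count: int = 0
    _rendered: Optional[str] = field(default=None, repr=False)

    def render(self) -> str:
        if self._rendered is None:
            # string.Template is a stand-in for the real Jinja renderer.
            self._rendered = Template(self.raw).substitute(self.context)
            self.render_count += 1
        return self._rendered
```

Nothing is rendered at construction time, so tests that are parsed but never executed cost no rendering work, and repeated calls to `render()` reuse the cached result.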
Something else to think about, if we only allow 'column_name' and 'model' in those descriptions, is not doing Jinja rendering but simply a Python string replace, i.e. replace {{ model }} with the model name (and the same for the column_name). It's not consistent with the way we handle other attributes, but it would be a lot faster.
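A minimal sketch of that string-replace alternative, assuming only the two placeholders are supported. `render_description` is a hypothetical helper, not an existing dbt function; it tolerates optional whitespace inside the braces and leaves any other Jinja expression untouched.

```python
import re

# Matches {{ model }} or {{ column_name }} with optional inner whitespace.
_PLACEHOLDER = re.compile(r'\{\{\s*(model|column_name)\s*\}\}')


def render_description(description: str, model: str, column_name: str) -> str:
    """Substitute the two supported placeholders without invoking Jinja."""
    values = {'model': model, 'column_name': column_name}
    # A function replacement avoids backslash-escape handling in the values.
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], description)
```

The trade-off is the one named above: this is fast and simple, but doc blocks, macros, and any other Jinja features would not work in descriptions rendered this way.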
@gshank I really appreciate you raising the performance concern here! It's tremendously justified. Let's figure out a way to achieve the functionality we want without bloating [...]

I think we can take similar "shortcuts" for builtin [...]

I don't see it being a common occurrence for users to edit generic test properties in development: most tests come from within dbt or within packages; a user may define a handful of custom generic tests in their own project, but those tests are highly unlikely to have a footprint on the same scale as [...]
# Ideally we should be able to put the test kwargs in the context,
# but kwargs have already been processed at this point
# ex: 'model' --> "{{ ref('model') }}"
Just to clarify my understanding: The `test_metadata` that lives on the test node, and contains `kwargs` as well, is not Jinja-rendered. E.g. from the representation in `manifest.json`:

    "test_metadata": {
        "name": "accepted_values",
        "kwargs": {
            "values": [
                "usd"
            ],
            "column_name": "currency_code",
            "model": "{{ ref('stg_ticket_tailor__orders') }}"
        },
        "namespace": null
    },

In order to include `kwargs` in the test description, we'd need to upgrade the `description` renderer to use the full Jinja rendering context (refs, macros, all that). Is that right?
After considering the performance impacts of supporting test descriptions with this approach, and the concerns raised in the comments above, we are tabling this feature until we have a better approach for description rendering and dependency resolution for partial parsing 👍
resolves #3249
Description
This PR uses the same patching mechanism to apply updates to test nodes after parsing test properties. This currently supports descriptions for generic tests by applying the description to all nodes representing an instance of the test. When rendering the description for the test node, we expose the column name and model the test is running on in the context (`column_name` and `model`). This also supports doc blocks for both generic and bespoke test descriptions. YAML files can be defined in the test directory to achieve this.

Checklist
`CHANGELOG.md` and added information about my change to the "dbt next" section.