Better node identification #3348
Comments
Thanks for this @iknox-fa! I'm tagging this There was a great thread over in Slack just now about what might make sense from a user's point of view.
@jtcohen6 I see where you're coming from, but it's actually more about the Right now we have three+ ways to ID a node: If that is the case, uniform logic to handle *unless you want a Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft
Let's do an ADR on this first and then scope: #3548. Also, we need to talk to the metadata team to see if this would impact them.
This came up again while triaging #4684, and while doing some research for a new ticket I found this one. The rest of this comment is a proposal for an alternative identification strategy as it pertains specifically to tests:
Goals
Today
The way we do test identification today involves a "name mangling" strategy, which is defined in core/dbt/parser/generic_test_builders.py::get_nice_generic_test_name. This strategy produces a string with the model name, test type, test name, test arguments, and a hash of the previous values to disambiguate different tests that may have elements that contain the delimiter in them.
Implications:
Proposed Change
Computing function equality is undecidable in the general case, and even though we have a smaller subset of functions to work with when it comes to dbt tests, mechanically deciding which tests are equivalent seems like the wrong approach. Instead, I propose allowing users to name tests if they would like to track tests between runs. For example:
I propose that this test name must be unique within each model; we can think of models as namespaces for test definitions. This means our name mangling strategy can be simplified to a pairing of
Since requiring the name field on all tests would be a difficult breaking change for large projects with many tests, we will need a mechanically identified name that can be deterministically built. I propose pulling from the previous solution and using the model, test type, and values to identify the test. However, to keep the identifier a manageable size even when there are many arguments, the name mangling strategy can be reduced to
Additionally, to encourage the best practice of naming your tests, in a future major version of dbt we could issue a warning for each test definition that does not have an explicit name configured.
Additional Context
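The identification scheme proposed in this comment could be sketched roughly as follows. This is only an illustration, not dbt's implementation: the helper name `build_test_identifier`, the `model.name` and `model.test_type.digest` formats, the use of SHA-256, and the 10-character digest length are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def build_test_identifier(
    model: str,
    test_type: str,
    args: Dict[str, Any],
    name: Optional[str] = None,
) -> str:
    """Hypothetical helper: derive a stable identifier for a test.

    If the user supplied an explicit name, the identifier is just the
    (model, name) pairing. Otherwise it is built deterministically from
    the model, the test type, and a short hash of the arguments.
    """
    if name is not None:
        # Explicit names are assumed to be unique within a model.
        return f"{model}.{name}"

    # Serialize with sorted keys so argument order does not change the hash.
    payload = json.dumps(
        {"model": model, "test_type": test_type, "args": args},
        sort_keys=True,
        default=str,
    )
    # A truncated digest keeps the identifier short however many
    # arguments the test has (the length 10 is an arbitrary choice).
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:10]
    return f"{model}.{test_type}.{digest}"
```

With a scheme like this, a named test keeps the same identifier even when its arguments change between runs, while an unnamed test gets a new identifier whenever its model, type, or arguments change.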
@nathaniel-may Thank you again for this excellent comment. While thinking about this on the train, I realized that the code to fix this one is super straightforward and self-contained, so I gave it a go in #4898.
Describe the feature
As proven by the necessity to add a hash to a test node's unique_id, the current model of node identification could use some tweaking.
This was discussed in previous ticket comments:
#3335 (comment)
#3335 (comment)
Describe alternatives you've considered
This isn't well fleshed out yet, but the ideas that come immediately to mind are:
Who will this benefit?
This will make development simpler and eliminate the need for post-parsing checks like this.
Are you interested in contributing this feature?