[Feature] Allow multiple unique tests for the same column #4102
Comments
edit: accidentally closed this issue when I submitted my comment too early!

Thanks for opening @ilmari-aalto! In the general sense, there's some overlap with #3348, which proposes a better approach to node name construction across the board. But node naming is particularly difficult for specific instances of generic tests, and particularly so when dealing with test configs.

To boil down the big question, why is this ok?

- name: my_column
  tests:
    - accepted_values:
        values: ['abc', '123']
        config:
          where: is_alphanumeric
    - accepted_values:
        values: ['123']
        config:
          where: is_numeric

When this isn't?

- name: my_column
  tests:
    - accepted_values:
        values: ['abc', '123']
        config:
          where: is_alphanumeric
    - accepted_values:
        values: ['abc', '123']
        config:
          where: is_numeric

To make a long story short:
Every node in dbt must have a unique identifier (

resolution?
To resolve this issue, we could:
1 We could also include configs when constructing the

workarounds
In the meantime, to work around this limitation, users can:
There's a lot of resonance between this issue and #4103, since both want the ability to treat

conclusion (for now)
For my part, I'm not opposed to ever making this change; I feel the jury is still out. There's a real trade-off here, in terms of losing the ability to map tests across invocations/configurations; it feels like a conceptually sound guardrail; and there exist some very reasonable workarounds to achieve the same desired behavior. At the same time, we may look back at this issue several months from now and feel like it's the most obvious and uncontroversial change in the world. If that's the case, we should make it.
This is a rather annoying behaviour for us. We've just upgraded to 1.0 from 0.19 and had assumed that with the addition of the Test Config we'd be able to get rid of our custom recency test. However, due to this behaviour we cannot. One of our use cases looks like this:

- recency:
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'tiktok'"
- recency:
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'twitter'"
- recency:
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'awin'"
- recency:
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'reddit'"
- recency:
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'pinterest'"

The reason for defining the tests separately is to make it clear where the failure is coming from. For this case it seems pretty clear (to me at least) that these are different tests, but they all collide because the config (or just the where clause) is not included in the name. It would also be unhelpful to include the where in the hash but not in the name. We are currently using the workaround of a custom test with

@jtcohen6 do you see this case as any different to the original issue? Is this worth rethinking? It seems to me that the
@judahrand Appreciate the comment! Rather than hoping to perfect the logic for auto-generating test names (a losing proposition), we think the longer-term fix might look like what's proposed in #3348 (comment) and attempted in #4898. Namely:

- recency:
    name: tiktok_recent_within_2_days
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'tiktok'"
- recency:
    name: twitter_recent_within_2_days
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'twitter'"
- recency:
    name: awin_recent_within_2_days
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'awin'"
- recency:
    name: reddit_recent_within_2_days
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'reddit'"
- recency:
    name: pinterest_recent_within_2_days
    datepart: day
    field: date
    interval: 2
    where: "connection_type_code = 'pinterest'"

What do you think of that?
@jtcohen6 Yeah, I think that makes sense and would solve our issue. My one concern is how obvious it would be from the current warning that the user can fix it by manually naming the tests. I suppose this could be addressed either in the warning or in the docs?
@jtcohen6 - based on a dbt Slack conversation I just realized that this was resolved by custom test names in dbt-labs/docs.getdbt.com#1269. A very neat solution to get past this problem! Unless you have something else pending, feel free to close this issue, or I can do it as well if you confirm it's OK?
Yessir! Closing now
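
For anyone landing here later, this is a rough sketch of how the custom test names mentioned above might be applied to the original request (two unique tests on one column). The model name `mrr` and the columns `account_id`, `is_conversion` and `is_current` are made up for illustration, and it assumes a dbt version that supports the `name` property on generic tests:

```yaml
version: 2

models:
  - name: mrr                    # hypothetical model name
    columns:
      - name: account_id         # hypothetical column
        tests:
          # At most one conversion record per account
          - unique:
              name: unique_conversion_per_account   # explicit name keeps the two tests distinct
              config:
                where: "is_conversion"              # hypothetical boolean column
          # At most one current record per account
          - unique:
              name: unique_current_per_account
              config:
                where: "is_current"                 # hypothetical boolean column
```

Because each test carries its own name, the two no longer depend on the auto-generated name, which is the part that ignores the config.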
Is there an existing feature request for this?
Describe the Feature
Enable more than one unique test for the same column.
I think there are use cases where you'd like to have several unique tests. Here's an example related to MRR: there can only be one conversion record per account and, separately, just one current record per account.
Describe alternatives you've considered
where
dbt_utils.unique_combination_of_columns could work, but the intention of that test would be less clear.
Who will this benefit?
Enabling more than one unique test per column would allow users to combine existing tests in novel ways: they would have added flexibility with the same building blocks. A beginner dbt user would not even need to know of the possibility, so it wouldn't confuse them.
Are you interested in contributing this feature?
No response
Anything else?
From the technical point of view, multiple unique tests are currently not possible because config is not part of the test name, and all tests get the same name. Perhaps a test order number (1, 2, ..), hash, or test config could be added to the test name to work past this. (Perhaps extended test naming would only kick in when more than one test of the same type exists for the same column.)
More discussion and examples in this Slack thread.
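
To make the request concrete, here is a sketch of the kind of configuration described above. The model name `mrr` and the columns `account_id`, `is_conversion` and `is_current` are assumptions for illustration only, not taken from the original example. Both tests are the same generic test on the same column and differ only in `config`, so, as noted above, they end up with the same test name:

```yaml
version: 2

models:
  - name: mrr                    # hypothetical model name
    columns:
      - name: account_id         # hypothetical column
        tests:
          # Intended: at most one conversion record per account
          - unique:
              config:
                where: "is_conversion"   # hypothetical boolean column
          # Intended: at most one current record per account
          - unique:
              config:
                where: "is_current"      # hypothetical boolean column
```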