-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-2137] [Feature] Cross-database implementation of is_distinct_from #6997
Comments
Thanks for opening @joellabes ! What kind of syntax are you imagining, something like this?
Would you imagine it being being paired with For the Redshift implementation (or any other non-conforming database), we'd want to be diligent with the implementation. The truth table here would be handy for integration tests. Side note: both of the SQL examples you gave would be awesome for a SQL linter to alert about! |
Yeah I think that sounds right to me!
I imagined that the user would BYON (bring your own negation)
As in it would say "hey did you know there is a better way to do this"? That would be very cool. |
Two follow-up questions:
|
Probably something like what Joel had in the first post:
|
No, but that's a 100% fair question before putting it into Core. I desperately hope that my code is no longer in the metrics package 😂 (@callum-mcdata?) I imagine negations would be
and joins would be
Yeah agreed with Owen! mo' brackets le' problems, as they say |
Names of the features within the standardPart 2 of the SQL standard (ISO/IEC 9075-2) added these IDs and names in SQL:1999 and SQL:2003, respectively ¹:
Should we flip it?All the examples we've looked at so far are actually for So I'm wondering, maybe we want to actually implement the equivalent of Pondering names
I can't say I fully grok it all, but I think the source of this issue we're trying to solve is that I'm guessing that referring to
Shortening the name is inspired by SQLite's Using Let's find some examplesLet's examine some examples from the wild to confirm/deny if
|
Oh dang - I would have loved this functionality about 6 weeks ago when I was thinking about how we could support The use case for the metrics package was specifically around joining the outputs of two unique metrics queries where the values within a particular column might be null. Users were running into issues when the null values of |
It could be nice to implement this macro to make it easy to express the logic in cases like dbt-labs/dbt-adapters#159, dbt snapshots, etc. Refinement needed # 1The biggest outstanding decision is naming. Pros: Cons: I'm unaware of the strongest arguments against the simple & intuitive naming, but would welcome that insight! Summary: I'm attracted to something simple with clear intent like Refinement needed # 2The other outstanding decision is whether to implement both the NULL-aware comparisons of equality and inequality or just equality. (I'm thinking we wouldn't want to do only inequality as originally requested because the SQL is quite a bit more complicated and equality might be more common.) If we support both, then the syntax and rendered SQL would be like the following:
Versus if we only do equality, then inequality would need to be expressed like this:
Pros: Doing both might make them easier to discover and use. The implementation of not equals follows trivially from equals. Opportunity for adapters to optimize both implementations if needed. Cons: Two different macros to maintain, document, and test. Two different ways for the user to express inequality. Summary: As long as there is a macro for expressing equality, it doesn't seem necessary that we offer a macro for inequality. |
Question 1I am in two minds:
The good news is that my friends at Snowflake have split the difference, with what I assume is a wrapper over the top of So I would vote for either that, or something like Question 2I agree with you, I don't think we need both - people can prefix a |
NamingMost frequently, you'll find me on "team standards" which would have me throwing my weight behind After thinking on this quite a bit, I'd expect it to be rare for people to be surprised about its behavior. Rather, it's simple enough to be used without reading or consulting the documentation. Can't imagine a sweeter spot than that. ImplementationDue to the trickiness of SQL's three-valued logic, the implementation needs to be like this:
Since postgres supports Examplewith x as (
select cast(1 as integer) as i union all
select cast(2 as integer) as i union all
select cast(null as integer) as i
),
permutations as (
select
x1.i as x1_i,
x2.i as x2_i
from x as x1, x as x2
order by x1.i, x2.i
)
select
-- values to compare
x1_i,
x2_i,
-- logic for equals macro
x1_i = x2_i or (x1_i is null and x2_i is null) as equals_logic,
-- compliant syntax for null-safe equality
x1_i is not distinct from x2_i as is_not_distinct_from,
-- case/when logic for equals macro
case when x1_i = x2_i or (x1_i is null and x2_i is null)
then 0
else 1
end = 0 as case_equals_macro,
-- negation of logic for equals macro
not (x1_i = x2_i or (x1_i is null and x2_i is null)) as not_equals_logic,
-- compliant syntax for null-safe inequality
x1_i is distinct from x2_i as is_distinct_from,
-- negation of case/when logic for equals macro
not case when x1_i = x2_i or (x1_i is null and x2_i is null)
then 0
else 1
end = 0 as case_equals_macro
from permutations
; |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
ImplementationIn #7776, we implemented a null-safe equals macro for the purposes of supporting test cases: dbt-core/tests/adapter/dbt/tests/adapter/utils/base_utils.py Lines 5 to 10 in 5488dfb
Its logic is tested here (which uses these fixtures). It is only used within functional tests in pytest; it is not currently available to the adapter outside of the pytest environment. We could resolve this ticket by: |
Is this your first time submitting a feature request?
Describe the feature
If I had a dollar for every time I had done
or
,
I would have a lot of money.
Turns out that there is a solution for this in the sql standard!
We should add this as a cross-db macro - a speedy Google looks like most of our key adapters support it natively except for Redshift
Describe alternatives you've considered
Doing nothing.
Who will this benefit?
This seems useful for two reasons:
Are you interested in contributing this feature?
Yep! But it'll be my first cross-adapter contribution so I'll need some handholding
Anything else?
No response
The text was updated successfully, but these errors were encountered: