-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement group_by_columns
argument for relevant tests
#633
Conversation
YAY. Thanks @emilyriederer - I will dig into this probably next week? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really elegant. I kept starting to write comments along the lines of "if you did XXX instead it would be a bit tidier", and then as I wrote it I realised that it was actually covering like 3 different edge cases that my proposal wouldn't have.
I've got one nitpick around comma structure and one pattern where I think we could end up in a column name conflict, but otherwise this is magnifique 🤩
Thanks for reviewing, @joellabes ! I fixed the simpler comma issue and outlined a few options for the other. Since it feels like we are close, I'll ask one bigger picture question: Is there any better way I should document these changes? This felt like too small of a feature to note on the |
@emilyriederer thanks for doing the research! have replied on that comment above.
Nah I think this is a big deal! To save a ton of repeated documentation, I would recommend documenting it all once at the list of the generic tests, and then saying something like "This test also supports the |
Thanks @joellabes ! I've added that section to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sooooo close 😍 thanks for sticking with it! These two changes are the only things I can see holding it back; I would just commit them myself to save you a job but want to check that I've understood them properly!
Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @joellabes ! Excited to have this merged and to start using 🤓 |
This reverts commit ed47585.
I just reverted it as I'll put it onto the 1.0 branch, not |
(as with all git installs, you'll be able to install that branch directly if you want to get ahead of the curve though!) |
Totally makes sense - thanks for the headsup! |
Appreciate it! Thanks! |
Hi @mgcdanny - this feature extends the following checks:
Those all work the same as in dbt-utils, but they are now assessed separately for each group provided. None of these compare between groups, so I don't believe any could accomplish your first question. However, I believe the second can be implement with |
This is a:
All pull requests from community contributors should target the
main
branch (default).Description & motivation
Description
This PR closes #450 and #447 by implementing an optional
group_by_columns
argument across many of the core tests indbt-utils
. Specifically, I extended this check to allow of the relevant tests. Collectively:For example, to test for at least one valid value by group, the
group_by_columns
argument could be used as follows:Motivation
The motivation for this PR is outlined as greater length in this blog post. In short:
Implementation
In implementing this PR, I considered a few core principles:
With these principles in mind, the majority of implementations are like that of the
recency
macro (L7-11) where all relevant SQL strings are precomputed:The main deviations to this were the
sequential()
macro (requiring a window function) and theequal_rowcount()
/fewer_rows_than()
(requiring joins)Notes
dbt-utils
, I'm sending this as a fresh PR, but the prior issue may add additional context.Checklist
star()
source)limit_zero()
macro in place of the literal string:limit 0
dbt_utils.type_*
macros instead of explicit datatypes (e.g.dbt_utils.type_timestamp()
instead ofTIMESTAMP