Implement schema tests by group/partition (WIP - not ready for review) #451
Conversation
@joellabes - I will leave this PR marked as a Draft for the foreseeable future until I've added the argument to all the in-scope tests, updated the docs, etc. In the meantime, I've pushed an example of the approach. Currently I have added:
Some areas where I'd specifically like input (in addition to anything that comes to mind):
- UI
- Tests
Hi @emilyriederer, super late for me to come back to you but I have just spent some time looking at this. I love it! Big fan 🎉 I thought this was going to be much fiddlier than it has turned out to be, which is always a win. Some answers to your questions:
Not written down anywhere, but in general it's good to have common arguments named the same way and in the same order where possible. As an example, the metrics package has some secondary calculations which have arguments in common - they're all named the same way and in the same order, optimising globally as opposed to picking the perfect word to get each one's local maximum.
I'm leaning towards something like
You've nailed the difference - seeds are nicer than hand-crafted select statements, where the data is static. But when you need the current timestamp, it has to be a proper model.
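As a concrete sketch of the seed-vs-model distinction (a hypothetical fixture, not one from this PR), a value defined relative to "now" can't live in a static CSV seed, so it has to be generated in a model:

```sql
-- Hypothetical fixture model, e.g. models/data_test_recent_rows.sql.
-- A seed is a static CSV, so anything relative to the current timestamp
-- must be computed at run time in a model like this instead.
select
    1 as id,
    {{ dbt_utils.dateadd('day', -1, 'current_timestamp') }} as created_at
```

Static reference data, by contrast, is clearer as a seed CSV than as a hand-crafted `select ... union all` statement.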
Good question! I'm generally OK with the same input file being used for multiple output files, if it can be achieved without needing to contort the tests or the models too much. E.g. I was just looking at PR #507 which has multiple tests built on top of the same seed file, by using the where conditions. But the window functions vs standard functions made more sense to be in different seed files. No hard and fast rule, but if you're working too hard to keep it in one file, or doing huge amounts of copy-paste, you're probably leaning too far in one direction or the other.
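The "one seed, multiple tests" pattern described above might look something like this (an illustrative sketch using dbt's `where` test config; the model, column, and filter names are made up, not taken from #507):

```yaml
# Hypothetical schema.yml: two tests built on the same input,
# each restricted to a different slice of rows via `where`.
models:
  - name: data_test_example
    tests:
      - dbt_utils.expression_is_true:
          expression: "col_a + col_b = total"
          config:
            where: "test_group = 'standard'"
      - dbt_utils.expression_is_true:
          expression: "col_a + col_b = total"
          config:
            where: "test_group = 'window'"
```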
We don't currently have a way to test for detection of failures 😢 The Core team are looking into improving testability of adapters at the moment, which will hopefully bleed over to packages as well (cc @jtcohen6).

And a question from me:
Thanks for your work on this so far 🤩
Also! A heads up that #521 will change a bunch of file names, so it's worth waiting for that to be merged before moving this any further forward.
Thanks @joellabes for all of the helpful comments! I'll wait until #521 is merged, then work on your updates and expanding to other tests. 🤓 To answer your question, this PR shouldn't be limited to any one database, since the functionality is bound to be pretty basic changes to insert
I was meaning in relation to this line
Ah yes, sorry! I see what you mean now. I'll clean that up before submitting. I initially set out to write a more complicated test (to have failure cases along with successes) and thought I might need to do it differently for Postgres, so I'd started the if/else skeleton. However, as things stand, it's superfluous because the
Description & motivation
This PR adds checks by groups as discussed in #450. In short, the motivation is that some checks cannot be expressed at all without subgrouping, and other checks can be more rigorous at the group level.
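As a rough sketch of what a group-aware test could look like (the macro and argument names here are illustrative, not necessarily the ones this PR settles on), a `group_by_columns` argument lets the test aggregate within each partition before checking the condition:

```sql
-- Hypothetical group-aware schema test: fails if any group contains
-- no non-null values for the column. Names are illustrative only.
{% macro test_at_least_one_by_group(model, column_name, group_by_columns=[]) %}

{%- set group_cols = group_by_columns | join(', ') -%}

select count(*)
from (
    select
        {{ group_cols }},
        count({{ column_name }}) as non_null_count
    from {{ model }}
    group by {{ group_cols }}
    having count({{ column_name }}) = 0
) validation_errors

{% endmacro %}
```

Without the `group by`, a check like this can only assert the condition over the whole table, which is why some checks are impossible to express at all without subgrouping.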
Checklist
I will check off checklist items before asking for formal approval of this PR. This PR will consist of many small, similar pieces, and this currently contains only one such iteration.
- I have "dispatched" any new macro(s) so non-core adapters can also use them (e.g. the `star()` source)
- I have used the `limit_zero()` macro in place of the literal string: `limit 0`
- I have used `dbt_utils.type_*` macros instead of explicit datatypes (e.g. `dbt_utils.type_timestamp()` instead of `TIMESTAMP`)