-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing Cartesian Join #31
Conversation
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @callum-mcdata |
Just signed the CLA! |
Thanks for opening this PR, @callum-mcdata! The big constraint here is making sure that we can calculate secondary metrics (eg. QTD aggs, PoP changes, rolling sums, etc). The nice thing about cartesian joining everything is that we can then easily implement these secondary calculations on warehouses using window functions! I've always had it in my head that not doing a cartesian join would make that harder/impossible.... but is that true? As long as we join to a date spine, i don't think we should have any problems with generating window functions for the currently supported set of secondary calculations. Is this something you've thought about? Is it right to assume that we can still implement these secondary calcs w/o doing a full set of cartesian joins? If so, then I think I'm all for this PR :) |
I wouldn't see why this change would block secondary_calculations but that is a good callout on something that I should test. Part of this is just my confusing terminology though 😵💫 . We would still be doing the cartesian join with the date spine to create the spined values that can be aggregated across time but we'd be removing the cartesian behavior in Source Table
Old Behavior
Etc, etc. New Behavior
|
ok - right on - feel free to assign me when this is ready for a review! |
@drewbanin I'm working on getting my local setup to start running some data tests beyond Joel's integration tests (ie confirm the behavior above is what I see in a dataset) but if you wanna review in the meantime, go for it! |
Okay I've run some eye-checks in Snowflake and confirmed that this behavior is working as intended! Given that it is a removal of functionality, I don't think adding integration tests to confirm this behavior is necessary - the current integration test behavior works just fine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ship it!
Removing Cartesian Join
Issue: #9
What Does This PR Do?
This PR changes the
get_metric_sql
macro to remove the cartesian join in thespine__values
andspine
CTE's.spine__values
now selects distinct combinations of all provided dimensions and then creates thespine
CTE with that list of combinations.Should This PR Be Merged?
It really depends on how strongly we feel about the cartesian join functionality. I'm unsure of what use case there would be for impossible combinations of dimensions but @joellabes has thought about this more than I have and has stated that there is a use case.
As such, I'm opening this PR and will keep it open until we can come to a community/internal determination of whether we want to support this use case! The main difference between this and my last PR is that I removed the parameter within the metrics macro. My personal opinion is that we either remove that functionality or keep it within the package - too many parameters makes writing the macro
not fun
.