Removing Cartesian Join #31

Merged (3 commits) on Jun 7, 2022

callum-mcdata
Contributor

Removing Cartesian Join

Issue: #9

What Does This PR Do?

This PR changes the get_metric_sql macro to remove the cartesian join in the spine__values and spine CTEs.

  • Current Behavior: Combinations of data that do not exist within the parent dataset are reflected in the dataset produced by the macro. See issue #9 (Cartesian Joins Create ALL Combinations, Not All Possible Combinations) for an example.
  • New Behavior: spine__values now selects distinct combinations of all provided dimensions and then creates the spine CTE with that list of combinations.
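
To illustrate the difference between the two behaviors, here is a minimal Python sketch. It is not the macro's SQL; the rows, the `cartesian_combinations` helper, and the `observed_combinations` helper are hypothetical names introduced only for this illustration.

```python
from itertools import product

# Hypothetical (person, city) rows standing in for the parent dataset.
rows = [
    ("Callum", "Chicago"),
    ("Drew", "Philly"),
    ("Joel", "Somewhere in NZ"),
]


def cartesian_combinations(rows):
    """Current behavior: cross every distinct value of each dimension."""
    people = sorted({person for person, _ in rows})
    cities = sorted({city for _, city in rows})
    return list(product(people, cities))


def observed_combinations(rows):
    """New behavior: keep only the distinct combinations present in the data."""
    return sorted(set(rows))


old = cartesian_combinations(rows)
new = observed_combinations(rows)
print(len(old), len(new))  # 9 3
```

In this sketch, ("Callum", "Philly") appears in `old` but not in `new`, which is the kind of impossible combination this PR removes.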

Should This PR Be Merged?

It really depends on how strongly we feel about the cartesian join functionality. I'm unsure of what use case there would be for impossible combinations of dimensions, but @joellabes has thought about this more than I have and has stated that there is a use case.

As such, I'm opening this PR and will keep it open until we can come to a community/internal determination of whether we want to support this use case! The main difference between this and my last PR is that I removed the parameter within the metrics macro. My personal opinion is that we either remove that functionality or keep it within the package - too many parameters make writing the macro not fun.

@cla-bot

cla-bot bot commented May 20, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @callum-mcdata

@callum-mcdata
Contributor Author

Just signed the CLA!

@drewbanin
Contributor

Thanks for opening this PR, @callum-mcdata!

The big constraint here is making sure that we can calculate secondary metrics (e.g. QTD aggs, PoP changes, rolling sums, etc.). The nice thing about cartesian joining everything is that we can then easily implement these secondary calculations on warehouses using window functions! I've always had it in my head that not doing a cartesian join would make that harder/impossible... but is that true? As long as we join to a date spine, I don't think we should have any problems with generating window functions for the currently supported set of secondary calculations.

Is this something you've thought about? Is it right to assume that we can still implement these secondary calcs w/o doing a full set of cartesian joins? If so, then I think I'm all for this PR :)
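
A rough way to see why the date spine alone may be enough: a window-style calculation only needs every date to exist within each dimension combination it is partitioned by. The Python sketch below is an illustration under that assumption, not the package's implementation; the `spined` data and the `running_total` helper are hypothetical.

```python
from itertools import accumulate

# Hypothetical spined rows per dimension combination: (date_day, metric).
# Dates with no sales are already filled with 0 by the date spine.
spined = {
    ("Callum", "Chicago"): [("2021-01-01", 1), ("2021-01-02", 0), ("2021-01-03", 0)],
    ("Drew", "Philly"): [("2021-01-01", 0), ("2021-01-02", 2), ("2021-01-03", 0)],
}


def running_total(rows):
    """Cumulative sum ordered by date, analogous to
    sum(metric) over (partition by person, city order by date_day)."""
    ordered = sorted(rows)
    totals = accumulate(metric for _, metric in ordered)
    return [(day, total) for (day, _), total in zip(ordered, totals)]


# Each combination is its own partition; combinations that never occur
# in the data are simply absent and need no window calculation.
period_to_date = {combo: running_total(rows) for combo, rows in spined.items()}
```

Because each partition still has one row per date, the running total has no gaps even though combinations such as ("Callum", "Philly") are never generated.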

@callum-mcdata
Contributor Author

I don't see why this change would block secondary_calculations, but that is a good callout on something that I should test.

Part of this is just my confusing terminology though 😵‍💫. We would still be doing the cartesian join with the date spine to create the spined values that can be aggregated across time, but we'd be removing the cartesian behavior in spine__values that creates those impossible combinations of values. Here's how it should work:

Source Table

| Person | City | Sales | Date |
| --- | --- | --- | --- |
| Callum | Chicago | 1 | 1/1/21 |
| Drew | Philly | 2 | 1/2/21 |
| Joel | Somewhere in NZ | 3 | 1/3/21 |

Old Behavior

| Date | Person | City | Metric |
| --- | --- | --- | --- |
| 1/1/21 | Callum | Chicago | 1 |
| 1/1/21 | Callum | Philly | 0 |
| 1/1/21 | Callum | Somewhere in NZ | 0 |
| 1/2/21 | Callum | Chicago | 0 |
| 1/2/21 | Callum | Philly | 0 |

Etc, etc.

New Behavior

| Date | Person | City | Metric |
| --- | --- | --- | --- |
| 1/1/21 | Callum | Chicago | 1 |
| 1/2/21 | Callum | Chicago | 0 |
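
The new behavior can be reproduced with a small sketch that runs SQL through Python's built-in sqlite3 module. This is an assumption-laden illustration, not the SQL that get_metric_sql generates: the `sales` table, the `amount` column, the ISO-formatted dates, and the date spine derived from the sales dates (rather than a real calendar) are all hypothetical simplifications. Only the CTE names spine__values and spine come from the PR description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (person text, city text, amount integer, sale_date text);
    insert into sales values
        ('Callum', 'Chicago', 1, '2021-01-01'),
        ('Drew', 'Philly', 2, '2021-01-02'),
        ('Joel', 'Somewhere in NZ', 3, '2021-01-03');
""")

query = """
with date_spine as (
    -- simplification: a real date spine would come from a calendar
    select distinct sale_date as date_day from sales
),
spine__values as (
    -- distinct combinations that actually occur in the data
    select distinct person, city from sales
),
spine as (
    -- the cartesian join with the date spine is kept on purpose
    select date_day, person, city
    from date_spine
    cross join spine__values
)
select
    spine.date_day,
    spine.person,
    spine.city,
    coalesce(sum(sales.amount), 0) as metric
from spine
left join sales
    on sales.sale_date = spine.date_day
    and sales.person = spine.person
    and sales.city = spine.city
group by 1, 2, 3
order by 2, 1
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With three observed combinations and three dates, the sketch yields 9 rows, where crossing every person with every city would have yielded 27.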

@cla-bot cla-bot bot added the cla:yes The CLA has been signed label May 20, 2022
@drewbanin
Contributor

ok - right on - feel free to assign me when this is ready for a review!

@callum-mcdata
Contributor Author

@drewbanin I'm working on getting my local setup to start running some data tests beyond Joel's integration tests (i.e. confirming that the behavior above is what I see in a dataset), but if you wanna review in the meantime, go for it!

@callum-mcdata callum-mcdata requested a review from drewbanin May 31, 2022 14:26
@callum-mcdata callum-mcdata marked this pull request as draft June 1, 2022 21:58
@callum-mcdata callum-mcdata added the enhancement New feature or request label Jun 2, 2022
@callum-mcdata callum-mcdata marked this pull request as ready for review June 2, 2022 18:50
@callum-mcdata callum-mcdata requested a review from jasnonaz June 2, 2022 19:53
@callum-mcdata
Contributor Author

callum-mcdata commented Jun 2, 2022

Okay, I've run some eye-checks in Snowflake and confirmed that this behavior is working as intended! Given that it is a removal of functionality, I don't think adding integration tests to confirm this behavior is necessary - the current integration test behavior works just fine.

Contributor

@drewbanin drewbanin left a comment


ship it!

@callum-mcdata callum-mcdata merged commit 818b0f3 into main Jun 7, 2022
@callum-mcdata callum-mcdata deleted the remove_cartesian_join branch June 7, 2022 13:36
Labels
cla:yes The CLA has been signed enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Cartesian Joins Create ALL Combinations, Not All Possible Combinations
2 participants