Refactor deduplicate() arguments #548

Merged on May 16, 2022 (14 commits)

Contributor

@judahrand judahrand commented Apr 14, 2022

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is main
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

This Pull Request aims to improve the deduplicate macro based on community feedback from its first release. It is stacked on top of #549 and introduces breaking changes.

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt_utils.type_* macros instead of explicit datatypes (e.g. dbt_utils.type_timestamp() instead of TIMESTAMP)
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

Resolves #542. Resolves #543.

@judahrand judahrand force-pushed the feature/dedupe branch 4 times, most recently from 8f01c53 to a8f8513 Compare April 14, 2022 08:43
@judahrand judahrand changed the base branch from main to next/patch April 14, 2022 09:01
@judahrand judahrand force-pushed the feature/dedupe branch 2 times, most recently from 593ac80 to 3c27a18 Compare April 14, 2022 09:02
@judahrand
Contributor Author

I'm not sure why the Redshift test is failing with

09:05:22  Database Error in model test_deduplicate (models/sql/test_deduplicate.sql)
09:05:22    SELECT DISTINCT ON is not supported
09:05:22    compiled SQL at target/run/dbt_utils_integration_tests/models/sql/test_deduplicate.sql
09:05:22  Encountered an error:
FailFast Error in model test_deduplicate (models/sql/test_deduplicate.sql)
  Failing early due to test failure or runtime error

given that DISTINCT ON only appears in the Postgres implementation...

@judahrand
Contributor Author

Ah, I didn't realize that on Redshift the macro dispatches to the Postgres implementation rather than to the default one.
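For readers hitting the same error, here is a minimal Python sketch of the lookup order that dbt's dispatch is documented to follow. It is a simulation, not dbt's code: `ADAPTER_PARENTS`, `dispatch_candidates` and `resolve` are hypothetical names, and the parent chain assumes that the Redshift adapter inherits from the Postgres adapter.

```python
from typing import Dict, List, Set

# Assumption: the Redshift adapter inherits from the Postgres adapter,
# so its dispatch search visits postgres__ before default__.
ADAPTER_PARENTS: Dict[str, List[str]] = {
    "redshift": ["postgres"],
    "postgres": [],
    "snowflake": [],
    "bigquery": [],
}


def dispatch_candidates(macro_name: str, adapter: str) -> List[str]:
    """Return the prefixed macro names in the order they would be tried."""
    prefixes = [adapter, *ADAPTER_PARENTS.get(adapter, []), "default"]
    return [f"{prefix}__{macro_name}" for prefix in prefixes]


def resolve(macro_name: str, adapter: str, available: Set[str]) -> str:
    """Pick the first candidate that is actually defined."""
    for candidate in dispatch_candidates(macro_name, adapter):
        if candidate in available:
            return candidate
    raise LookupError(f"no implementation of {macro_name} for {adapter}")


# Without a redshift__ implementation, Redshift lands on the Postgres one.
defined = {"postgres__deduplicate", "default__deduplicate"}
print(resolve("deduplicate", "redshift", defined))
```

Under these assumptions, a macro that uses `DISTINCT ON` in its Postgres implementation also needs a Redshift-specific implementation, otherwise Redshift picks up the Postgres one.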

@judahrand judahrand force-pushed the feature/dedupe branch 8 times, most recently from 7b1e0c2 to a60dac6 Compare April 14, 2022 11:26
@judahrand judahrand changed the title Optimize the deduplicate macro Refactor deduplicate() arguments Apr 14, 2022
@judahrand judahrand force-pushed the feature/dedupe branch 4 times, most recently from c4c5430 to 0641bfe Compare April 14, 2022 11:49
Contributor

@dbeatty10 dbeatty10 left a comment

@judahrand love the direction this is heading! Thank you for your detailed coverage of a wide range of databases, especially as it relates to using natural joins.

The updates are definitely more readable.

I left a few comments regarding the documentation -- it appears that the renaming to partition_by was overlooked in the usage examples.

I'm wondering if we can do any fancy footwork in the background so that the changes aren't breaking (but still make all the updates we want). I think I've seen some examples in past merge requests, so I'm going to try looking for those, and I'll follow-up in this conversation if/when I find them.

@judahrand
Contributor Author

judahrand commented Apr 14, 2022

@dbeatty10 I'll deal with these comments now 😄 I also wasn't sure whether you thought this would have to wait for a minor release because of the breaking changes, even though the macro is very new. I suspect a breaking change is a breaking change, though. Maybe in future it would be worth marking things as 'alpha' for their first few releases, so that this kind of feedback is easier to act on?

> I'm wondering if we can do any fancy footwork in the background so that the changes aren't breaking (but still make all the updates we want). I think I've seen some examples in past merge requests, so I'm going to try looking for those, and I'll follow-up in this conversation if/when I find them.

Oh, I shall look forward to seeing the fancy footwork!

@judahrand
Contributor Author

> I'm wondering if we can do any fancy footwork in the background so that the changes aren't breaking (but still make all the updates we want). I think I've seen some examples in past merge requests, so I'm going to try looking for those, and I'll follow-up in this conversation if/when I find them.

@dbeatty10 Any luck here?

@judahrand judahrand force-pushed the feature/dedupe branch 2 times, most recently from 8f806b9 to bae387a Compare May 16, 2022 17:10
@judahrand judahrand changed the base branch from next/patch to main May 16, 2022 17:11
@judahrand
Contributor Author

> I think there's two main things we'll need to do to merge this in once all tests are passing:
>
>   1. Make sure there's no breaking changes (if possible)
>   2. Be able to merge to main instead of next/patch

Hey @dbeatty10,

I think that I've managed to overcome both of these barriers thanks to your pointers over Slack.

Are you able to re-review this?

Once this is merged it would also be great to try to get #550 and #551 in too.

@judahrand judahrand requested a review from dbeatty10 May 16, 2022 17:21
Contributor

@dbeatty10 dbeatty10 left a comment

This looks great @judahrand ! Thanks for your documentation of how to handle the deprecated signature and for adding an integration test for that case.

I left two comments on very minor things to update within the documentation.

Looks good to me to merge after that.
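As a rough illustration of keeping a deprecated signature working, here is a minimal Python sketch of the general pattern: accept the old keyword, warn, and map it onto the new one. It is not the macro's actual code. The function name `deduplicate_args` is hypothetical, and the old keyword name `group_by` is an assumption, since this thread only names the new `partition_by` argument.

```python
import warnings
from typing import Any, Dict, Optional


def deduplicate_args(
    relation: str,
    partition_by: Optional[str] = None,
    order_by: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, Optional[str]]:
    """Normalise arguments, accepting a deprecated keyword with a warning."""
    # Assumption: `group_by` stands in for the pre-rename keyword.
    legacy = kwargs.pop("group_by", None)
    if kwargs:
        raise TypeError(f"unexpected arguments: {sorted(kwargs)}")

    if legacy is not None:
        warnings.warn(
            "`group_by` is deprecated; use `partition_by` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new keyword wins if both are supplied.
        if partition_by is None:
            partition_by = legacy

    if partition_by is None:
        raise TypeError("partition_by is required")

    return {"relation": relation, "partition_by": partition_by, "order_by": order_by}


# Old-style call still works, but emits a DeprecationWarning.
print(deduplicate_args("events", group_by="user_id", order_by="loaded_at desc"))
```

An integration test for the deprecated case would then call the old form and check that the result matches the new form.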

@judahrand
Contributor Author

@dbeatty10 Sorted!

@dbeatty10 dbeatty10 self-requested a review May 16, 2022 17:52
Successfully merging this pull request may close these issues.

  • Improve Deduplicate Macro to use QUALIFY
  • Documentation Issues with New "deduplicate" macro
3 participants