Documentation Issues with New "deduplicate" macro #542
Comments
Do you have insight on how this deduplicate function is intended to be used and what each of the parameters is for? Here are my guesses, but it would be great to hear your thoughts (and suggestions for word-smithing): Args:
|
Part of what confused me about this is the name "group_by" for the second parameter. That implies aggregation to me, but this is not doing the logical operation of aggregation (with the exception of the BQ-specific version of the macro, which has to do an involved workaround because BQ window functions cannot successfully process large volumes of data). This is really doing a window function. It would probably be more helpful to call the second parameter something like "key_columns" or "deduplication_columns". Also, purely from reading the code, I think instead of saying Also, I'm confused how |
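To illustrate the distinction drawn above, here is a minimal sketch of window-function deduplication. It is not the dbt-utils macro: the helper name `deduplicate`, the parameter names `key_columns` and `order_by`, and the `events` table are all hypothetical, and it runs against SQLite (3.25 or later, for window function support) purely so the SQL can be executed locally.

```python
import sqlite3


def deduplicate(conn, table, key_columns, order_by):
    """Keep one row per distinct key, chosen by a window function.

    Hypothetical helper for illustration only. Identifiers are
    interpolated directly into the SQL, so pass trusted names only.
    """
    keys = ", ".join(key_columns)
    query = f"""
        select *
        from (
            select
                t.*,
                -- number the rows within each key; no aggregation happens
                row_number() over (
                    partition by {keys}
                    order by {order_by}
                ) as rn
            from {table} as t
        )
        where rn = 1
    """
    # Drop the trailing helper column rn from every returned row.
    return [row[:-1] for row in conn.execute(query).fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (user_id integer, event text, updated_at text)"
)
conn.executemany(
    "insert into events values (?, ?, ?)",
    [
        (1, "signup", "2022-01-01"),
        (1, "login", "2022-01-02"),
        (2, "signup", "2022-01-03"),
    ],
)

# Keep the most recent row for each user_id.
rows = deduplicate(conn, "events", ["user_id"], "updated_at desc")
```

Every column of the surviving row comes back unchanged, which is why a name like "key_columns" describes the second argument better than "group_by": the rows are partitioned and ranked, not collapsed into aggregates.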
That's all correct with the exception of |
@codigo-ergo-sum Oh, I missed that in the docs! It is optional for all the other DBs but maybe you're right and it shouldn't be optional. |
I've just had a look at this and I think we can do away with the confusing |
I'm not sure that including Similarly, Suggesting that these are just column names, I think, would introduce additional confusion. Perhaps we can just change |
This can be closed now that #548 has been merged, I believe. |
Thank you for calling this out, @judahrand! I added "Resolves #542" as a comment on #548 for traceability and am manually closing this issue. |
Describe the bug
The new "deduplicate" macro has a broken link at the top of the page to the documentation section on the front page of the repo. The first link is https://github.com/dbt-labs/dbt-utils#deduplicate, but the second link is https://github.com/dbt-labs/dbt-utils#deduplicate-source. Could this be fixed?
Also, the macro itself is a bit confusing. I understand deduplication in general, but I'm having a hard time understanding how this function is intended to be used and what some of the parameters are for. Would it be possible to beef up the documentation?
Steps to reproduce
Same as above.
Expected results
The links on the repo's documentation page work, and the macro is clearly understandable, including every parameter and whether it is required or optional.
Actual results
N/A
Screenshots and log output
N/A
System information
N/A
The output of dbt --version: N/A
Are you interested in contributing the fix?
Potentially