add verb uncount()
#558
Your summary of the approach in the PR body was super helpful; but I think it also needs to be included in the code as comments so that when looking at the code in the future it's easy to see the basic approach.
R/verb-uncount.R
@@ -0,0 +1,73 @@
#' "Uncount" a database table
#'
#' This is a method for the tidyr `uncount()` generic.
I think this needs to include a comment that it will use a temporary table.
… mgirlich-add-uncount
The SQL snapshots don't work nicely together with the different backends. Depending on the active backend, more or fewer temporary table names are created, i.e.
Looks good! Just let me know how you want to merge these. If we wait and merge them all after they're all complete, I'm happy to handle the merge conflicts in the NEWS file on my end.
I think I have prepared the other PRs so that they are ready to be merged. I guess it is easier and faster if you handle the merge conflicts on your side. Thanks!
Conflicts: DESCRIPTION
As `uncount()` currently isn't a generic (also see tidyverse/tidyr#1071), I named the function `dbplyr_uncount()`.
I see two strategies for translating: a recursive CTE, or an unequal join against a temporary table.
The recursive CTE would require much more code and also more testing across backends. Therefore, I simply chose the unequal join. I'm not sure how big the performance difference is.
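For comparison, here is a rough sketch of what the recursive CTE strategy could look like. It is not code from this PR: the table `data`, its columns `x` and `weight`, and the CTE name `expanded` are made up for illustration, and it uses Python's built-in `sqlite3` only so that the SQL can be run standalone against SQLite. Other backends may need different syntax.

```python
import sqlite3

# Hypothetical example table; names are illustrative, not from the PR.
cte_con = sqlite3.connect(":memory:")
cte_con.execute("CREATE TABLE data (x TEXT, weight INTEGER)")
cte_con.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", 1), ("b", 3), ("c", 0)],
)

# Anchor member: one copy of every row whose weight is at least 1.
# Recursive member: emit another copy until the counter reaches the weight.
cte_sql = """
WITH RECURSIVE expanded(x, weight, id) AS (
  SELECT x, weight, 1 FROM data WHERE weight >= 1
  UNION ALL
  SELECT x, weight, id + 1 FROM expanded WHERE id < weight
)
SELECT x FROM expanded ORDER BY x, id
"""
cte_rows = [row[0] for row in cte_con.execute(cte_sql)]
print(cte_rows)  # ['a', 'b', 'b', 'b']
```

The row with weight 0 is dropped by the anchor member. Recursive CTE syntax and limits differ between databases, which fits the concern above about extra testing across backends.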
There are also two variants to create the temporary table. One of them relies on the `generate_series()` function. Unfortunately, this is only available as an SQLite extension which is not included in RSQLite.
The functionality is basically the same as the `tidyr` version, with the small difference that the `weights` are not checked as strictly (i.e. no error if some weights are not non-negative integers).
Overview of the strategy:
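1. Calculate `n_max`, the maximum weight.
2. Create a temporary table `tmp` with an `id` column holding the values 1 to `n_max`.
3. Join `data` with `tmp` on `tmp$id <= data$weight`, so that each row is repeated `weight` times.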
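The strategy above can be sketched directly in SQL. This is only an illustration under assumptions, not the SQL that dbplyr generates: the table and column names (`data`, `x`, `weight`, `tmp`, `id`) are hypothetical, and Python's built-in `sqlite3` is used just to make the example runnable. The join condition is written as `tmp.id <= data.weight` so that each row appears `weight` times. The rows with weights 0, 2.5 and -1 show the relaxed checking mentioned above: the join raises no error for them.

```python
import sqlite3

# Hypothetical example table; names are illustrative, not from the PR.
join_con = sqlite3.connect(":memory:")
join_con.execute("CREATE TABLE data (x TEXT, weight REAL)")
join_con.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", 1), ("b", 3), ("c", 0), ("d", 2.5), ("e", -1)],
)

# Step 1: n_max is the largest weight in the data.
(n_max,) = join_con.execute("SELECT MAX(weight) FROM data").fetchone()

# Step 2: temporary table holding the ids 1..n_max.
join_con.execute("CREATE TEMPORARY TABLE tmp (id INTEGER)")
join_con.executemany(
    "INSERT INTO tmp VALUES (?)",
    [(i,) for i in range(1, int(n_max) + 1)],
)

# Step 3: unequal join; a row matches every id up to its weight.
join_sql = """
SELECT data.x
FROM data
INNER JOIN tmp ON tmp.id <= data.weight
ORDER BY data.x, tmp.id
"""
join_rows = [row[0] for row in join_con.execute(join_sql)]
print(join_rows)  # ['a', 'b', 'b', 'b', 'd', 'd']
```

In this sketch a zero or negative weight silently produces no rows, and the fractional weight 2.5 produces two rows, since only the ids 1 and 2 satisfy the condition.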
The generated SQL looks like this
One could simplify the SQL a little