add verb `uncount()` #558

mgirlich · 2020-12-07T10:44:51Z

Because uncount() currently isn't a generic (see also tidyverse/tidyr#1071), I named the function dbplyr_uncount().

I see two strategies for translating:

- use a recursive common table expression (see this Stack Overflow discussion)
- or use an unequal join with a temporary table (the strategy I chose)
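For comparison, here is a minimal sketch of what the recursive CTE strategy could look like. It is not code from this PR: it uses Python's built-in sqlite3 module, and the table `df`, its columns `x` and `n`, and the CTE name `expanded` are all made up for illustration.

```python
import sqlite3

# Hypothetical example data: column n holds the weight (number of copies).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df (x TEXT, n INTEGER)")
con.executemany("INSERT INTO df VALUES (?, ?)", [("a", 1), ("b", 3)])

# Recursive CTE: start with one copy of every row whose weight is at least 1,
# then keep emitting another copy while the counter i is still below n.
recursive_sql = """
WITH RECURSIVE expanded(x, n, i) AS (
  SELECT x, n, 1 FROM df WHERE n >= 1
  UNION ALL
  SELECT x, n, i + 1 FROM expanded WHERE i < n
)
SELECT x FROM expanded ORDER BY x
"""
rows_cte = [row[0] for row in con.execute(recursive_sql)]
print(rows_cte)
```

Even this small version relies on `WITH RECURSIVE`, whose syntax and support differ between databases, which matches the concern about extra testing across backends.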

The recursive CTE would require much more code and also more testing across backends. Therefore, I simply chose the unequal join. I'm not sure how big the performance difference is.

There are also two ways to create the temporary table:

- use the generate_series() function. Unfortunately, in SQLite this is only available as an extension, which is not included in RSQLite.
- get the maximum value of the weights column and create a local tibble to upload.

The functionality is basically the same as the tidyr version, with the small difference that the weights are not checked as strictly (i.e. there is no error if some weights are not non-negative integers).
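To make the looser checking concrete, here is a small sketch of how a join on "id <= weight" treats unusual weights. It is an illustration in Python with sqlite3 and not the PR's code; the tables `df` and `helper` and the sample values are assumptions, and other backends may compare values differently.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data with weights that tidyr::uncount() would reject.
con.execute("CREATE TABLE df (x TEXT, n REAL)")
con.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [("zero", 0), ("neg", -2), ("frac", 2.5), ("ok", 2)],
)

# Helper table with ids 1..3, enough to cover the largest weight.
con.execute("CREATE TEMPORARY TABLE helper (id INTEGER)")
con.executemany("INSERT INTO helper VALUES (?)", [(1,), (2,), (3,)])

# Count how many copies of each row survive the join.
counts = dict(
    con.execute(
        "SELECT df.x, COUNT(*) FROM df "
        "INNER JOIN helper ON helper.id <= df.n "
        "GROUP BY df.x"
    )
)
print(counts)
```

In this sketch, rows with a zero or negative weight silently disappear and the fractional weight is effectively rounded down, instead of raising an error.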

Overview of the strategy:

1. get the max weight, n_max
2. create a temporary db table with ids from 1 to n_max
3. duplicate data by inner joining with this temporary table under the condition that tmp$id <= data$weight
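The three steps can be sketched end to end as follows. This is an illustration using Python's sqlite3 module, not the R implementation in this PR; the table names `df` and `helper` and the sample rows are hypothetical. The join keeps a helper id whenever it does not exceed the row's weight, which is the same condition as in the generated SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: n is the weight column.
con.execute("CREATE TABLE df (x TEXT, n INTEGER)")
con.executemany("INSERT INTO df VALUES (?, ?)", [("a", 1), ("b", 3)])

# Step 1: get the maximum weight.
n_max = con.execute("SELECT MAX(n) FROM df").fetchone()[0]

# Step 2: create a temporary table with ids from 1 to n_max.
con.execute("CREATE TEMPORARY TABLE helper (id INTEGER)")
con.executemany(
    "INSERT INTO helper VALUES (?)", [(i,) for i in range(1, n_max + 1)]
)

# Step 3: inner join; a row with weight n matches ids 1..n, so it appears n times.
uncounted = con.execute(
    "SELECT df.x FROM df "
    "INNER JOIN helper ON helper.id <= df.n "
    "ORDER BY df.x"
).fetchall()
print(uncounted)
```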

The generated SQL looks like this:

SELECT `x`,
       `test`
FROM
  (SELECT `x`,
          `n`,
          `test`
   FROM `dbplyr_366` AS `LHS`
   INNER JOIN `dbplyr_367` AS `RHS` ON (`RHS`.`test` <= `LHS`.`n`))

The SQL could be simplified a little.
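One possible simplification, offered as a guess and not as what dbplyr generates, is to drop the nested subquery and select the columns straight from the join. The sketch below uses Python's sqlite3 module with made-up sample rows to check that both forms return the same result; the table names are taken from the SQL above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Table names follow the generated SQL; the rows are invented sample data.
con.execute("CREATE TABLE dbplyr_366 (x TEXT, n INTEGER)")
con.executemany("INSERT INTO dbplyr_366 VALUES (?, ?)", [("a", 1), ("b", 2)])
con.execute("CREATE TABLE dbplyr_367 (test INTEGER)")
con.executemany("INSERT INTO dbplyr_367 VALUES (?)", [(1,), (2,)])

# The query as generated, with the join wrapped in a subquery.
generated = """
SELECT `x`, `test`
FROM
  (SELECT `x`, `n`, `test`
   FROM `dbplyr_366` AS `LHS`
   INNER JOIN `dbplyr_367` AS `RHS` ON (`RHS`.`test` <= `LHS`.`n`))
"""

# Assumed simplification: select directly from the join.
simplified = """
SELECT `x`, `test`
FROM `dbplyr_366` AS `LHS`
INNER JOIN `dbplyr_367` AS `RHS` ON (`RHS`.`test` <= `LHS`.`n`)
"""

rows_generated = sorted(con.execute(generated))
rows_simplified = sorted(con.execute(simplified))
print(rows_generated == rows_simplified)
```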

hadley

Your summary of the approach in the PR body was super helpful, but I think it also needs to be included in the code as comments so that it's easy to see the basic approach when looking at the code in the future.

hadley · 2021-01-15T22:33:34Z

R/verb-uncount.R

@@ -0,0 +1,73 @@
+#' "Uncount" a database table
+#'
+#' This is a method for the tidyr `uncount()` generic.


I think this needs to include a comment that it uses a temporary table.

R/verb-uncount.R

tests/testthat/test-verb-uncount.R

… mgirlich-add-uncount

mgirlich · 2021-01-18T08:39:21Z

The SQL snapshots don't work nicely together with the different backends. Depending on the active backend, more or fewer temporary table names are created, i.e. dbplyr_table_name has a different value. This is an issue for this snapshot test. For now I simply set the option manually, but it's not exactly the nicest workaround.
One could also force the backend tests to run later by changing their file name, but departing from the normal conventions isn't very nice either. Do you have a better idea?

hadley

Looks good! Just let me know how you want to merge these. If we wait and merge them all after they're all complete, I'm happy to handle the merge conflicts in the NEWS file on my end.

mgirlich · 2021-01-19T07:43:10Z

I think I have prepared the other PRs so that they are ready to be merged. I guess it is easier and faster if you handle the merge conflicts on your side. Thanks!
By the way: great work with the new testthat. It is really nice and easy to work with the snapshots!

Conflicts: DESCRIPTION

mgirlich added 5 commits December 7, 2020 11:05

add verb uncount()

bc17803

refactor removal of columns

cdf0c86

do not generate extra column if not necessary

5e56294

import vctrs

8f4d9ba

make sure that weight is removed even when it is a grouping variable

89f052b

hadley reviewed Jan 15, 2021


mgirlich added 9 commits January 18, 2021 07:19

explicity copy temporary helper table to db

84d5829

add snapshot of sql

1dc5891

make test purpose a bit clearer

8137e4f

add info about temporary table

acb51ad

add overview of approach in code

9ca23ee

add workaround for sql snapshot

5c9a41d

add workaround for sql snapshot

c82aac2

Merge branch 'add-uncount' of https://github.com/mgirlich/dbplyr into…

1671b00

… mgirlich-add-uncount

delete accidentally added verb-uncount.new.md

42a0334

hadley approved these changes Jan 18, 2021


Merge commit 'd9ed52fad9b8f7d6c40c58b5059bbbb51321ab70'

a1d2639

Conflicts: DESCRIPTION

hadley merged commit f5dd4b2 into tidyverse:master Jan 19, 2021

hadley mentioned this pull request Jan 21, 2021

add uncount() #557

Closed

mgirlich deleted the add-uncount branch March 12, 2021 07:43
