rep_sample_n() fix (#279) #325

simonpcouch · 2020-07-25T00:02:58Z

Alright, here’s a go at a fix for 279! This approach borrows heavily from @unoe’s implementation, and relies on the slice_sample() implementation from dplyr for most argument checking and errors.

All previous unit tests pass, and I've extended the testing a bit so that we'd hopefully catch an issue like 279 earlier on if one came up again. There's also some documentation clarification. From what I can tell, rep_sample_n() never checked that the supplied probs summed to one, so I changed the docs to refer to prob as sampling weights.

This also adds a rep_slice_sample() function, as @mine-cetinkaya-rundel recommended, that is a light wrapper around rep_sample_n() that more closely resembles the interface to dplyr::slice_sample(). I only changed argument names and ordering here.🦋

library(infer)
  
df <- tibble::tibble(
  ball_id = 1:5,
  color = factor(c(rep("a", 3), rep("b", 2)))
)

set.seed(1)
rep_sample_n(df, size = 2, reps = 3)
#> # A tibble: 6 x 3
#> # Groups:   replicate [3]
#>   replicate ball_id color
#>       <int>   <int> <fct>
#> 1         1       1 a    
#> 2         1       4 b    
#> 3         2       1 a    
#> 4         2       2 a    
#> 5         3       5 b    
#> 6         3       3 a

set.seed(1)
rep_sample_n(df, size = 2, reps = 3, prob = c(.9, rep(.025, 4)))
#> # A tibble: 6 x 3
#> # Groups:   replicate [3]
#>   replicate ball_id color
#>       <int>   <int> <fct>
#> 1         1       1 a    
#> 2         1       4 b    
#> 3         2       1 a    
#> 4         2       2 a    
#> 5         3       1 a    
#> 6         3       2 a

set.seed(1)
rep_sample_n(df, size = 2, reps = 3, prob = c(.9, rep(.025, 4)))
#> # A tibble: 6 x 3
#> # Groups:   replicate [3]
#>   replicate ball_id color
#>       <int>   <int> <fct>
#> 1         1       1 a    
#> 2         1       4 b    
#> 3         2       1 a    
#> 4         2       2 a    
#> 5         3       1 a    
#> 6         3       2 a

set.seed(1)
x <- rep_sample_n(tbl = df, size = 2, reps = 3)

set.seed(1)
x2 <- rep_slice_sample(.data = df, n = 2, reps = 3)

all.equal(x, x2)
#> [1] TRUE

^{Created on 2020-07-24 by the reprex package (v0.3.0.9001)}

It also looks like codecov is picking up on not having any testing for the noLD check (not from this PR)—not sure what unit testing for that feature would look like.

also adds a new rep_slice_sample wrapper that has a more similar interface to dplyr::slice_sample(). still need to extend unit testing and rewrite examples.

codecov-commenter · 2020-07-25T00:13:13Z

Codecov Report

Merging #325 into develop will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #325      +/-   ##
===========================================
- Coverage    99.91%   99.91%   -0.01%     
===========================================
  Files           15       15              
  Lines         1227     1222       -5     
===========================================
- Hits          1226     1221       -5     
  Misses           1        1

Impacted Files	Coverage Δ
R/rep_sample_n.R	`100.00% <100.00%> (ø)`

simonpcouch · 2020-07-25T00:24:41Z

By the way, the setup-R step on macOS-latest (devel) has been failing ~universally on GHA the past 3-4 days—something to do with availability of the daily builds.

R/rep_sample_n.R

ismayc · 2020-07-25T00:38:39Z

With regard to getting this back to 100% coverage, don't you just need to set capabilities("long.double") to get this to check? Or is that what you are having trouble setting?

document argument aliases in the same param rather than duplicating descriptions

echasnovski

New implementation uses the most recent features of {dplyr} 1.0.0: slice_sample() and .before in mutate(). This is good in every day usage, but not necessarily so in package development, as {infer} will not work with version of {dplyr} less than 1.0.0. I believe this is not so big of a problem, as most of {infer} users will have latest {dplyr} version, but it still seems to be a good idea to not rely on very recent {dplyr} features. Nevertheless, thank you @unoe for your help! It is much appreciated.

It seems that removing these lines, as per @rudeboybert's original doubts, should have been enough. But still the work of redocumenting and adding tests is very useful. Thanks, @simonpcouch!

I'll add some changes and request re-review.

Update: @simonpcouch, could you, please, take a look at my changes?

unoe · 2020-07-25T15:31:51Z

I completely understand your points and agree, @echasnovski, and thanks for all your work and help @simonpcouch.
At the beginning I considered simply using sample_n() rather than slice_sample(), which would remove the dependency on the newest {dplyr} version and make the implementation accessible to users of many {dplyr} versions.
However, this would have meant that rep_sample_n() would have needed to be updated again at some point since sample_n() is being retired in {dplyr}.

… make weighting more prominent

simonpcouch · 2020-07-25T17:20:41Z

Looks great! Thanks for the changes, @echasnovski.

Since @ismayc has given the thumbs up on both this PR and #326, I'm going to go ahead and merge #326 here and then this PR to develop. :-)

github-actions · 2021-03-08T00:17:08Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch added 5 commits July 24, 2020 14:17

possible rep_sample_n() re-implementation (#279)

8f9d7b7

also adds a new rep_slice_sample wrapper that has a more similar interface to dplyr::slice_sample(). still need to extend unit testing and rewrite examples.

add unit testing, adjust docs

5882a2b

fix test failure

96048bf

rename/order arguments re: dplyr::slice_sample()

bd7e72f

update NEWS

149bb13

simonpcouch requested review from echasnovski and ismayc July 25, 2020 00:24

ismayc reviewed Jul 25, 2020

View reviewed changes

R/rep_sample_n.R Outdated Show resolved Hide resolved

rename resamples -> slices, adjust docs

ede7bcc

document argument aliases in the same param rather than duplicating descriptions

ismayc approved these changes Jul 25, 2020

View reviewed changes

Lint 'rep_sample_n.R'.

6856cc4

echasnovski reviewed Jul 25, 2020

View reviewed changes

echasnovski added 3 commits July 25, 2020 13:48

Update/revert rep_sample_n() implementation.

7a13f51

Update rep_sample_n() tests.

7b376b0

Tweak rep_sample_n() documentation.

8519b79

echasnovski requested a review from ismayc July 25, 2020 11:04

rudeboybert added 2 commits July 25, 2020 12:09

Added weighted sampling example to rep_sample_n()

58457a2

Added tibble pkg to ex

1863946

rudeboybert mentioned this pull request Jul 25, 2020

Added weighted sampling example to rep_sample_n() #326

Merged

Lowered (w/o replacement) sample size & increased replicate number to…

6e6c512

… make weighting more prominent

ismayc approved these changes Jul 25, 2020

View reviewed changes

merge pr #326: added weighted sampling example

86150ad

simonpcouch merged commit acdc676 into develop Jul 25, 2020

simonpcouch deleted the rep-sample-n branch July 25, 2020 17:25

mine-cetinkaya-rundel mentioned this pull request Aug 18, 2020

rep_sample_n naming #337

Closed

rudeboybert mentioned this pull request Jan 7, 2021

Create rep_slice_sample(n) moderndive/moderndive#98

Closed

echasnovski mentioned this pull request Feb 16, 2021

Refactor rep_sample_n() and rep_slice_sample() #363

Closed

github-actions bot locked and limited conversation to collaborators Mar 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rep_sample_n() fix (#279) #325

rep_sample_n() fix (#279) #325

simonpcouch commented Jul 25, 2020 •

edited by ismayc

Loading

codecov-commenter commented Jul 25, 2020 •

edited

Loading

simonpcouch commented Jul 25, 2020

ismayc commented Jul 25, 2020

echasnovski left a comment •

edited

Loading

unoe commented Jul 25, 2020

simonpcouch commented Jul 25, 2020

github-actions bot commented Mar 8, 2021

rep_sample_n() fix (#279) #325

rep_sample_n() fix (#279) #325

Conversation

simonpcouch commented Jul 25, 2020 • edited by ismayc Loading

codecov-commenter commented Jul 25, 2020 • edited Loading

Codecov Report

simonpcouch commented Jul 25, 2020

ismayc commented Jul 25, 2020

echasnovski left a comment • edited Loading

Choose a reason for hiding this comment

unoe commented Jul 25, 2020

simonpcouch commented Jul 25, 2020

github-actions bot commented Mar 8, 2021

simonpcouch commented Jul 25, 2020 •

edited by ismayc

Loading

codecov-commenter commented Jul 25, 2020 •

edited

Loading

echasnovski left a comment •

edited

Loading