RNG: Protect against misuse, i.e. using random numbers without setting up proper parallel RNG #353
Comments
This is now implemented in the develop branch. For example, one proper way to use RNG in futures is:
library(future)
options(future.rng.onMisuse = "warning")
f <- future(rnorm(1), seed=TRUE)
value(f)
## [1] 1.685746
If we had used f <- future(rnorm(1), seed=FALSE), we'd get:
value(f)
## Warning message:
## UNRELIABLE VALUE: Detected that random numbers were generated while future
## ('<none>') was resolved. Because future argument 'seed' was set to 'FALSE',
## those random numbers may not be statistical sound. To fix this, specify
## argument '[future.]seed', e.g. 'seed=TRUE'. To disable this check, set
## option 'future.rng.onMisuse' to "ignore".
## [1] 0.2068498
The current default is:
[1] -0.4169564
f <- future(rnorm(1))
value(f)
## [1] 1.365723
which, for backward-compatibility reasons, is equivalent to:
f <- future(rnorm(1), seed=NULL)
value(f)
## [1] 1.685746
Using
Further updates on detecting misuse of parallel RNG; with
remotes::install_github("HenrikBengtsson/future@develop")
and setting
options(future.rng.onMisuse = "warning") ## or "error" or
in, say,
For example, with future.apply:
> y <- future.apply::future_lapply(1:2, FUN = function(x) { rnorm(1) })
Warning message:
UNRELIABLE VALUE: Detected that random numbers were generated while future ('future_lapply-1')
was resolved. Because future argument 'seed' was set to 'FALSE', those random numbers
may not be statistical sound. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'.
To disable this check, set option 'future.rng.onMisuse' to "ignore".
To avoid this, we must explicitly request parallel RNG:
> y <- future.apply::future_lapply(1:2, FUN = function(x) { rnorm(1) }, future.seed = TRUE)
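To illustrate why parallel RNG needs to be requested at all, here is a minimal base-R sketch. It is not taken from the future package; `draw_from()` is a hypothetical helper, and the two "workers" are simulated sequentially. It assumes that workers which start from the same RNG state (for example, a copied `.Random.seed`) are the failure mode of interest, and contrasts that with separate L'Ecuyer-CMRG streams obtained via `parallel::nextRNGStream()`.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Hypothetical helper: draw one normal variate starting from the RNG state
# `seed`, then put the caller's global RNG state back as it was.
draw_from <- function(seed) {
  genv <- globalenv()
  old_seed <- genv$.Random.seed
  on.exit({
    if (is.null(old_seed)) {
      rm(".Random.seed", envir = genv)
    } else {
      assign(".Random.seed", old_seed, envir = genv)
    }
  })
  assign(".Random.seed", seed, envir = genv)
  rnorm(1)
}

# A L'Ecuyer-CMRG state to start from (the seed value 42 is arbitrary).
set.seed(42, kind = "L'Ecuyer-CMRG")
seed0 <- globalenv()$.Random.seed

# Two simulated workers that inherit the very same state: identical draws.
naive <- c(draw_from(seed0), draw_from(seed0))

# Two simulated workers that each get their own stream: different draws.
s1 <- nextRNGStream(seed0)
s2 <- nextRNGStream(s1)
streams <- c(draw_from(s1), draw_from(s2))

identical(naive[1], naive[2])  # the naive workers repeat each other
streams[1] != streams[2]       # the stream-based workers do not
```

The duplicated draw in the naive case is one example of what "not statistically sound" can mean in practice.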
Similarly for foreach with doFuture:
> library(foreach)
> doFuture::registerDoFuture()
> y <- foreach(x = 1:2) %dopar% { rnorm(1); x }
Warning message:
UNRELIABLE VALUE: Detected that random numbers were generated while future ('<none>')
was resolved. Because future argument 'seed' was set to 'FALSE', those random numbers
may not be statistical sound. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'.
To disable this check, set option 'future.rng.onMisuse' to "ignore".
The correct way here is (for now*) to use doRNG:
> library(foreach)
> doFuture::registerDoFuture()
> y <- foreach(x = 1:2) %dopar% { rnorm(1); x }
> doRNG::registerDoRNG()
> library(doRNG) ## https://github.com/renozao/doRNG/issues/13
> y <- foreach(x = 1:2) %dopar% { rnorm(1); x }
Warning message:
In list(args = (1:2)(), argnames = "x", evalenv = <environment>, :
Foreach loop had changed the current RNG type: RNG was restored to same type, next state
Comment: That warning comes from doRNG and is tracked in futureverse/doFuture#42.
(*) I might add built-in parallel RNG support to doFuture in the next-next release.
For furrr, we have:
> y <- furrr::future_map(1:2, function(x) { rnorm(1); x })
Warning message:
UNRELIABLE VALUE: Detected that random numbers were generated while future ('<none>')
was resolved. Because future argument 'seed' was set to 'FALSE', those random numbers
may not be statistical sound. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'.
To disable this check, set option 'future.rng.onMisuse' to "ignore".
with the correction being:
> y <- furrr::future_map(1:2, function(x) { rnorm(1); x }, .options = furrr::future_options(seed = TRUE))
Obviously, if RNG is not used, there will be no warning, e.g.
> y <- future.apply::future_lapply(1:2, FUN = function(x) { x })
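For readers wondering how a check like this can tell the two cases apart, one plausible approach is to compare the RNG state before and after the expression is evaluated. The sketch below is base R only and is an assumption for illustration, not a description of how the future package implements it. `with_rng_check()` and its `on_misuse` argument are hypothetical names that merely mirror the three values of the `future.rng.onMisuse` option.

```r
# Hypothetical helper: evaluate `expr` and react if it advanced the RNG state.
with_rng_check <- function(expr, on_misuse = c("warning", "error", "ignore")) {
  on_misuse <- match.arg(on_misuse)
  # "ignore": skip the check entirely and just evaluate the expression.
  if (on_misuse == "ignore") return(expr)

  genv <- globalenv()
  before <- genv$.Random.seed   # NULL if the RNG has not been used yet
  value <- expr                 # forces the lazily evaluated argument here
  after <- genv$.Random.seed

  if (!identical(before, after)) {
    msg <- "Random numbers were generated without parallel RNG being set up"
    if (on_misuse == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  value
}

with_rng_check(1 + 1)                                    # no RNG use, no warning
tryCatch(with_rng_check(rnorm(1)),
         warning = function(w) conditionMessage(w))      # RNG use is reported
tryCatch(with_rng_check(runif(1), on_misuse = "error"),
         error = function(e) conditionMessage(e))        # ... or escalated
```

A check of this kind only sees that the state changed; it cannot tell whether the random numbers mattered to the result, which is why an opt-out value is useful.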
Roadmap
I don't know when it's safe to set
cc/ h/t @pat-s
This is a huge step forward :)
Roadmap
I like it a lot, especially setting
Warning message content
I am wondering if this is descriptive enough for the average user to take the right action and understand what's going on. Maybe a little bit of extra context would help here to simplify taking action?
Warning message:
Maybe this warning could even include a link to a FAQ-ish page which in turn links to more information about the whole topic. I know you do not like the term "reproducible" in this context, but my naming for the option would be something like
Thanks again for the good work!
Thanks for the suggestions - yeah, that message is a bit clunky/technical. What about:
I'm avoiding mentioning
I like the option name
Oh yeah, that might trigger confusion whether
Agree. It's up to you, both have their pros and cons :)
…ved to future orchestration errors [#353]
FYI, use of RNG by mistake will produce a warning by default in the next release of {future}, cf. commit df11e64
Proposal
Have the future framework detect when random numbers are used/generated/produced although no parallel RNG streams are in place. Make this check optional via an option. For example,
The analogue for basic futures would be:
where
future.rng.onMisuse: If random numbers are used in futures, then parallel (L'Ecuyer-CMRG) RNG should be used in order to get statistically sound RNGs. The defaults in the future framework assume that no random number generation (RNG) takes place in the future expression, because L'Ecuyer-CMRG RNGs come with an unnecessary overhead if not needed. To protect against mistakes, the future framework attempts to detect when random numbers are used even though L'Ecuyer-CMRG RNGs are not in place. If this is detected and future.rng.onMisuse = "error", then an informative error message is produced. If "warning", then a warning message is produced. If "ignore", no check is performed.
BTW
The last future example above reminds me that it would be convenient if we could use a regular fixed RNG seed to generate L'Ecuyer-CMRG seeds, e.g.
as well as
Especially the latter would be useful since that is a common use pattern with future.apply.
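As a rough illustration of the idea, the base-R sketch below turns an ordinary integer seed into a list of L'Ecuyer-CMRG stream seeds, one per element to be processed. `make_stream_seeds()` is a hypothetical helper written for this example, not an existing function in future or future.apply, and the choice to restore the caller's RNG state afterwards is an assumption about what would be least surprising.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Hypothetical helper: derive `n` L'Ecuyer-CMRG stream seeds from a plain
# integer `seed`, leaving the caller's global RNG state and kind untouched.
make_stream_seeds <- function(n, seed) {
  genv <- globalenv()
  old_seed <- genv$.Random.seed   # NULL if the RNG has not been used yet
  old_kind <- RNGkind()[1]
  on.exit({
    RNGkind(old_kind)             # restore the RNG kind (this reseeds) ...
    if (is.null(old_seed)) {
      if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
        rm(".Random.seed", envir = genv)
      }
    } else {
      assign(".Random.seed", old_seed, envir = genv)  # ... then the old state
    }
  })

  set.seed(seed, kind = "L'Ecuyer-CMRG")
  stream <- genv$.Random.seed
  seeds <- vector("list", n)
  for (i in seq_len(n)) {
    seeds[[i]] <- stream
    stream <- nextRNGStream(stream)  # jump ahead to the next stream
  }
  seeds
}

seeds <- make_stream_seeds(3, seed = 42)
str(seeds)  # a list of three integer vectors of length 7
```

Because the streams depend only on `seed` and the element index, the same fixed seed gives the same per-element streams regardless of how the elements are distributed across workers.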
See also
This idea was triggered by a Twitter discussion on 2019-11-11 (https://twitter.com/henrikbengtsson/status/1194007939725479937).