Set future to use parallel safe random number approach by default #647
Replies: 3 comments 2 replies
-
I do not believe there is such an option (see, e.g., future options). As for why there's no option: although using an option for this might be convenient, it could lead to code that can't be reproduced by others who don't set the option. In general, it's good practice to reduce global dependencies, especially ones that affect output. An alternative that might work for you is to define a new function. The new function could be a wrapper, or it could be a copy of the function that changes the default argument. For example, suppose
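The wrapper idea described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the name `future_lapply_seeded` is hypothetical, and the sketch assumes the future.apply package is installed and that the default (sequential) plan is in use.

```r
# Hypothetical wrapper (the name is an assumption, not part of future.apply):
# identical to future.apply::future_lapply(), except that future.seed
# defaults to TRUE instead of FALSE.
future_lapply_seeded <- function(X, FUN, ..., future.seed = TRUE) {
  future.apply::future_lapply(X, FUN, ..., future.seed = future.seed)
}

# Two runs that start from the same seed
set.seed(42)
a <- future_lapply_seeded(1:3, function(i) rnorm(1))
set.seed(42)
b <- future_lapply_seeded(1:3, function(i) rnorm(1))

# Compare the two runs
print(identical(a, b))
```

Because the default lives in a function you define yourself, the change is visible in your own code and does not alter how other packages call `future_lapply()`.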
-
That's a fair point. Although it can be set in the script, users might accidentally set it in, e.g., This of course is my personal point of view. And there are ways in standard R to affect output through global settings, e.g., by setting the seed or by setting
-
Hi. @scottkosty has a good point about reproducibility issues if we introduce a global option for controlling the default value of

run_it <- function(n, future.seed = getOption("future.seed", FALSE)) {
future.apply::future_lapply(1:n, FUN = sqrt, future.seed = future.seed)
}

Then consider package MyPkg, that calls

my_fcn <- function(n) {
y <- run_it(n)
future.apply::future_lapply(y, FUN = function(x) {
rnorm(1) + x
}, future.seed = TRUE)
}

Now, the current, default behavior of

> set.seed(1); str(my_fcn(3))
List of 3
$ : num 2.38
$ : num -0.323
$ : num 1.6

and again,

> set.seed(1); str(my_fcn(3))
List of 3
$ : num 2.38
$ : num -0.323
$ : num 1.6

However, if we change the default behavior

> options(future.seed = TRUE)

we get other, numerically reproducible values:

> set.seed(1); str(my_fcn(3))
List of 3
$ : num -1.3
$ : num 1.15
$ : num 3.6

and again,

> set.seed(1); str(my_fcn(3))
List of 3
$ : num -1.3
$ : num 1.15
$ : num 3.6

So, the problem is that with a global variable, you're not only changing the behavior of the code you control in your script or in your package, but you're also changing the behavior in other packages that is out of your control. So, we want to stay away from any global settings that affect the results elsewhere in R.

FWIW, a famous example from the past was

The remaining discussion would be whether

(*) One could even argue that it should be an error rather than a warning, because in some statistical analyses it's a severe problem to not use proper RNGs.
-
Is there an option to make future always use the parallel-safe RNG approach, rather than having to set seed = TRUE for every future call? I searched and read a lot but somehow can't find this. I tried