Why not default to "globals = FALSE" if sequential or multicore plans? #627
-
Thanks a lot for all of your work and community support, @HenrikBengtsson! Collecting globals understandably takes non-trivial time in some cases. I'm guessing my question is naive since I don't have a good understanding of the internals of 'future', forking, etc., but would it make sense to default to the behavior of "globals = FALSE" if the plan is sequential or multicore? Specifically, the default value of the "globals" argument could be NA/missing/NULL, and 'future' would set the behavior according to the plan. The user could always set the argument explicitly, as now, if they preferred.

In case it is helpful to discuss in the context of an example, consider the following:
The above example succeeds for me with plans sequential and multicore, and it fails for me, as expected, with multisession. I'm most interested in [...]

I post this as a question, rather than a feature request, since I'm guessing it would not be a good idea, but I'm hoping to learn more about globals from your answer of why not :)
Replies: 2 comments
-
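It's mainly due to conservative design decisions and leaving room for future changes.

Having the same defaults for `future()` regardless of parallel backend helps detect developer mistakes early on. For instance, if someone writes code using `plan(sequential)` and `plan(multicore)` and fails to consider some globals, then those mistakes will go unnoticed until an end-user tries `plan(multisession)`.

Having `globals` default to TRUE on some backends and FALSE on others might also add confusion. It's always easier to grasp how things work if they work the same in most cases.

There's also the scenario where a future is created lazily, e.g.

    > library("future")
    > a <- 42
    > f <- future({ b <- 2; a * b }, globals = FALSE, lazy = TRUE)
    > a <- 0
    > value(f)
    [1] 0

A future scenario (pun not intended), which is on the roadmap, is for futures to be generic to the very point where they're sent to a parallel backend. This means that the [...]

I might revisit these defaults later on when more things on the roadmap have been implemented.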
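For contrast with the lazy example above, here is a minimal sketch that runs the same expression with and without globals collection. It assumes the 'future' package is installed and that `plan(sequential)` is in effect. The names `f_default`, `f_noglobals`, `v_default` and `v_noglobals` are made up for illustration. As far as I understand it, with the default `globals = TRUE` the value of `a` is recorded when the future is created, whereas with `globals = FALSE` it is looked up only when the future is resolved.

```r
library(future)
plan(sequential)

a <- 42

# Default (globals = TRUE): 'a' is identified and recorded at creation time
f_default <- future({ b <- 2; a * b }, lazy = TRUE)

# globals = FALSE: nothing is recorded; 'a' is looked up when the future runs
f_noglobals <- future({ b <- 2; a * b }, globals = FALSE, lazy = TRUE)

# Change 'a' after both futures were created but before they are resolved
a <- 0

v_default   <- value(f_default)    # uses the recorded a = 42
v_noglobals <- value(f_noglobals)  # uses the current a = 0

print(v_default)
print(v_noglobals)
```

So skipping globals collection on sequential or multicore would not only save time; it could also change the result of a lazy future when a variable is modified between creation and resolution.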
-
That makes a lot of sense. Thanks a lot for taking the time to write that detailed explanation!