Allocation of work to cluster #568
I've been diligently using future to do parallel simulations on a remote cluster. A simplified simulation looks like this:

```r
library(parallelly)
library(future)
library(future.apply)

# One PSOCK worker per remote host; the hosts must be passed as a single vector
my_cluster <- parallelly::makeClusterPSOCK(c("ip1", "ip2"))
plan(cluster, workers = my_cluster)

reps <- 100
sim_df <- data.frame(n = c(rep(100, reps), rep(1000, reps)), rep = rep(1:reps, 2))

# Each row is one simulation: draw n standard normals and return a two-sided z-test p-value
sim_res <- future_apply(sim_df, MARGIN = 1, future.seed = TRUE, FUN = function(x) {
  n <- x[["n"]]
  X <- rnorm(n)
  2 * pnorm(-abs(sqrt(n) * mean(X)))
})
sim_df$p <- sim_res
```

This is just an example; running code as simple as this remotely would, of course, be silly. The key point is that I run simulations where I vary a parameter (here

Recently, I watched the performance of each of the workers while one of my simulations ran and noticed that the worker on

It looks to me like something is "preallocating" half the jobs to each of the two workers, and when the first worker finishes its half of the jobs, only the second worker continues, even though there are many more jobs to do. I find this highly undesirable and was wondering whether there is a setting either in
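A possible explanation for the observed imbalance, sketched in base R: by default (`future.scheduling = 1`), future.apply splits the elements into one chunk per worker. Assuming those chunks are contiguous ranges of rows, as produced by `parallel::splitIndices()`, the first worker would receive every cheap `n = 100` row and the second every expensive `n = 1000` row, because `sim_df` lists all `n = 100` rows first. The shuffled variant only illustrates how randomizing the row order would mix the two kinds of rows; the seed value is arbitrary.

```r
reps <- 100
sim_df <- data.frame(n = c(rep(100, reps), rep(1000, reps)), rep = rep(1:reps, 2))

# Assumption: two workers, each handed one contiguous chunk of row indices
chunks <- parallel::splitIndices(nrow(sim_df), 2)

# Count the expensive (n = 1000) rows that land in each chunk
work_per_chunk <- vapply(chunks, function(idx) sum(sim_df$n[idx] == 1000), integer(1))
print(work_per_chunk)

# Same split, but applied to a shuffled row order
set.seed(42)
shuffled <- sample.int(nrow(sim_df))
work_shuffled <- vapply(chunks, function(idx) sum(sim_df$n[shuffled[idx]] == 1000), integer(1))
print(work_shuffled)
```

With the contiguous split, one chunk holds none of the expensive rows and the other holds all of them, which matches the pattern of one worker finishing early and sitting idle.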
Argument `future.scheduling`, and its dual argument `future.chunk.size`, provides ways to control how tasks are split up across workers. They in turn support additional low-level control via attribute `ordering`.
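A minimal sketch of both approaches applied to the simulation from the question. It assumes two local `multisession` workers as a stand-in for the remote cluster, and `sim_one()` is a hypothetical helper name for the per-row function. `future.chunk.size = 1` creates one future per row, so a worker that becomes free picks up the next row. `structure(TRUE, ordering = "random")` keeps the default of one chunk per worker but shuffles which rows go into each chunk; results are still returned in the original row order.

```r
library(future)
library(future.apply)

# Assumption: two local background R sessions stand in for the remote cluster
plan(multisession, workers = 2)

reps <- 100
sim_df <- data.frame(n = c(rep(100, reps), rep(1000, reps)), rep = rep(1:reps, 2))

# Hypothetical helper: one simulation per row, returning a two-sided z-test p-value
sim_one <- function(x) {
  n <- x[["n"]]
  X <- rnorm(n)
  2 * pnorm(-abs(sqrt(n) * mean(X)))
}

# Option 1: one future per row, so idle workers keep pulling new rows
p_dynamic <- future_apply(sim_df, MARGIN = 1, FUN = sim_one,
                          future.seed = TRUE, future.chunk.size = 1)

# Option 2: one chunk per worker (the default split), but rows are assigned in random order
p_random <- future_apply(sim_df, MARGIN = 1, FUN = sim_one,
                         future.seed = TRUE,
                         future.scheduling = structure(TRUE, ordering = "random"))

# Return to sequential processing and shut down the background workers
plan(sequential)
```

Option 1 pays a per-future overhead for every row, which is likely negligible here but may matter with many very short tasks; option 2 keeps the overhead low and relies on shuffling to even out the work.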