Allocation of work to cluster #568

ARLundborg · 2021-12-14T16:12:18Z


I've been diligently using future to run parallel simulations on a remote cluster. A simplified simulation looks like this:

library(parallelly)
library(future)
library(future.apply)

my_cluster <- parallelly::makeClusterPSOCK(c("ip1", "ip2"))  # workers must be passed as a single vector
plan(cluster, workers=my_cluster)

reps <- 100
sim_df <- data.frame(n=c(rep(100, reps), rep(1000, reps)), rep=rep(1:reps, 2))

sim_res <- future_apply(sim_df, MARGIN=1, future.seed=TRUE, FUN = function(x) {
  n <- x["n"]
  X <- rnorm(n)
  p <- 2*pnorm(-abs(sqrt(n)*mean(X)))
})
sim_df$p <- sim_res

This is just an example; running code as simple as this remotely would, of course, be silly. The key point is that I run simulations where I vary a parameter (here n) that drastically affects the run time of each step in the future_apply call.

Recently, I watched the performance of each of the workers while one of my simulations ran and noticed that the worker on "ip1" started idling when the remaining number of simulations was low. The loop finished just fine, but progress slowed down significantly because only one worker rather than two was running.

It looks to me like something is "preallocating" half the jobs to each of the two workers, and when the first worker finishes its half, only the second worker continues, even if there are many more jobs to do. I find this highly undesirable and was wondering whether there is a setting in either plan or future_apply that prevents this from happening?

Answered by HenrikBengtsson


HenrikBengtsson · 2021-12-14T19:33:55Z

Maintainer

Argument future.scheduling, and its dual argument future.chunk.size, provide ways to control how tasks are split up across workers. They in turn support additional low-level control via the attribute ordering.
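
As a minimal sketch of how these arguments might be applied to the simulation from the question: the code below assumes two local multisession workers as a stand-in for the remote "ip1"/"ip2" cluster, and sim_fun is a hypothetical name for the function body from the question. By default (future.scheduling = 1.0) each worker receives one chunk, which is consistent with the "preallocation" described above. Setting future.chunk.size = 1 (equivalent to future.scheduling = Inf) creates one future per element instead, so a worker that finishes early can pick up the next task.

```r
library(future)
library(future.apply)

# Assumption: two local background R sessions replace the remote cluster
plan(multisession, workers = 2)

reps <- 100
sim_df <- data.frame(n = c(rep(100, reps), rep(1000, reps)), rep = rep(1:reps, 2))

# Hypothetical helper wrapping the simulation step from the question
sim_fun <- function(x) {
  n <- x[["n"]]
  X <- rnorm(n)
  2 * pnorm(-abs(sqrt(n) * mean(X)))
}

# Option 1: one element per future, so tasks are handed out as workers become free
p_chunk1 <- future_apply(sim_df, MARGIN = 1, FUN = sim_fun,
                         future.seed = TRUE, future.chunk.size = 1)

# Option 2: keep one chunk per worker, but process elements in random order
# so that cheap (n = 100) and expensive (n = 1000) tasks are mixed in each chunk
p_random <- future_apply(sim_df, MARGIN = 1, FUN = sim_fun,
                         future.seed = TRUE,
                         future.scheduling = structure(TRUE, ordering = "random"))

# Shut down the background workers
plan(sequential)
```

Smaller chunks give better load balancing at the cost of more per-future overhead, so an intermediate value such as future.scheduling = 4 may be a reasonable compromise when individual tasks are short.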
