Calculating Bayesian models using `brms::brm()` repeatedly in a nested parallelisation using `furrr` #595

CLRafaelR · 2022-02-28T09:25:02Z


(If this question should be posted on StackOverflow rather than here, please feel free to let me know...:pray:)

I am trying to calculate the number of participants required for a planned experiment, using Bayesian power analyses with data simulation as suggested by Vasishth et al. (2021). I use nested parallelisation for the calculation, but I have not been able to complete the entire iteration, even though no error is reported. I suspect that connection failures occurred during the loop, leaving some futures unresolved. However, I could not work out which workers malfunctioned or how I should stop them. Therefore, I would like to know how to detect and stop the disconnected workers.

I have tried to run the simulation on an Amazon EC2 instance (m6i.32xlarge; 128 logical cores and 512 GiB of memory). Since I want nested parallelisation, I set the following plan:

options(future.globals.maxSize = 5e10) # 5e10 bytes = 50 GB (about 46.6 GiB)

plan(
  list(
    tweak(
      multisession,
      workers = 8,
      rscript_startup = quote(
        options(
          socketOptions="no-delay"
        )
      )
    ),
    tweak(
      multisession,
      workers = 4,
      rscript_startup = quote(
        options(
          socketOptions="no-delay"
        )
      )
    )
  )
)

I also tried to set up the clusters explicitly with the commands below, but I could not get them to work on the EC2 instance:

cl <- parallelly::makeClusterPSOCK(8L, autoStop = TRUE)

c_cl <- parallelly::makeClusterPSOCK(4L, autoStop = TRUE)

plan(
  list(
    tweak(cluster, workers = cl),
    tweak(cluster, workers = c_cl)
  )
)

What the outer layer (8 cores) and inner layer (4 cores) are doing is summarised in the figure below and the next sections:

Inner layer

1. create one simulation dataset using a user-defined function generate_sim_data_latency();
2. fit the following four models in parallel using future_pmap(..., brms::brm(..., chains = 4, cores = 4));
   - an alternative model that contains f1, f2, and their interaction intrctn as explanatory variables: rt ~ 0 + Intercept + f1 + f2 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)
   - a null model that lacks f1: rt ~ 0 + Intercept + f2 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item),
   - a null model that lacks f2: rt ~ 0 + Intercept + f1 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item),
   - a null model that lacks intrctn: rt ~ 0 + Intercept + f1 + f2 + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)
   - Each model uses four cores, and these cores are set by brm(..., cores = 4, ...), not plan()
3. do bridge sampling for each of the four models in parallel using future_map(..., bridgesampling::bridge_sampler(...));
4. calculate the Bayes factors comparing the alternative model against each of the null models;
5. save the coefficients of the alternative model along with the Bayes factors calculated above as an .Rds file. The output looks as follows:

# A tibble: 4 x 9
  Coefficients     BF Estimate Est.Error    Q2.5  Q97.5 npart niter elapsed_time
  <chr>         <dbl>    <dbl>     <dbl>   <dbl>  <dbl> <dbl> <int> <Period>    
1 order         0.378 -0.00121    0.0220 -0.0465 0.0402    20     1 56M 45.9S   
2 voice         0.480  0.0111     0.0191 -0.0268 0.0481    20     1 56M 45.9S   
3 intrctn       0.408  0.0165     0.0157 -0.0146 0.0472    20     1 56M 45.9S   
4 Intercept    NA      7.65       0.0117  7.62   7.67      20     1 56M 45.9S

This inner process takes approximately 50--80 minutes to complete one cycle. In the example above, it took almost 57 minutes to produce the .Rds file.
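As a quick sanity check on the process budget implied by this setup (8 outer workers, 4 inner workers, and cores = 4 per brm() call), the peak number of concurrent sampler processes can be worked out as below. The variable names are illustrative only, and the calculation assumes that every level is fully busy at the same time.

```r
# Illustrative names; the numbers come from the plan() and brm() settings above.
outer_workers    <- 8L  # first level of plan()
inner_workers    <- 4L  # second level of plan(): one worker per model
chains_per_model <- 4L  # brm(..., chains = 4, cores = 4)

# Peak number of chain processes if all levels are saturated at once
peak_chain_processes <- outer_workers * inner_workers * chains_per_model

logical_cores <- 128L   # m6i.32xlarge

peak_chain_processes                   # 128
peak_chain_processes <= logical_cores  # TRUE, but with no headroom left
```

Under these assumptions the sampler chains alone can occupy every logical core, before counting the R worker sessions themselves.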

Outer layer

do the steps of the inner layer (1--5 above) 100 times for each number of participants in the simulation dataset (c(540, 180, 60, 20)).
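For a rough sense of scale, the outer layer amounts to 4 x 100 = 400 cycles. Assuming, purely for illustration, that every cycle takes the 50 to 80 minutes reported above regardless of the number of participants, and that all 8 outer workers stay busy throughout, the total wall-clock time can be estimated as follows (all names are illustrative):

```r
# Rough wall-clock estimate under the stated (optimistic) assumptions.
n_participants <- c(540, 180, 60, 20)
reps_per_size  <- 100
outer_workers  <- 8

total_cycles <- length(n_participants) * reps_per_size  # 400 cycles
waves        <- ceiling(total_cycles / outer_workers)   # 50 rounds of 8 cycles

minutes_per_cycle <- c(low = 50, high = 80)             # range reported above
total_hours <- waves * minutes_per_cycle / 60

round(total_hours, 1)
```

This gives roughly 42 to 67 hours for the full simulation if nothing stalls, so a run that produces only one file in 12 hours is well outside the expected range.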

Question

By running the above iteration with nested parallelisation, I expected to get at least eight .Rds files once the outer layer had finished one cycle, because the plan assigns eight cores to the outer layer. However, I always get only one .Rds file, even after waiting 12 hours. In fact, when that single .Rds file was written to my working directory (50--80 minutes after starting execution), the CPU usage suddenly dropped from 100% to 20--30%, and the number of threads running or in the run queue dropped from about 130 to 40. Neither the CPU usage nor the number of running threads ever increased after that .Rds file was written. No error message could be found. I therefore suspect that some processes (workers) lost their connection, leaving some futures unresolved (and leaving garbage uncleaned).

How, then, can I avoid such failures of connection or of future resolution? How should I detect and stop the disconnected workers? Is this kind of nesting itself impossible? Any suggestions and hints are appreciated!
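One way to narrow down where a run stalls is to have each task write its own start, finish, and error lines to a log file, so that tasks which started but never finished can be identified afterwards. The following is a minimal base-R sketch of that idea, not a feature of future or furrr. run_logged() and the log format are hypothetical names introduced here, and in the real simulation the wrapper would be called inside the future_walk2() body.

```r
# Hypothetical helper: runs one task and appends status lines to a log file.
run_logged <- function(task_id, task, log_file) {
  log_line <- function(status, msg = "") {
    cat(
      sprintf(
        "%s\tpid=%d\ttask=%s\t%s\t%s\n",
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        Sys.getpid(), task_id, status, msg
      ),
      file = log_file, append = TRUE
    )
  }
  log_line("START")
  tryCatch(
    {
      result <- task()
      log_line("DONE")
      result
    },
    error = function(e) {
      # Record the error message instead of losing it on the worker
      log_line("ERROR", conditionMessage(e))
      NULL
    }
  )
}

# Toy usage: one task succeeds, one fails.
log_file <- tempfile(fileext = ".log")
ok  <- run_logged("n0020_iter001", function() 1 + 1, log_file)
bad <- run_logged("n0020_iter002", function() stop("sampler failed"), log_file)

# Tasks that logged START but never logged DONE either failed or hung.
log_lines <- readLines(log_file)
started  <- sub(".*task=([^\t]+)\tSTART.*", "\\1",
                grep("\tSTART\t", log_lines, value = TRUE))
finished <- sub(".*task=([^\t]+)\tDONE.*", "\\1",
                grep("\tDONE\t", log_lines, value = TRUE))
setdiff(started, finished)  # "n0020_iter002"
```

Because several workers would append to the same file concurrently, writing one log file per task may be safer in practice; the logged pid then identifies the process to inspect or stop.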

Preparation for the simulation

Package load

necessary_packages <- c(
  "tidyverse",
  "rstan",
  "brms",
  "future",
  "furrr"
)

lapply(
  necessary_packages[!necessary_packages %in% (.packages())],
  library,
  character.only = TRUE
)

(.packages())

rstan::rstan_options(auto_write = TRUE)
options(
  mc.cores = availableCores(),
  #future.globals.onReference = "error",
  scipen = 999
)

f_elapsed_time <- function(start_time) {
  Sys.time() |>
    difftime(
      start_time,
      units = "secs"
    ) |>
    as.numeric() |>
    lubridate::seconds_to_period() |>
    round(2)
}

start_time <- Sys.time()
f_elapsed_time(start_time)

set.seed(1212)

Settings of the generator of simulation datasets

Implementation of main effects

beta_intercept_latency <- 7.555694

beta_f1_latency <- -0.03869938

beta_f2_latency <- -0.03092317

beta_intrctn_latency <- -0.009032881

# beta_intercept_latency;
# beta_f1_latency;
# beta_f2_latency;
# beta_intrctn_latency

Implementation of random effects

# SDs of participant random effect
participant_ranefsd_latency <- c(
  # Intercept
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # f2
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # f1
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # intrctn
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  )
)

# SDs of item random effect
item_ranefsd_latency <- c(
  # Intercept
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # f2
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # f1
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  ),
  # intrctn
  extraDistr::rtnorm(
    n    = 1,
    mean = 0,
    sd   = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a    = 0
  )
)

# some intermediate values were chosen for correlations:
corr_matrix_latency <- rethinking::rlkjcorr(
  # Number of random matrices to sample
  n = 1,
  # Dimension of correlation matrix
  K = 4,
  # Parameter controlling shape of distribution
  eta = 2
)
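
The SD vectors and the correlation matrix above are combined into covariance matrices later in the simulation via SIN::sdcor2cov(). As a sketch of what that step is expected to compute, assuming the standard identity Sigma = diag(sd) %*% R %*% diag(sd), the same construction in base R is shown below. The numeric values are made up for illustration and are not the ones drawn above.

```r
# Made-up SDs and correlation matrix (illustration only)
sds  <- c(0.04, 0.02, 0.03, 0.01)
corr <- diag(4)
corr[1, 2] <- corr[2, 1] <- 0.3

# Covariance matrix: scale each row and column of the correlation matrix by the SDs
sigma <- diag(sds) %*% corr %*% diag(sds)

# The diagonal holds the variances, and converting back recovers the correlations
all.equal(diag(sigma), sds^2)    # TRUE
all.equal(cov2cor(sigma), corr)  # TRUE
```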

Definition of the simulation data generator

Data generating function

generate_sim_data_latency <- function(
    n_item        = 8,
    n_participant = NULL,
    beta          = NULL,
    # participant vcov 4x4 matrix ------------------------------
    sigma_u       = NULL,
    # item vcov 4x4 matrix ------------------------------
    sigma_w       = NULL,
    sigma_e       = NULL,
    verbose       = FALSE,
    seed          = NULL
) {
  # Set seed
  if (!is.null(seed)) {
    set.seed(seed)
  }

  # Data frame generation
  base <- tibble(
    # Add a column for participant ------------------------------
    participant = rep(
      1:n_participant,
      each = n_item * 2 * 4
    ) |>
        as.factor(),

    # Add a column for item ------------------------------
    #
    # 8 items
    # 2 mirror images (in different colours) for a condition per item
    # e.g. for os-u 'to push'
    # [L] Red agent     and [R] Blue patient
    # [L] White patient and [R] Black agent
    # 4 condition
    # 8 * 2 * 4 = 64 stimuli in total per participant
    item = rep(
      1:n_item,
      # 2 mirror images for a condition * 4 condition * (number of participants)
      each = 2 * 4,
      times = n_participant
      ) |>
        as.factor(),

    # Add a column for the condition ------------------------------
    condition = rep(
      c(
        "f1's Lv1 and f2's Lv1",
        "f1's Lv1 and f2's Lv2",
        "f1's Lv2 and f2's Lv1",
        "f1's Lv2 and f2's Lv2"
      ),
      each = 2,
      times = n_item * n_participant
    ),

    # Add columns for the factors ------------------------------
    # The values are integer to run code faster and to save memory
    # https://stackoverflow.com/a/7014671/10215301

    # Add a column for f1 (First main effect) ------------------------------
    # sum contrast coding for f1
    # f1's Lv1:  1
    # f1's Lv2: -1
    f1 = if_else(
      str_detect(condition, "f1's Lv1"),
      1L,
      -1L
    ),

    # Add a column for f2 (Second main effect) ------------------------------
    # sum contrast coding for f2
    # f2's Lv1:  1
    # f2's Lv2: -1
    f2 = if_else(
      str_detect(condition, "f2's Lv1"),
      1L,
      -1L
    ),

    # Add a column for intrctn ------------------------------
    # sum contrast coding for intrctn
    # f1's Lv1 and f2's Lv1 and f1's Lv2 and f2's Lv2:  1
    # f1's Lv2 and f2's Lv1 and f1's Lv1 and f2's Lv2: -1
    intrctn = case_when(
      condition == "f1's Lv1 and f2's Lv1"  ~  1L,
      condition == "f1's Lv1 and f2's Lv2" ~ -1L,
      condition == "f1's Lv2 and f2's Lv1"  ~ -1L,
      condition == "f1's Lv2 and f2's Lv2" ~  1L,
      TRUE ~ NA_integer_
    )
  )

  ## participant random effects:
  u <- MASS::mvrnorm(
    n = n_participant,
    mu = c(0, 0, 0, 0),
    Sigma = sigma_u
  ) |>
  as_tibble(rownames = "participant") |>
  mutate(
    participant = as.factor(participant)
  ) |>
  rename(
    u_Intercept = V1,
    u_f2 = V2,
    u_f1 = V3,
    u_intrctn = V4
  )

  # item random effects
  w <- MASS::mvrnorm(
    n = n_item,
    mu = c(0, 0, 0, 0),
    Sigma = sigma_w
  ) |>
  as_tibble(rownames = "item") |>
  mutate(
    item = as.factor(item)
  ) |>
  rename(
    w_Intercept = V1,
    w_f2 = V2,
    w_f1 = V3,
    w_intrctn = V4
  )

  simulation_data <- left_join(
    base,
    u,
    by = "participant"
  ) |>
  left_join(
    w,
    by = "item"
  ) |>
  mutate(
    z = (
      beta[1] + u_Intercept + w_Intercept +
      (beta[2] + u_f1 + w_f1) * f1 +
      (beta[3] + u_f2 + w_f2) * f2 +
      (beta[4] + u_intrctn + w_intrctn) * intrctn
    ),
    rt = rlnorm(
      # https://stackoverflow.com/a/31878476/10215301
      # Even by setting `n = n()`, `z` is used rowwise
      n = n(),
      sdlog = sigma_e,
      meanlog = z
    )
  )

  # Drop the latent columns unless they are explicitly requested
  if (!verbose) {
    simulation_data <- simulation_data |>
      dplyr::select(
        -starts_with("u_"),
        -starts_with("w_"),
        -z
      )
  }

  # Return the data set visibly in both cases
  simulation_data
}
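
With the sum contrasts used in the generator (+1 for level 1, -1 for level 2), the interaction column defined by case_when() is simply the product of the two main-effect columns. A short base-R check of that equivalence, using the four condition labels from the function:

```r
# The four condition labels used in generate_sim_data_latency()
condition <- c(
  "f1's Lv1 and f2's Lv1",
  "f1's Lv1 and f2's Lv2",
  "f1's Lv2 and f2's Lv1",
  "f1's Lv2 and f2's Lv2"
)

# Sum contrast coding: level 1 -> 1, level 2 -> -1
f1 <- ifelse(grepl("f1's Lv1", condition, fixed = TRUE), 1L, -1L)
f2 <- ifelse(grepl("f2's Lv1", condition, fixed = TRUE), 1L, -1L)

# Interaction contrast as the product of the main-effect contrasts
intrctn <- f1 * f2

data.frame(condition, f1, f2, intrctn)
```

The resulting intrctn values (1, -1, -1, 1) match the case_when() mapping in the function.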

Simulation

Settings of priors, formulae, and `stanvar`

set.seed(3434)

Settings of priors, formulae, and `stanvar` (to interpret the value of an R object to a stan code)

#| priors

names <- c(
  "full",
  "null_f1",
  "null_f2",
  "null_intrctn"
)

formulae <- c(
  "f1 + f2 + intrctn",
  "f2 + intrctn",
  "f1 + intrctn",
  "f1 + f2"
) |>
  map(
    ~ paste0(
      "rt ~ 0 + Intercept + ",
      .x,
      " + (1 + f1 + f2 + intrctn | participant)",
      " + (1 + f1 + f2 + intrctn | item)"
    )
  ) |>
  map(
    as.formula
  ) |>
  set_names(
    nm = names
  )

#View(formulae)

priors_H1 <- c(
  prior(
    normal(beta_intercept_latency, 0.2),
    class = b,
    coef = Intercept
  ),
  prior(
    normal(beta_f1_latency, 0.035),
    class = b,
    coef = f1
  ),
  prior(
    normal(beta_f2_latency, 0.035),
    class = b,
    coef = f2
  ),
  prior(
    normal(beta_intrctn_latency, 0.035),
    class = b,
    coef = intrctn
  ),
  prior(
    normal(0, 0.05),
    class = sd
  ),
  prior(
    normal(0, 0.05),
    class = sigma
  ),
  prior(
    lkj(2),
    class = cor
  )
)

#View(priors_H1)

priors <- map(
  .x = names,
  ~ str_replace(.x, "null_", "")
) |>
  map(
  ~ priors_H1 |>
      dplyr::filter(coef != .x)
  ) |>
  set_names(
    nm = names
  )

#View(priors)

stanvars_H1 <- stanvar(
  beta_intercept_latency,
  name = "beta_intercept_latency"
) +
  stanvar(
    beta_f1_latency,
    name = "beta_f1_latency"
  ) +
  stanvar(
    beta_f2_latency,
    name = "beta_f2_latency"
  ) +
  stanvar(
    beta_intrctn_latency,
    name = "beta_intrctn_latency"
  )

#View(stanvars_H1)

stanvars <- map(
  .x = names,
  ~ str_replace(.x, "null_", "")
) |>
  map(
    ~ paste0("beta_", .x, "_latency")
  ) |>
  map(
  ~ stanvars_H1[names(stanvars_H1) != .x]
  )

#View(stanvars)

frml_prior_stanvar <- tibble(
  names    = names,
  formulae = formulae,
  priors   = priors
)

#View(frml_prior_stanvar)

Simulation

options(future.globals.maxSize = 5e10) # 5e10 bytes = 50 GB (about 46.6 GiB)

plan(
  list(
    tweak(
      multisession,
      workers = 8,
      rscript_startup = quote(
        options(
          socketOptions="no-delay"
        )
      )
    ),
    tweak(
      multisession,
      workers = 4,
      rscript_startup = quote(
        options(
          socketOptions="no-delay"
        )
      )
    )
  )
)

#cl <- parallelly::makeClusterPSOCK(8L, autoStop = TRUE)
#
#c_cl <- parallelly::makeClusterPSOCK(4L, autoStop = TRUE)
#
#plan(
#  list(
#    tweak(cluster, workers = cl),
#    tweak(cluster, workers = c_cl)
#  )
#)

tictoc::tic()

#purrr::walk2(
furrr::future_walk2(
  .options = furrr::furrr_options(seed = 1L),
  .x = rep(c(540, 180, 60, 20), each = 100),
  .y = rep(1:100, times = 4),
  ~ {
      start_time <- Sys.time()
      
      env = environment()
      frml_prior_stanvar = frml_prior_stanvar
      stanvars_H1 = stanvars_H1
      simulation_data <- generate_sim_data_latency(
        beta = c(
          rnorm(
            n = 1,
            mean = beta_intercept_latency,
            sd = 0.2
          ),
          rnorm(
            n = 1,
            mean = beta_f1_latency,
            sd = 0.035
          ),
          rnorm(
            n = 1,
            mean = beta_f2_latency,
            sd = 0.035
          ),
          rnorm(
            n = 1,
            mean = beta_intrctn_latency,
            sd = 0.035
          )
        ),
        n_participant = .x,
        sigma_u       = SIN::sdcor2cov(
          stddev = participant_ranefsd_latency,
          corr = corr_matrix_latency
        ),
        sigma_w       = SIN::sdcor2cov(
          stddev = item_ranefsd_latency,
          corr = corr_matrix_latency
        ),
        sigma_e       = extraDistr::rtnorm(
          n    = 1,
          mean = 0,
          sd   = 0.05,
          # To ensure the returned value is always positive,
          # since the returned value is a SD
          a    = 0
        )
      )
      
      frml_prior_stanvar |>
        furrr::future_pmap(
          .options = furrr::furrr_options(
            seed = 1L,
            packages = "brms",
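            # furrr_options(globals = ) expects a logical, a character vector,
            # or a named list; c() would flatten the tibble into its columns
            globals = list(
              frml_prior_stanvar = frml_prior_stanvar,
              stanvars_H1 = stanvars_H1
            )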
          ),
          ~ brm(
            data      = simulation_data,
            formula   = ..2,
            family    = lognormal(),
            prior     = ..3,
            stanvars  = stanvars_H1,
            warmup    =  2000,
            iter      = 52000,
            control   = list(adapt_delta = 0.95),
            cores     = 4,
            chains    = 4,
            save_pars = save_pars(all = TRUE),
            backend   = "cmdstanr"
          )
        ) |>
        set_names(
          nm = names
        ) |>
        map_at(
          .at = 1,
          .f = ~ assign(
            x = "fit_H1",
            value = .x,
            envir = env
          )
        ) |>
        furrr::future_map(
          .options = furrr::furrr_options(
            seed = 1L
          ),
          ~ bridgesampling::bridge_sampler(
            samples = .x
          )
        ) |>
        (\(bf) {
          map_at(
            .x = bf,
            .at = c(2:4),
            ~ bridgesampling::bayes_factor(
                bf$full,
                .x
            )
          )
        })() |>
        list_modify(
          "full" = NULL
        ) |>
        (
          \(bf) {
          tibble(
            Coefficients = (
              bf |>
                names() |>
                str_replace("null_", "")
            ),
            BF = (
              bf |> 
                map_dbl(~ .x[["bf"]])
            )
          )
        })() |>
        full_join(
          (
            fit_H1 |>
              brms::fixef() |>
              as_tibble(
                rownames = "Coefficients"
              )
          ),
          by = "Coefficients"
        ) |>
        mutate(
          npart = .x,
          niter = .y,
          elapsed_time = f_elapsed_time(start_time)
        ) |>
        saveRDS(
          file = paste0(
            getwd(), 
            "/0-replicate-SK16/simulation/main/n",
            sprintf("%04d", .x),
            "_iter",
            sprintf("%03d", .y),
            ".Rds"
          )
        )
      # Remove the objects, as they are huge and consume a lot of memory
      rm(
        simulation_data,
        fit_H1,
        envir = env
      )
      
    }
)

tictoc::toc()

plan(sequential)
