(If this question should be posted on StackOverflow rather than here, please feel free to let me know... :pray:)
I am trying to calculate the number of participants required for a planned experiment, using Bayesian power analyses with data simulation as suggested by Vasishth et al. (2021). I use nested parallelisation for the calculation, but I have not managed to complete the whole iteration without problems. I suspect that connection failures occurred during the loop, leaving some futures unresolved. However, I could not work out which workers malfunctioned or how I should stop them. Therefore, I want to know how to detect and stop the disconnected workers.
I have tried to run the simulation on an Amazon EC2 instance (`m6i.32xlarge`; 128 logical cores and 512 GiB of memory). I also tried to set up a cluster, but I could not manage to do so on the EC2 instance. Since I am trying to do a nested parallelisation, I set the following plan:
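As an illustration only, and not necessarily the plan that was actually used, a two-level topology with the future package might be declared as in the sketch below. The worker counts (8 outer, 4 inner, 4 cores per `brm()` call) come from the description in this post; the object names and the `I()` wrapper on the inner level are assumptions, the latter based on my reading of the future topologies vignette, so it should be checked against the installed version.

```r
library(future)

outer_workers <- 8L  # simulation cycles running at the same time
inner_workers <- 4L  # models fitted at the same time within one cycle
brm_cores     <- 4L  # cores requested by each brm() call, one per chain

# Peak core demand if all three levels are busy at once.
peak_cores <- outer_workers * inner_workers * brm_cores
peak_cores  # 128

# Two-level topology. I() asks future to take the nested worker count as
# given (assumption: supported by the installed future version).
nested_plan <- list(
  tweak(multisession, workers = outer_workers),
  tweak(multisession, workers = I(inner_workers))
)

# Activate with: plan(nested_plan)
```

Under these assumptions the peak demand equals the 128 logical cores of the instance, which leaves no headroom for the main R session or the operating system.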
What the outer layer (8 cores) and inner layer (4 cores) are doing is summarised in the figure below and the next sections:
Inner layer
1. create one simulation dataset using a user-defined function `generate_sim_data_latency()`;
2. fit the following four models in parallel using `future_pmap(..., brms::brm(..., chains = 4, cores = 4))`:
   - an alternative model that contains `f1`, `f2`, and their interaction `intrctn` as explanatory variables: `rt ~ 0 + Intercept + f1 + f2 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)`,
   - a null model that lacks `f1`: `rt ~ 0 + Intercept + f2 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)`,
   - a null model that lacks `f2`: `rt ~ 0 + Intercept + f1 + intrctn + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)`,
   - a null model that lacks `intrctn`: `rt ~ 0 + Intercept + f1 + f2 + (1 + f1 + f2 + intrctn | participant) + (1 + f1 + f2 + intrctn | item)`.

   Each model uses four cores, and these cores are set by `brm(..., cores = 4, ...)`, not by `plan()`;
3. do bridge sampling for each of the four models in parallel using `future_map(..., bridgesampling::bridge_sampler(...))`;
4. calculate the Bayes factors comparing the alternative model against each of the null models;
5. save the coefficients of the alternative model along with the Bayes factors calculated above as an `.Rds` file. The output looks as follows:
# A tibble: 4 x 9
  Coefficients     BF Estimate Est.Error    Q2.5  Q97.5 npart niter elapsed_time
  <chr>         <dbl>    <dbl>     <dbl>   <dbl>  <dbl> <dbl> <int> <Period>
1 order         0.378 -0.00121    0.0220 -0.0465 0.0402    20     1 56M 45.9S
2 voice         0.480  0.0111     0.0191 -0.0268 0.0481    20     1 56M 45.9S
3 intrctn       0.408  0.0165     0.0157 -0.0146 0.0472    20     1 56M 45.9S
4 Intercept    NA      7.65       0.0117  7.62   7.67      20     1 56M 45.9S
This inner process takes approximately 50--80 minutes to complete one cycle. In the example above, it took almost 57 minutes to produce the `.Rds` file.
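Step 4 of the inner layer can be sketched in base R as follows. The log marginal likelihoods below are made-up placeholders for the `logml` values that bridge sampling would return for each model; the model labels are hypothetical too. The Bayes factor of the alternative against a null model is the exponentiated difference of the two log marginal likelihoods.

```r
# Hypothetical log marginal likelihoods (placeholders, not real results).
logml <- c(
  alternative  = -1520.0,
  null_f1      = -1519.0,
  null_f2      = -1519.5,
  null_intrctn = -1521.0
)

null_models <- c("null_f1", "null_f2", "null_intrctn")

# BF(alternative vs. null) = exp(logml_alternative - logml_null).
# Values below 1 favour the null model, values above 1 the alternative.
bf_alt_vs_null <- exp(logml[["alternative"]] - logml[null_models])
bf_alt_vs_null
```

Working on the log scale and exponentiating only the difference avoids underflow, since the marginal likelihoods themselves are far too small to represent directly.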
Outer layer
do the steps of the inner layer (1--5 above) 100 times for each number of participants in the simulation dataset (`c(540, 180, 60, 20)`).
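A rough estimate of the total run time follows from these numbers. The sketch below assumes, for simplicity, that every cycle takes 50--80 minutes regardless of the number of participants and that eight cycles really do run concurrently; both are assumptions, and larger datasets may well take longer.

```r
n_participants    <- c(540, 180, 60, 20)
n_iterations      <- 100                    # repetitions per sample size
outer_workers     <- 8                      # cycles assumed to run concurrently
minutes_per_cycle <- c(low = 50, high = 80) # assumed equal for all sample sizes

total_cycles <- length(n_participants) * n_iterations  # 400
batches      <- ceiling(total_cycles / outer_workers)  # 50
total_hours  <- batches * minutes_per_cycle / 60
round(total_hours, 1)
```

Under these assumptions the whole simulation needs roughly 42 to 67 hours of wall-clock time, so a stalled worker early in the run is expensive to discover late.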
Question
With this nested parallelisation, I expected to get at least eight `.Rds` files once the outer layer had finished one cycle, because the plan assigns eight cores to the outer layer. However, I always get only one `.Rds` file, even after waiting 12 hours. In fact, when that single `.Rds` file was written to my working directory (50--80 minutes after starting execution), CPU usage suddenly dropped from 100% to 20--30%, and the number of threads running or in the run queue dropped from about 130 to 40. Neither CPU usage nor the number of running threads increased again after that `.Rds` file was written. No error message could be found. I therefore suspect that some processes (workers) lost their connection, leaving some futures unresolved (and leaving garbage uncleaned).
How can I avoid such failures of connection or of future resolution? How should I detect and stop the disconnected workers? Is this kind of nesting itself impossible? Any suggestions and hints are appreciated!
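One way to narrow down which worker stalls, given that no error message surfaces, is to have every task write its own start and end to a log file. The sketch below uses base R only; `run_logged()` and its arguments are hypothetical names introduced here, and `task` stands in for one simulation cycle.

```r
# Run `task` (a function without arguments) and record start, success, or
# error in `log_file`, together with the ID of the process that ran it.
run_logged <- function(id, task, log_file) {
  stamp <- function(status) {
    cat(
      sprintf(
        "%s pid=%d id=%s %s\n",
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"), Sys.getpid(), id, status
      ),
      file = log_file,
      append = TRUE
    )
  }
  stamp("start")
  tryCatch(
    {
      result <- task()
      stamp("done")
      result
    },
    error = function(e) {
      stamp(paste("error:", conditionMessage(e)))
      NULL
    }
  )
}

# Usage with two toy tasks, one of which fails.
log_file <- tempfile(fileext = ".log")
ok  <- run_logged(1, function() sqrt(16), log_file)
bad <- run_logged(2, function() stop("simulated failure"), log_file)
log_lines <- readLines(log_file)
log_lines
```

A `start` line without a matching `done` or `error` line then points to a task that is still running, has stalled, or whose process was terminated from outside R, and the logged process ID shows which worker to inspect.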
Implementation of random effects
# SDs of participant random effect
participant_ranefsd_latency <- c(
  # Intercept
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # f2
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # f1
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # intrctn
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  )
)

# SDs of item random effect
item_ranefsd_latency <- c(
  # Intercept
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # f2
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # f1
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  ),
  # intrctn
  extraDistr::rtnorm(
    n = 1,
    mean = 0,
    sd = 0.05,
    # To ensure the returned value is always positive,
    # since the returned value is a SD
    a = 0
  )
)

# some intermediate values were chosen for correlations:
corr_matrix_latency <- rethinking::rlkjcorr(
  # Number of random matrices to sample
  n = 1,
  # Dimension of correlation matrix
  K = 4,
  # Parameter controlling shape of distribution
  eta = 2
)
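The generator defined below expects 4x4 variance-covariance matrices (`sigma_u`, `sigma_w`), so the SDs and the correlation matrix above have to be combined at some point. A minimal base R sketch of that step, using fixed, made-up SDs and correlations instead of the random draws above:

```r
# Hypothetical SDs, in the same order as above: Intercept, f2, f1, intrctn.
sds <- c(Intercept = 0.04, f2 = 0.03, f1 = 0.02, intrctn = 0.01)

# Hypothetical correlation matrix: 1 on the diagonal, 0.3 elsewhere.
corr <- diag(4)
corr[corr == 0] <- 0.3

# Covariance = SD_i * SD_j * correlation_ij, which is the same as
# diag(sds) %*% corr %*% diag(sds) but symmetric by construction.
sigma <- outer(sds, sds) * corr
sigma
```

The square roots of the diagonal of `sigma` recover the SDs, and `cov2cor(sigma)` recovers the correlation matrix, which is a quick check that the matrix was assembled in the intended order.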
Definition of the simulation data generator
Data generating function
# Requires dplyr, tibble, and stringr (e.g. via library(tidyverse));
# MASS is called with `::`.
generate_sim_data_latency <- function(
  n_item = 8,
  n_participant = NULL,
  beta = NULL,
  # participant vcov 4x4 matrix ------------------------------
  sigma_u = NULL,
  # item vcov 4x4 matrix ------------------------------
  sigma_w = NULL,
  sigma_e = NULL,
  verbose = FALSE,
  seed = NULL
) {
  # Set seed
  if (!is.null(seed)) {
    set.seed(seed)
  }

  # Data frame generation
  base <- tibble(
    # Add a column for participant ------------------------------
    participant = rep(
      1:n_participant,
      each = n_item * 2 * 4
    ) |>
      as.factor(),
    # Add a column for item ------------------------------
    #
    # 8 items
    # 2 mirror images (in different colours) for a condition per item
    #   e.g. for os-u 'to push'
    #     [L] Red agent and [R] Blue patient
    #     [L] White patient and [R] Black agent
    # 4 conditions
    # 8 * 2 * 4 = 64 stimuli in total per participant
    item = rep(
      1:n_item,
      # 2 mirror images for a condition * 4 conditions * (number of participants)
      each = 2 * 4,
      times = n_participant
    ) |>
      as.factor(),
    # Add a column for the condition ------------------------------
    condition = rep(
      c(
        "f1's Lv1 and f2's Lv1",
        "f1's Lv1 and f2's Lv2",
        "f1's Lv2 and f2's Lv1",
        "f1's Lv2 and f2's Lv2"
      ),
      each = 2,
      # n_item replaces the hard-coded 8 so that non-default n_item also works
      times = n_item * n_participant
    ),
    # Add columns for the factors ------------------------------
    # The values are integer to run code faster and to save memory
    # https://stackoverflow.com/a/7014671/10215301
    #
    # Add a column for f1 (First main effect) ------------------------------
    # sum contrast coding for f1
    #   f1's Lv1:  1
    #   f1's Lv2: -1
    f1 = if_else(
      str_detect(condition, "f1's Lv1"),
      1L,
      -1L
    ),
    # Add a column for f2 (Second main effect) ------------------------------
    # sum contrast coding for f2
    #   f2's Lv1:  1
    #   f2's Lv2: -1
    f2 = if_else(
      str_detect(condition, "f2's Lv1"),
      1L,
      -1L
    ),
    # Add a column for intrctn ------------------------------
    # sum contrast coding for intrctn
    #   f1's Lv1 and f2's Lv1, and f1's Lv2 and f2's Lv2:  1
    #   f1's Lv2 and f2's Lv1, and f1's Lv1 and f2's Lv2: -1
    intrctn = case_when(
      condition == "f1's Lv1 and f2's Lv1" ~ 1L,
      condition == "f1's Lv1 and f2's Lv2" ~ -1L,
      condition == "f1's Lv2 and f2's Lv1" ~ -1L,
      condition == "f1's Lv2 and f2's Lv2" ~ 1L,
      TRUE ~ NA_integer_
    )
  )

  # participant random effects
  u <- MASS::mvrnorm(
    n = n_participant,
    mu = c(0, 0, 0, 0),
    Sigma = sigma_u
  ) |>
    as_tibble(rownames = "participant") |>
    mutate(
      participant = as.factor(participant)
    ) |>
    rename(
      u_Intercept = V1,
      u_f2 = V2,
      u_f1 = V3,
      u_intrctn = V4
    )

  # item random effects
  w <- MASS::mvrnorm(
    n = n_item,
    mu = c(0, 0, 0, 0),
    Sigma = sigma_w
  ) |>
    as_tibble(rownames = "item") |>
    mutate(
      item = as.factor(item)
    ) |>
    rename(
      w_Intercept = V1,
      w_f2 = V2,
      w_f1 = V3,
      w_intrctn = V4
    )

  simulation_data <- left_join(
    base,
    u,
    by = "participant"
  ) |>
    left_join(
      w,
      by = "item"
    ) |>
    mutate(
      # Linear predictor on the log scale
      z = (
        beta[1] + u_Intercept + w_Intercept +
          (beta[2] + u_f1 + w_f1) * f1 +
          (beta[3] + u_f2 + w_f2) * f2 +
          (beta[4] + u_intrctn + w_intrctn) * intrctn
      ),
      rt = rlnorm(
        # https://stackoverflow.com/a/31878476/10215301
        # Even by setting `n = n()`, `z` is used rowwise
        n = n(),
        sdlog = sigma_e,
        meanlog = z
      )
    )

  # Unless verbose, drop the random effects and the linear predictor
  if (!verbose) {
    simulation_data <- simulation_data |>
      dplyr::select(
        -starts_with("u_"),
        -starts_with("w_"),
        -z
      )
  }

  # Return the data visibly in both cases
  simulation_data
}
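To make the fixed-effects part of the generator concrete, the sketch below computes the four condition means on the log scale under sum contrast coding and converts them to median response times, using the fact that the median of a log-normal distribution is `exp(meanlog)`. The `beta` values are made up for illustration (only the intercept is loosely modelled on the estimate of 7.65 in the output shown earlier), and random effects are left out.

```r
# Hypothetical fixed effects on the log scale: Intercept, f1, f2, intrctn.
beta <- c(Intercept = 7.65, f1 = 0.01, f2 = 0.01, intrctn = 0.02)

# The 2 x 2 design with sum contrasts (Lv1 = 1, Lv2 = -1).
design <- expand.grid(f1 = c(1L, -1L), f2 = c(1L, -1L))

# With +1/-1 coding, the interaction contrast is the product of the two factors.
design$intrctn <- design$f1 * design$f2

# Condition means on the log scale, without random effects.
design$meanlog <- beta[["Intercept"]] +
  beta[["f1"]] * design$f1 +
  beta[["f2"]] * design$f2 +
  beta[["intrctn"]] * design$intrctn

# Median response time per condition.
design$median_rt <- exp(design$meanlog)
design
```

With this coding the intercept is the grand mean on the log scale, and each coefficient is half the difference between the two levels of its factor.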
Simulation
Settings of priors, formulae, and `stanvar`
set.seed(3434)
Settings of priors, formulae, and stanvar (to pass the value of an R object into the Stan code)