MPI sampling #350

Merged: 30 commits, merged Dec 4, 2020

Changes from 15 commits
45df936
initial version of mpi support
rok-cesnovar Nov 15, 2020
0b4b42a
Merge branch 'master' into mpi
rok-cesnovar Nov 17, 2020
f050b47
Merge branch 'master' into mpi
rok-cesnovar Nov 18, 2020
00d2ddc
mpi_args is a list now
rok-cesnovar Nov 18, 2020
a885a63
Merge branch 'cpp_options_fix' into mpi
rok-cesnovar Nov 18, 2020
e2df165
convert list to args vector
rok-cesnovar Nov 18, 2020
393efac
remove echoing cmd
rok-cesnovar Nov 18, 2020
61412cf
Merge branch 'master' into mpi
rok-cesnovar Nov 18, 2020
75c6dc7
added basic docs
rok-cesnovar Nov 19, 2020
c1e8207
add tests
rok-cesnovar Nov 20, 2020
8175c5a
use openmpi on GA
rok-cesnovar Nov 20, 2020
8a35242
cleanup
rok-cesnovar Nov 20, 2020
1a6d40d
set n=1 for test
rok-cesnovar Nov 20, 2020
6bd2b4e
add .Rd
rok-cesnovar Nov 20, 2020
918b66c
skip mpi in codecov
rok-cesnovar Nov 20, 2020
8b1ff38
Merge branch 'master' into mpi
rok-cesnovar Nov 25, 2020
e51d077
parallel_chains = 1
rok-cesnovar Nov 25, 2020
cf33b34
change starting tests
rok-cesnovar Nov 25, 2020
6be7a31
Merge branch 'master' into mpi
rok-cesnovar Nov 26, 2020
500bd6d
Merge branch 'master' into mpi
rok-cesnovar Nov 28, 2020
3228380
mpi_sample -> sample_mpi
rok-cesnovar Nov 29, 2020
86e71d7
Merge branch 'master' into mpi
rok-cesnovar Dec 1, 2020
9121292
minor doc edits
jgabry Dec 2, 2020
d46b8d7
revert one of the doc edits
jgabry Dec 2, 2020
3ffd517
clarify that sample_mpi is missing a few arguments
jgabry Dec 2, 2020
641d0c3
rename test file
jgabry Dec 2, 2020
6db11dc
don't need to define parallel_chains
jgabry Dec 2, 2020
234f1d1
Merge branch 'master' into mpi
rok-cesnovar Dec 3, 2020
cf22f84
update NEWS.md after release
rok-cesnovar Dec 3, 2020
6e1c24b
remove duplicate news item
jgabry Dec 3, 2020
6 changes: 6 additions & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -53,6 +53,12 @@ jobs:
mingw32-make --version
Get-Command mingw32-make | Select-Object -ExpandProperty Definition
shell: powershell

- name: Install MPI
if: runner.os == 'Linux'
run: |
sudo apt-get install -y openmpi-bin
echo "CMDSTANR_RUN_MPI_TESTS=TRUE" >> $GITHUB_ENV

- uses: r-lib/actions/setup-r@master
with:
6 changes: 5 additions & 1 deletion .github/workflows/Test-coverage.yaml
@@ -27,7 +27,11 @@ jobs:
- uses: r-lib/actions/setup-pandoc@master

- name: Install Ubuntu dependencies
run: sudo apt-get install libcurl4-openssl-dev
run: |
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install -y openmpi-bin
echo "CMDSTANR_RUN_MPI_TESTS=TRUE" >> $GITHUB_ENV

- name: Query dependencies
run: |
install.packages('remotes')
174 changes: 174 additions & 0 deletions R/model.R
@@ -914,6 +914,180 @@ sample_method <- function(data = NULL,
}
CmdStanModel$set("public", name = "sample", value = sample_method)

#' Run Stan's MCMC algorithms with MPI
#'
#' @name model-method-mpi-sample
#' @aliases mpi_sample
#' @family CmdStanModel methods
#'
#' @description The `$mpi_sample()` method of a [`CmdStanModel`] object runs the
#' default MCMC algorithm in CmdStan (`algorithm=hmc engine=nuts`) with MPI
#' (STAN_MPI makefile flag), to produce a set of draws from the posterior
#' distribution of a model conditioned on some data.
#'
#' In order to use MPI with Stan, an MPI implementation must be installed.
#' On Unix systems the most commonly used implementations are MPICH and OpenMPI.
#' These implementations provide an MPI C++ compiler wrapper (for example
#' `mpicxx`), which is required to compile the model.
#'
#' An example of compiling with STAN_MPI:
#' ```
#' cpp_options = list(STAN_MPI = TRUE, CXX="mpicxx", TBB_CXX_TYPE="gcc")
#' mod <- cmdstan_model("model.stan", cpp_options = cpp_options)
#' ```
#' The C++ options that need to be supplied to the compile call are:
#' - `STAN_MPI`: Enables the use of MPI with Stan.
#' - `CXX`: The name of the MPI C++ compiler wrapper (typically `mpicxx`).
#' - `TBB_CXX_TYPE`: The C++ compiler the MPI wrapper wraps. Typically `gcc` on
#'   Linux and `clang` on macOS.
#'
#' In the call to the `$mpi_sample()` method, we can additionally provide
#' the name of the MPI launcher (`mpi_cmd`), which defaults to "mpiexec",
#' and any other MPI launch arguments. In most cases, it is enough to
#' only define the number of processes with `mpi_args = list("n" = 4)`.
#'
#' An example of a call of `$mpi_sample()`:
#' ```
#' fit <- mod$mpi_sample(data_list, mpi_args = list("n" = 4))
#' ```
#'
#' @section Usage:
#' ```
#' $mpi_sample(
#' data = NULL,
#' mpi_cmd = "mpiexec",
#' mpi_args = NULL,
#' seed = NULL,
#' refresh = NULL,
#' init = NULL,
#' save_latent_dynamics = FALSE,
#' output_dir = NULL,
#'   chains = 1,
#' parallel_chains = getOption("mc.cores", 1),
#' chain_ids = seq_len(chains),
#' iter_warmup = NULL,
#' iter_sampling = NULL,
#' save_warmup = FALSE,
#' thin = NULL,
#' max_treedepth = NULL,
#' adapt_engaged = TRUE,
#' adapt_delta = NULL,
#' step_size = NULL,
#' metric = NULL,
#' metric_file = NULL,
#' inv_metric = NULL,
#' init_buffer = NULL,
#' term_buffer = NULL,
#' window = NULL,
#' fixed_param = FALSE,
#' sig_figs = NULL,
#' validate_csv = TRUE,
#' show_messages = TRUE
#' )
#' ```
#'
#' @section Arguments:
#' * `mpi_cmd`: (character vector) The MPI launcher used for launching MPI processes.
#' The default launcher is `mpiexec`.
#' * `mpi_args`: (list) A list of arguments to use when launching MPI processes.
#' For example, mpi_args = list("n" = 4) launches the executable as
#' `mpiexec -n 4 model_executable`, followed by CmdStan arguments
#' for the model executable.
#' * `data`, `seed`, `refresh`, `init`, `save_latent_dynamics`, `output_dir`,
#' `chains`, `parallel_chains`, `chain_ids`, `iter_warmup`, `iter_sampling`,
#' `save_warmup`, `thin`, `max_treedepth`, `adapt_engaged`, `adapt_delta`,
#' `step_size`, `metric`, `metric_file`, `inv_metric`, `init_buffer`,
#' `term_buffer`, `window`, `fixed_param`, `sig_figs`, `validate_csv`,
#' `show_messages`:
#' Same as for the [`$sample()`][model-method-sample] method.
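#'
#' As a sketch of how a multi-entry `mpi_args` list expands (the flag names
#' and values here are hypothetical), `mpi_args = list("n" = 4, "bind-to" = "core")`
#' results in a launch command of the form:
#' ```
#' mpiexec -bind-to core -n 4 ./model <CmdStan arguments>
#' ```
#' Note that entries are prepended as the list is traversed, so later list
#' entries appear first on the command line.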
#'
#' @section Value: The `$mpi_sample()` method returns a [`CmdStanMCMC`] object.
#'
#' @template seealso-docs
#' @inherit cmdstan_model examples
#'
NULL
mpi_sample_method <- function(data = NULL,
mpi_cmd = "mpiexec",
mpi_args = NULL,
seed = NULL,
refresh = NULL,
init = NULL,
save_latent_dynamics = FALSE,
output_dir = NULL,
chains = 1,
parallel_chains = getOption("mc.cores", 1),
Member Author:

The default for chains in $mpi_sample() is 1, while for $sample() it's 4. Or should we leave it the same as for $sample()?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when chains=4 in terms of distributing processes? With mpiexec -n 4 is each chain solved by 1 process?

rok-cesnovar (Member Author), Nov 20, 2020:

If it's chains = 4, parallel_chains = 1, that is the same as four sequential $mpi_sample(chains = 1) calls; this is just a convenience so the draws are merged together in the fit.

If chains = 4, parallel_chains = 4 that means 4 mpiexec calls with n=4 all running at the same time. Not sure that is useful or if we should just fix parallel_chains to 1.

Reviewer:

I'm confused, as I thought mpi_args=c("-n", 4...) controls the total number of MPI processes. But it looks like parallel_chains=4 implies mpiexec -n 4 too. Is that right?

rok-cesnovar (Member Author), Nov 20, 2020:

Without MPI, parallel_chains just means that 4 "./model args" calls are made and 4 model processes run in parallel.
So like running this in shell

for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done

In $mpi_sample() this would mean 4 processes that would run mpiexec -n x ./model args. Does that make more sense?

Reviewer:

I find this really confusing, as the args chains and parallel_chains seem to have overlapping meanings. What happens if we remove parallel_chains and only use chains in mpi_sample?

rok-cesnovar (Member Author), Nov 20, 2020:

This comes mainly from non-parallel MCMC sampling. For example, users want to run 4 chains but only use 2 cores for the chains and keep the other two free for something else. rstan uses the cores argument for this same thing.

I think this makes less sense, or is less useful, in the context of within-chain parallelization with threading or MPI, because if someone uses parallelization it's likely they want all the CPU/cluster power.

We can remove parallel_chains, we just need to decide what to do in the case of chains > 1. Do we run the chains sequentially (parallel_chains = 1) or run all of them at once (parallel_chains = chains)? My gut feeling is that we go with the former.

rok-cesnovar (Member Author):

My assumption is just that one typically runs a single mpiexec with maximum n?

yizhang-yiz, Nov 20, 2020:

It depends on the context. For now I agree with you: semantically it makes more sense to have chains=4 + mpi_args=c("-n", "x") equivalent to

for i in {1..4}
    do
      mpiexec -n x ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done

rok-cesnovar (Member Author), Nov 20, 2020:

Ok, let's think about this a bit more. I am not sure what the best solution would be.

chain_ids = seq_len(chains),
iter_warmup = NULL,
iter_sampling = NULL,
save_warmup = FALSE,
thin = NULL,
max_treedepth = NULL,
adapt_engaged = TRUE,
adapt_delta = NULL,
step_size = NULL,
metric = NULL,
metric_file = NULL,
inv_metric = NULL,
init_buffer = NULL,
term_buffer = NULL,
window = NULL,
fixed_param = FALSE,
sig_figs = NULL,
validate_csv = TRUE,
show_messages = TRUE) {
rok-cesnovar (Member Author), Nov 20, 2020:

threads_per_chain is not an arg for $mpi_sample(), because if by any chance someone uses threading with MPI, threads_per_chain will always be 1, regardless of what is set. This is a limitation set by CmdStan.

Reviewer:

It's not unusual to use threading in an MPI process, it's just that we haven't done that for Stan.

rok-cesnovar (Member Author), Nov 20, 2020:

Agreed. If we enable it in Stan (which we should), we can then add threading here also.


if (fixed_param) {
chains <- 1
parallel_chains <- 1
save_warmup <- FALSE
}

checkmate::assert_integerish(chains, lower = 1, len = 1)
checkmate::assert_integerish(parallel_chains, lower = 1, null.ok = TRUE)
checkmate::assert_integerish(chain_ids, lower = 1, len = chains, unique = TRUE, null.ok = FALSE)
sample_args <- SampleArgs$new(
iter_warmup = iter_warmup,
iter_sampling = iter_sampling,
save_warmup = save_warmup,
thin = thin,
max_treedepth = max_treedepth,
adapt_engaged = adapt_engaged,
adapt_delta = adapt_delta,
step_size = step_size,
metric = metric,
metric_file = metric_file,
inv_metric = inv_metric,
init_buffer = init_buffer,
term_buffer = term_buffer,
window = window,
fixed_param = fixed_param
)
cmdstan_args <- CmdStanArgs$new(
method_args = sample_args,
model_name = strip_ext(basename(self$exe_file())),
exe_file = self$exe_file(),
proc_ids = chain_ids,
data_file = process_data(data),
save_latent_dynamics = save_latent_dynamics,
seed = seed,
init = init,
refresh = refresh,
output_dir = output_dir,
validate_csv = validate_csv,
sig_figs = sig_figs
)
cmdstan_procs <- CmdStanMCMCProcs$new(
num_procs = chains,
parallel_procs = parallel_chains,
show_stderr_messages = show_messages
)
runset <- CmdStanRun$new(args = cmdstan_args, procs = cmdstan_procs)
runset$run_cmdstan_mpi(mpi_cmd, mpi_args)
CmdStanMCMC$new(runset)
}
CmdStanModel$set("public", name = "mpi_sample", value = mpi_sample_method)
rok-cesnovar marked this conversation as resolved.

#' Run Stan's optimization algorithms
#'
31 changes: 25 additions & 6 deletions R/run.R
@@ -20,7 +20,6 @@ CmdStanRun <- R6::R6Class(
}
invisible(self)
},

num_procs = function() self$procs$num_procs(),
proc_ids = function() self$procs$proc_ids(),
exe_file = function() self$args$exe_file,
@@ -150,6 +149,10 @@ CmdStanRun <- R6::R6Class(
}
},

run_cmdstan_mpi = function(mpi_cmd, mpi_args) {
private$run_sample_(mpi_cmd, mpi_args)
},

# run bin/stansummary or bin/diagnose
# @param tool The name of the tool in `bin/` to run.
# @param flags An optional character vector of flags (e.g. c("--sig_figs=1")).
@@ -222,10 +225,15 @@ CmdStanRun <- R6::R6Class(


# run helpers -------------------------------------------------
.run_sample <- function() {
.run_sample <- function(mpi_cmd = NULL, mpi_args = NULL) {
procs <- self$procs
on.exit(procs$cleanup(), add = TRUE)

if (!is.null(mpi_cmd)) {
if (is.null(mpi_args)) {
mpi_args = list()
}
mpi_args[["exe"]] <- self$exe_file()
}
# add path to the TBB library to the PATH variable
if (cmdstan_version() >= "2.21" && os_is_windows()) {
path_to_TBB <- file.path(cmdstan_path(), "stan", "lib", "stan_math", "lib", "tbb")
@@ -261,7 +269,9 @@ CmdStanRun <- R6::R6Class(
id = chain_id,
command = self$command(),
args = self$command_args()[[chain_id]],
wd = dirname(self$exe_file())
wd = dirname(self$exe_file()),
mpi_cmd = mpi_cmd,
mpi_args = mpi_args
)
procs$mark_proc_start(chain_id)
procs$set_active_procs(procs$active_procs() + 1)
@@ -475,12 +485,21 @@ CmdStanProcs <- R6::R6Class(
get_proc = function(id) {
private$processes_[[id]]
},
new_proc = function(id, command, args, wd) {
new_proc = function(id, command, args, wd, mpi_cmd = NULL, mpi_args = NULL) {
if (!is.null(mpi_cmd)) {
exe_name <- mpi_args[["exe"]]
mpi_args[["exe"]] <- NULL
mpi_args_vector <- c()
for (i in names(mpi_args)) {
mpi_args_vector <- c(paste0("-", i), mpi_args[[i]], mpi_args_vector)
}
args = c(mpi_args_vector, exe_name, args)
command <- mpi_cmd
}
private$processes_[[id]] <- processx::process$new(
command = command,
args = args,
wd = wd,
echo_cmd = FALSE,
stdout = "|",
stderr = "|"
)
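The flag construction in `new_proc()` can be sketched in isolation. This is a minimal standalone reproduction of the loop above; `build_mpi_args` is a hypothetical helper name, not part of the PR:

```r
# Standalone sketch of the argument construction in new_proc():
# each name in the list becomes a "-<name>" flag followed by its value,
# and the executable name is appended last. Because entries are prepended,
# later list entries end up first on the command line.
build_mpi_args <- function(mpi_args, exe_name) {
  mpi_args_vector <- c()
  for (i in names(mpi_args)) {
    mpi_args_vector <- c(paste0("-", i), mpi_args[[i]], mpi_args_vector)
  }
  c(mpi_args_vector, exe_name)
}

build_mpi_args(list("n" = 4), "./model")
# c("-n", "4", "./model")
```

The reversal for multi-entry lists (later entries first) is typically harmless, since launcher flags before the executable are order-independent.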
1 change: 1 addition & 0 deletions man/model-method-check_syntax.Rd


1 change: 1 addition & 0 deletions man/model-method-compile.Rd


1 change: 1 addition & 0 deletions man/model-method-generate-quantities.Rd

