From c4f4a0480f52130d0cf2e76f1942c845744764b8 Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 4 Apr 2024 12:42:27 -0400 Subject: [PATCH 1/6] Merge MCMC docs --- redirects.txt | 3 +- src/_quarto.yml | 1 - src/cmdstan-guide/_quarto.yml | 1 - src/cmdstan-guide/mcmc_config.qmd | 336 ++++++++++++++++++++-- src/cmdstan-guide/mcmc_sampling_intro.qmd | 321 --------------------- src/cmdstan-guide/stan_csv_apdx.qmd | 2 +- 6 files changed, 314 insertions(+), 350 deletions(-) delete mode 100644 src/cmdstan-guide/mcmc_sampling_intro.qmd diff --git a/redirects.txt b/redirects.txt index 7b869b669..2aff4915f 100644 --- a/redirects.txt +++ b/redirects.txt @@ -173,7 +173,6 @@ https://mc-stan.org/docs/functions-reference/sort-functions.html https://mc-stan https://mc-stan.org/docs/cmdstan-guide/extracting-log-probabilities-and-gradients-for-diagnostics.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html#pathfinder-configuration https://mc-stan.org/docs/functions-reference/negative-binomial-distribution.html https://mc-stan.org/docs/functions-reference/unbounded_discrete_distributions.html#negative-binomial-distribution https://mc-stan.org/docs/reference-manual/sundials-license.html https://mc-stan.org/docs/reference-manual/licenses.html#sundials-license -https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html#running-the-sampler https://mc-stan.org/docs/functions-reference/ordered-probit-distribution.html https://mc-stan.org/docs/functions-reference/bounded_discrete_distributions.html#ordered-probit-distribution https://mc-stan.org/docs/functions-reference/multivariate-gaussian-process-distribution-cholesky-parameterization.html https://mc-stan.org/docs/functions-reference/distributions_over_unbounded_vectors.html#multivariate-gaussian-process-distribution-cholesky-parameterization https://mc-stan.org/docs/stan-users-guide/bayesian-poststratification.html 
https://mc-stan.org/docs/stan-users-guide/poststratification.html#bayesian-poststratification @@ -591,3 +590,5 @@ https://mc-stan.org/docs/cmdstan-guide/cmdstan-tools.html https://mc-stan.org/do https://mc-stan.org/docs/cmdstan-guide/print-deprecated-mcmc-output-analysis.html https://mc-stan.org/docs/cmdstan-guide/print.html https://mc-stan.org/docs/cmdstan-guide/command-line-interface-overview.html https://mc-stan.org/docs/cmdstan-guide/command_line_options.html https://mc-stan.org/docs/functions-reference/matrix-arithmetic-operators.html https://mc-stan.org/docs/functions-reference/matrix_operations.html#matrix-arithmetic-operators +https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html +https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html diff --git a/src/_quarto.yml b/src/_quarto.yml index 2c904f44f..83d5dae98 100644 --- a/src/_quarto.yml +++ b/src/_quarto.yml @@ -241,7 +241,6 @@ website: - cmdstan-guide/installation.qmd - cmdstan-guide/example_model_data.qmd - cmdstan-guide/compiling_stan_programs.qmd - - cmdstan-guide/mcmc_sampling_intro.qmd - cmdstan-guide/optimization_intro.qmd - cmdstan-guide/pathfinder_intro.qmd - cmdstan-guide/variational_intro.qmd diff --git a/src/cmdstan-guide/_quarto.yml b/src/cmdstan-guide/_quarto.yml index 0870f73f1..243b37596 100644 --- a/src/cmdstan-guide/_quarto.yml +++ b/src/cmdstan-guide/_quarto.yml @@ -39,7 +39,6 @@ book: - installation.qmd - example_model_data.qmd - compiling_stan_programs.qmd - - mcmc_sampling_intro.qmd - optimization_intro.qmd - pathfinder_intro.qmd - variational_intro.qmd diff --git a/src/cmdstan-guide/mcmc_config.qmd b/src/cmdstan-guide/mcmc_config.qmd index fc2658b7e..d1961ff8f 100644 --- a/src/cmdstan-guide/mcmc_config.qmd +++ b/src/cmdstan-guide/mcmc_config.qmd @@ -1,22 +1,134 @@ --- -pagetitle: MCMC Sampling Configuration +pagetitle: MCMC Sampling --- # MCMC Sampling using 
Hamiltonian Monte Carlo {#mcmc-config} + The `sample` method provides Bayesian inference over the model conditioned on data using Hamiltonian Monte Carlo (HMC) sampling. By default, the inference engine used is the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo sampling. For details on HMC and NUTS, see the Stan Reference Manual chapter on [MCMC Sampling](https://mc-stan.org/docs/reference-manual/mcmc.html). -The full set of configuration options available for the `sample` method is -reported at the beginning of the sampler output file as CSV comments. + +## Running the sampler + +To generate a sample from the posterior distribution of +the model conditioned on the data, +we run the executable program with the argument `sample` or `method=sample` +together with the input data. +The executable can be run from any directory. + +The full set of configuration options available for the `sample` method +is available by using the `sample help-all` subcommand. The arguments +with their requested values or defaults are also reported at the beginning +of the sampler console output and in the output CSV file's comments. + +Here, we run it in the directory which contains the Stan program and input data, +`/examples/bernoulli`: +``` +> cd examples/bernoulli +> ls + bernoulli bernoulli.data.json bernoulli.data.R bernoulli.stan +``` + +To execute sampling of the model under Linux or Mac, use: +``` +> ./bernoulli sample data file=bernoulli.data.json +``` + +In Windows, the `./` prefix is not needed: + +``` +> bernoulli.exe sample data file=bernoulli.data.json +``` + +The output is the same across all supported platforms. 
First, the +configuration of the program is echoed to the standard output: + +``` +method = sample (Default) + sample + num_samples = 1000 (Default) + num_warmup = 1000 (Default) + save_warmup = 0 (Default) + thin = 1 (Default) + adapt + engaged = 1 (Default) + gamma = 0.050000000000000003 (Default) + delta = 0.80000000000000004 (Default) + kappa = 0.75 (Default) + t0 = 10 (Default) + init_buffer = 75 (Default) + term_buffer = 50 (Default) + window = 25 (Default) + save_metric = 0 (Default) + algorithm = hmc (Default) + hmc + engine = nuts (Default) + nuts + max_depth = 10 (Default) + metric = diag_e (Default) + metric_file = (Default) + stepsize = 1 (Default) + stepsize_jitter = 0 (Default) + num_chains = 1 (Default) +id = 0 (Default) +data + file = bernoulli.data.json +init = 2 (Default) +random + seed = 3252652196 (Default) +output + file = output.csv (Default) + diagnostic_file = (Default) + refresh = 100 (Default) +``` + +After the configuration has been displayed, a short timing message is +given. + +``` +Gradient evaluation took 1.2e-05 seconds +1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. +Adjust your expectations accordingly! +``` + +Next, the sampler reports the iteration number, reporting the +percentage complete. + +``` +Iteration: 1 / 2000 [ 0%] (Warmup) +... +Iteration: 2000 / 2000 [100%] (Sampling) +``` + +Finally, the sampler reports timing information: +``` + Elapsed Time: 0.007 seconds (Warm-up) + 0.017 seconds (Sampling) + 0.024 seconds (Total) +``` + +## Stan CSV output file {#mcmc_output_csv} + +Each execution of the model results in draws from a single Markov +chain being written to a file in [comma-separated value (CSV) format](stan_csv_apdx.qmd). +The default name of the output file is `output.csv`. + +The first part of the output file records the version of the +underlying Stan library and the configuration as comments (i.e., lines +beginning with the pound sign (`#`)). 
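As an illustrative aside (not part of CmdStan): because the output file mixes `#` comment lines with CSV data, a downstream script can recover the draws by filtering out comments before parsing. A minimal Python sketch, using a hypothetical miniature file in place of a real `output.csv`:

```python
import csv
from io import StringIO

# Hypothetical miniature stand-in for a real output.csv:
# comment lines, a header row, then one draw per line.
stan_csv = """\
# stan_version_major = 2
# method = sample (Default)
lp__,accept_stat__,theta
-6.84,0.97,0.199
-6.92,0.99,0.182
"""

def read_stan_csv(lines):
    """Skip '#' comment lines (config, adaptation, timing), then parse
    the CSV header and the numeric draws that follow it."""
    rows = csv.reader(line for line in lines if not line.startswith("#"))
    header = next(rows)
    return header, [dict(zip(header, map(float, row))) for row in rows]

header, draws = read_stan_csv(StringIO(stan_csv))
print(header)             # ['lp__', 'accept_stat__', 'theta']
print(draws[0]["theta"])  # 0.199
```

Real output files also interleave adaptation comments between the warmup and sampling draws; the same comment filter skips those as well.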
+ When the example model `bernoulli.stan` is run via the command line -with all default arguments, -the resulting Stan CSV file header comments show the complete set -of default sample method configuration options: +with all default arguments, the following configuration is displayed: + ``` +# stan_version_major = 2 +# stan_version_minor = 23 +# stan_version_patch = 0 +# model = bernoulli_model # method = sample (Default) # sample # num_samples = 1000 (Default) @@ -43,6 +155,69 @@ of default sample method configuration options: # stepsize = 1.000000 (Default) # stepsize_jitter = 0.000000 (Default) # num_chains = 1 (Default) +# output +# file = output.csv (Default) +# diagnostic_file = (Default) +# refresh = 100 (Default) +``` + +This is followed by a CSV header indicating the names of the values +sampled. +``` +lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta +``` +The first output columns report the HMC sampler information: + +- `lp__` - the total log probability density (up to an additive constant) at each sample +- `accept_stat__ ` - the average Metropolis acceptance probability over each simulated Hamiltonian trajectory +- `stepsize__ ` - integrator step size +- `treedepth__ ` - depth of tree used by NUTS (NUTS sampler) +- `n_leapfrog__ ` - number of leapfrog calculations (NUTS sampler) +- `divergent__ ` - has value `1` if trajectory diverged, otherwise `0`. (NUTS sampler) +- `energy__ ` - value of the Hamiltonian +- `int_time__ ` - total integration time (static HMC sampler) + +Because the above header is from the NUTS sampler, it has columns `treedepth__`, `n_leapfrog__`, and `divergent__` +and doesn't have column `int_time__`. +The remaining columns correspond to model parameters. For the +Bernoulli model, it is just the final column, `theta`. + +The header line is written to the output file before warmup begins. +If option `save_warmup` is set to `1`, the warmup draws are output directly after the header. 
+The total number of warmup draws saved is `num_warmup` divided by `thin`, rounded up (i.e., `ceiling`).
+
+Following the warmup draws (if any) are comments which record the results of adaptation:
+the step size and inverse mass metric used during sampling:
+
+```
+# Adaptation terminated
+# Step size = 0.884484
+# Diagonal elements of inverse mass matrix:
+# 0.535006
+```
+
+The default sampler is NUTS with an adapted step size and a diagonal
+inverse mass matrix. For this example, the step size is 0.884484, and
+the inverse mass contains the single entry 0.535006 corresponding to
+the parameter `theta`.
+
+Draws from the posterior distribution are printed out next, each line
+containing a single draw with the columns corresponding to the header.
+
+```
+-6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853
+-6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295
+-7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299
+-6.88712,1,0.884484,1,1,0,7.02101,0.188229
+-7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596
+...
+```
+
+The output ends with timing details:
+```
+# Elapsed Time: 0.007 seconds (Warm-up)
+#               0.017 seconds (Sampling)
+#               0.024 seconds (Total)
 ```

 ## Iterations
@@ -311,16 +486,24 @@ specifies the location of the auxiliary output file which contains sampler infor
 and the gradients on the unconstrained scale and log probabilities for all
 parameters in the model. By default, no auxiliary output file is produced.

-## Multiple chains in one executable {#sampler-num-chains}
+## Running multiple chains {#multi-chain-sampling}
+
+A Markov chain generates samples from the target distribution only after it has converged to equilibrium.
+In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound.
+In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available.
+One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior +to other randomly initialized chains. +For robust diagnostics, we recommend running 4 chains. + +The preferred way of using multiple chains is to run them all from the same executable using +the `num_chains` argument. There is also the option to use the [Unix or DOS shell to run multiple executables](#old-multichain). -As described in the -[quickstart section on parallelism](mcmc_sampling_intro.qmd#multi-chain-sampling), -the preferred way to run multiple chains is to use the `num_chains` argument. +### Using the num_chains argument to run multiple chains {#sampler-num-chains} +The `num_chains` argument can be used for all of Stan's samplers with the exception of the `static` HMC engine. This will run multiple chains of MCMC from the same executable, which can save -on memory usage due to only needing one copy of the model and data. As noted in -the quickstart guide, this will be done in parallel if the model was compiled -with `STAN_THREADS=true`. +on memory usage due to only needing one copy of the model and data. Depending on whether the +model was compiled with `STAN_THREADS=true`, these will either run in parallel or one after the other. The `num_chains` argument changes the meanings of several other arguments when it is greater than `1` (the default). Many arguments are now interpreted as a @@ -341,40 +524,143 @@ for each chain. The numbers in these filenames are also based on the `id` argument, which defaults to `1`. -## Examples - older parallelism +For example, this will run 4 chains: +``` +./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv +``` +This will produce samples in `output_1.csv`, `output_2.csv`, `output_3.csv`, `output_4.csv`. +A suffix with the chain id +is appended to the provided output filename (`output.csv` in the above command). 
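To make the id-based numbering concrete, here is a small Python sketch (illustrative only, not CmdStan code; `chain_output_files` is a hypothetical helper) that predicts the per-chain filenames under the suffix rule just described, assuming chain ids run from `id` to `id + num_chains - 1`:

```python
def chain_output_files(stem, num_chains, chain_id=1):
    """Predict per-chain output names: insert `_<chain id>` before the
    extension of the requested output file (a sketch of the rule above,
    not CmdStan's own implementation)."""
    base, dot, ext = stem.rpartition(".")
    if not dot:                     # no extension given; assume .csv
        base, ext = stem, "csv"
    return [f"{base}_{i}.{ext}" for i in range(chain_id, chain_id + num_chains)]

print(chain_output_files("output.csv", 4))
# ['output_1.csv', 'output_2.csv', 'output_3.csv', 'output_4.csv']
```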
+
+
+If the model was not compiled with `STAN_THREADS=true`, the above command will run 4 chains sequentially.
+
+If the model was compiled with `STAN_THREADS=true`, the chains can run in parallel, with the `num_threads` argument
+defining the maximum number of threads used to run the chains. If the model uses no within-chain parallelization
+(`map_rect` or `reduce_sum` calls), the below command will run 4 chains in parallel, provided there are cores
+available:
+```
+./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4
+```
+
+If the model uses within-chain parallelization (`map_rect` or `reduce_sum` calls), the threads are automatically
+scheduled either to run the parallel parts of a single chain or to run the sequential parts of other chains. The below
+call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads,
+1 thread, anything in between, or may be waiting for a thread to become available. The scheduling is left to the [Threading
+Building Blocks scheduler](https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-11/how-task-scheduler-works.html).
+
+```
+./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16
+```
+
+
+## Summarizing sampler output(s) with `stansummary`
+
+The [`stansummary` utility](stansummary.qmd) processes one or more output files from a run
+or set of runs of Stan's HMC sampler given a model and data.
+For all columns in the Stan CSV output file, `stansummary` reports a set of statistics
+including mean, standard deviation, percentiles, effective number of samples, and $\hat{R}$ values.
+
+To run `stansummary` on the output files generated by the above run of the `bernoulli` model
+on Mac or Linux:
+```
+/bin/stansummary output_*.csv
+```
+
+On Windows, use backslashes in the path to call `stansummary.exe`:
+```
+\bin\stansummary.exe output_*.csv
+```
+The stansummary output consists of one row of statistics per column
+in the Stan CSV output file. Therefore, the first rows of the
+stansummary output report statistics over the sampler state.
+The final row of output summarizes the estimates of the model variable `theta`:
+```
+Inference for Stan model: bernoulli_model
+4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
+
+Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
+Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total
+
+                Mean     MCSE  StdDev     5%    50%   95%    N_Eff  N_Eff/s    R_hat
+lp__            -7.3  1.8e-02    0.75   -8.8   -7.0  -6.8  1.8e+03  2.4e+04  1.0e+00
+accept_stat__   0.89  2.7e-03    0.17   0.52   0.96   1.0  3.9e+03  5.1e+04  1.0e+00
+stepsize__       1.1  7.5e-02    0.11   0.93    1.2   1.2  2.0e+00  2.6e+01  2.5e+13
+treedepth__      1.4  8.1e-03    0.49    1.0    1.0   2.0  3.6e+03  4.7e+04  1.0e+00
+n_leapfrog__     2.3  1.7e-02    0.98    1.0    3.0   3.0  3.3e+03  4.3e+04  1.0e+00
+divergent__     0.00      nan    0.00   0.00   0.00  0.00      nan      nan      nan
+energy__         7.8  2.6e-02     1.0    6.8    7.5   9.9  1.7e+03  2.2e+04  1.0e+00
+theta           0.25  2.9e-03    0.12  0.079   0.23  0.46  1.7e+03  2.1e+04  1.0e+00
+
+Samples were drawn using hmc with nuts.
+For each parameter, N_Eff is a crude measure of effective sample size,
+and R_hat is the potential scale reduction factor on split chains (at
+convergence, R_hat=1).
+```
+
+In this example, we conditioned the model on a dataset consisting of the outcomes of
+10 Bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95%
+percentile values for `theta` reflect the uncertainty in our estimate, due to the
+small amount of data, given the prior of `beta(1, 1)`.
+
+
+## Examples - older parallelism {#old-multichain}
+
 **Note**: Many of these examples can be simplified by using the
 [`num_chains` argument](#sampler-num-chains).
-The Quickstart Guide [MCMC Sampling chapter](mcmc_sampling_intro.qmd) section on multiple chains -also showed how to run multiple chains given a model and data, using the minimal required -command line options: the method, the name of the data file, and a chain-specific name for the output file. +When the `num_chains` argument is not available or is undesirable for whatever reason, +built-in tools in the system shell can be used. + +To run multiple chains given a model and data, either sequentially or in parallel, +we can also use the Unix or DOS shell `for` loop to set up index variables needed to identify +each chain and its outputs. -This creates multiple copies of the model process which will all load the data. +On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters +is: +``` +for NAME [in LIST]; do COMMANDS; done +``` +The list can be a simple sequence of numbers, or you can use the shell expansion syntax `{1..N}` +which expands to the sequence from $1$ to $N$, e.g. `{1..4}` expands to `1 2 3 4`. +Note that the expression `{1..N}` cannot contain spaces. -To run 4 chains in parallel on Mac OS and Linux, the syntax in both bash and zsh is the same: +To run 4 chains for the example bernoulli model on MacOS or Linux: ``` > for i in {1..4} do - ./bernoulli sample data file=my_model.data.json \ - output file=output_${i}.csv & + ./bernoulli sample data file=bernoulli.data.json \ + output file=output_${i}.csv done ``` The backslash (`\`) indicates a line continuation in Unix. The expression `${i}` substitutes in the value of loop index variable `i`. 
-The ampersand (`&`) pushes each process into the background which allows the loop to continue
+To run chains in parallel, put an ampersand (`&`) at the end of the nested sampler command:
+```
+> for i in {1..4}
+  do
+    ./bernoulli sample data file=bernoulli.data.json \
+                output file=output_${i}.csv &
+  done
+```
+This pushes each process into the background which allows the loop to continue
 without waiting for the current chain to finish.

-On Windows the corresponding loop is:
+On Windows, the DOS [for-loop syntax](https://www.windows-commandline.com/windows-for-loop-examples/) is one of:
+```
+for %i in (SET) do COMMAND COMMAND-ARGUMENTS
+for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS
+```
+To run 4 chains in parallel on Windows:
 ```
 >for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^
-          data file=my_model.data.json my_data ^
+          data file=bernoulli.data.json ^
           output file=output_%i.csv
 ```
 The caret (`^`) indicates a line continuation in DOS.
 The expression `%i` is the loop index.

-In the following examples, we focus on just the nested sampler command for Unix.
+In the following extended examples, we focus on just the nested sampler command for Unix.

 ### Running multiple chains with a specified RNG seed

diff --git a/src/cmdstan-guide/mcmc_sampling_intro.qmd b/src/cmdstan-guide/mcmc_sampling_intro.qmd
deleted file mode 100644
index 63aec5972..000000000
--- a/src/cmdstan-guide/mcmc_sampling_intro.qmd
+++ /dev/null
@@ -1,321 +0,0 @@
----
-pagetitle: MCMC Sampling
----
-
-# MCMC Sampling {#mcmc-intro}
-
-## Running the sampler
-
-To generate a sample from the posterior distribution of
-the model conditioned on the data,
-we run the executable program with the argument `sample` or `method=sample`
-together with the input data.
-The executable can be run from any directory.
-Here, we run it in the directory which contains the Stan program and input data, -`/examples/bernoulli`: -``` -> cd examples/bernoulli -``` - -To execute sampling of the model under Linux or Mac, use: -``` -> ./bernoulli sample data file=bernoulli.data.json -``` - -In Windows, the `./` prefix is not needed: - -``` -> bernoulli.exe sample data file=bernoulli.data.json -``` - -The output is the same across all supported platforms. First, the -configuration of the program is echoed to the standard output: - -``` -method = sample (Default) - sample - num_samples = 1000 (Default) - num_warmup = 1000 (Default) - save_warmup = 0 (Default) - thin = 1 (Default) - adapt - engaged = 1 (Default) - gamma = 0.050000000000000003 (Default) - delta = 0.80000000000000004 (Default) - kappa = 0.75 (Default) - t0 = 10 (Default) - init_buffer = 75 (Default) - term_buffer = 50 (Default) - window = 25 (Default) - save_metric = 0 (Default) - algorithm = hmc (Default) - hmc - engine = nuts (Default) - nuts - max_depth = 10 (Default) - metric = diag_e (Default) - metric_file = (Default) - stepsize = 1 (Default) - stepsize_jitter = 0 (Default) - num_chains = 1 (Default) -id = 0 (Default) -data - file = bernoulli.data.json -init = 2 (Default) -random - seed = 3252652196 (Default) -output - file = output.csv (Default) - diagnostic_file = (Default) - refresh = 100 (Default) -``` - -After the configuration has been displayed, a short timing message is -given. - -``` -Gradient evaluation took 1.2e-05 seconds -1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. -Adjust your expectations accordingly! -``` - -Next, the sampler reports the iteration number, reporting the -percentage complete. - -``` -Iteration: 1 / 2000 [ 0%] (Warmup) -.... 
-Iteration: 2000 / 2000 [100%] (Sampling) -``` - -Finally, the sampler reports timing information: -``` - Elapsed Time: 0.007 seconds (Warm-up) - 0.017 seconds (Sampling) - 0.024 seconds (Total) -``` - -## Running multiple chains {#multi-chain-sampling} - -A Markov chain generates samples from the target distribution only after it has converged to equilibrium. -In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound. -In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available. -One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior -to other randomly initialized chains. -For robust diagnostics, we recommend running 4 chains. - -There are two different ways of running multiple chains, with the `num_chains` argument using a single executable -and by using the Unix and DOS shell to run multiple executables. - -### Using the num_chains argument to run multiple chains - -The `num_chains` argument can be used for all of Stan's samplers with the exception of the `static` HMC engine. - -Example that will run 4 chains: -``` -./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv -``` - -If the model was not compiled with `STAN_THREADS=true`, the above command will run 4 chains sequentially and will -produce the sample in `output_1.csv`, `output_2.csv`, `output_3.csv`, `output_4.csv`. A suffix with the chain id -is appended to the provided output filename (`output.csv` in the above command). - -If the model was compiled with `STAN_THREADS=true`, the chains can run in parallel, with the `num_threads` argument -defining the maximum number of threads used to run the chains. 
If the model uses no within-chain parallelization -(`map_rect` or `reduce_sum` calls), the below command will run 4 chains in parallel, provided there are cores -available: -``` -./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4 -``` - -If the model uses within-chain parallelization (`map_rect` or `reduce_sum` calls), the threads are automatically -scheduled to run the parallel parts of a single chain or run the sequential parts of another chains. The below -call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads, -1 thread, anything in between, or can wait for a thread to be available. The scheduling is left to the [Threading -Building Blocks scheduler](https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-11/how-task-scheduler-works.html). - -``` -./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16 -``` - -### Using shell for running multiple chains - -To run multiple chains given a model and data, either sequentially or in parallel, -we can also use the Unix or DOS shell `for` loop to set up index variables needed to identify -each chain and its outputs. - -On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters -is: -``` -for NAME [in LIST]; do COMMANDS; done -``` -The list can be a simple sequence of numbers, or you can use the shell expansion syntax `{1..N}` -which expands to the sequence from $1$ to $N$, e.g. `{1..4}` expands to `1 2 3 4`. -Note that the expression `{1..N}` cannot contain spaces. - -To run 4 chains for the example bernoulli model on MacOS or Linux: -``` -> for i in {1..4} - do - ./bernoulli sample data file=bernoulli.data.json \ - output file=output_${i}.csv - done -``` -The backslash (`\`) indicates a line continuation in Unix. -The expression `${i}` substitutes in the value of loop index variable `i`. 
-To run chains in parallel, put an ampersand (`&`) at the end of the nested sampler command: -``` -> for i in {1..4} - do - ./bernoulli sample data file=bernoulli.data.json \ - output file=output_${i}.csv & - done -``` -This pushes each process into the background which allows the loop to continue -without waiting for the current chain to finish. - -On Windows, the DOS [for-loop syntax](https://www.windows-commandline.com/windows-for-loop-examples/) is one of: -``` -for %i in (SET) do COMMAND COMMAND-ARGUMENTS -for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS -``` -To run 4 chains in parallel on Windows: -``` ->for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^ - data file=bernoulli.data.json my_data ^ - output file=output_%i.csv -``` -The caret (`^`) indicates a line continuation in DOS. - -## Stan CSV output file {#mcmc_output_csv} - -Each execution of the model results in draws from a single Markov -chain being written to a file in comma-separated value (CSV) format. -The default name of the output file is `output.csv`. - -The first part of the output file records the version of the -underlying Stan library and the configuration as comments (i.e., lines -beginning with the pound sign (`#`)). - -``` -# stan_version_major = 2 -# stan_version_minor = 23 -# stan_version_patch = 0 -# model = bernoulli_model -# method = sample (Default) -# sample -# num_samples = 1000 (Default) -# num_warmup = 1000 (Default) -... -# output -# file = output.csv (Default) -# diagnostic_file = (Default) -# refresh = 100 (Default) -``` -This is followed by a CSV header indicating the names of the values -sampled. 
-``` -lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta -``` -The first output columns report the HMC sampler information: - -- `lp__` - the total log probability density (up to an additive constant) at each sample -- `accept_stat__ ` - the average Metropolis acceptance probability over each simulated Hamiltonian trajectory -- `stepsize__ ` - integrator step size -- `treedepth__ ` - depth of tree used by NUTS (NUTS sampler) -- `n_leapfrog__ ` - number of leapfrog calculations (NUTS sampler) -- `divergent__ ` - has value `1` if trajectory diverged, otherwise `0`. (NUTS sampler) -- `energy__ ` - value of the Hamiltonian -- `int_time__ ` - total integration time (static HMC sampler) - -Because the above header is from the NUTS sampler, it has columns `treedepth__`, `n_leapfrog__`, and `divergent__` -and doesn't have column `int_time__`. -The remaining columns correspond to model parameters. For the -Bernoulli model, it is just the final column, `theta`. - -The header line is written to the output file before warmup begins. -If option `save_warmup` is set to `1`, the warmup draws are output directly after the header. -The total number of warmup draws saved is `num_warmup` divided by `thin`, rounded up (i.e., `ceiling`). - -Following the warmup draws (if any), are comments which record the results of adaptation: -the stepsize, and inverse mass metric used during sampling: - -``` -# Adaptation terminated -# Step size = 0.884484 -# Diagonal elements of inverse mass matrix: -# 0.535006 -``` - -The default sampler is NUTS with an adapted step size and a diagonal -inverse mass matrix. For this example, the step size is 0.884484, and -the inverse mass contains the single entry 0.535006 corresponding to -the parameter `theta`. - -Draws from the posterior distribution are printed out next, each line -containing a single draw with the columns corresponding to the header. 
- -``` --6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853 --6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295 --7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299 --6.88712,1,0.884484,1,1,0,7.02101,0.188229 --7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596 -... -``` - -The output ends with timing details: -``` -# Elapsed Time: 0.007 seconds (Warm-up) -# 0.017 seconds (Sampling) -# 0.024 seconds (Total) -``` - -## Summarizing sampler output(s) with `stansummary` - -The `stansummary` utility processes one or more output files from a run -or set of runs of Stan's HMC sampler given a model and data. -For all columns in the Stan CSV output file `stansummary` reports a set of statistics -including mean, standard deviation, percentiles, effective number of samples, and $\hat{R}$ values. - -To run `stansummary` on the output files generated by the for loop above, -by the above run of the `bernoulli` model on Mac or Linux: -``` -/bin/stansummary output_*.csv -``` - -On Windows, use backslashes to call the `stansummary.exe`. -``` -\bin\stansummary.exe output_*.csv -``` -The stansummary output consists of one row of statistics per column -in the Stan CSV output file. Therefore, the first rows in the -stansummary report statistics over the sampler state. -The final row of output summarizes the estimates of the model variable `theta`: -``` -Inference for Stan model: bernoulli_model -4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved. 
-
-Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
-Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total
-
-                Mean     MCSE  StdDev     5%    50%   95%    N_Eff  N_Eff/s    R_hat
-lp__            -7.3  1.8e-02    0.75   -8.8   -7.0  -6.8  1.8e+03  2.4e+04  1.0e+00
-accept_stat__   0.89  2.7e-03    0.17   0.52   0.96   1.0  3.9e+03  5.1e+04  1.0e+00
-stepsize__       1.1  7.5e-02    0.11   0.93    1.2   1.2  2.0e+00  2.6e+01  2.5e+13
-treedepth__      1.4  8.1e-03    0.49    1.0    1.0   2.0  3.6e+03  4.7e+04  1.0e+00
-n_leapfrog__     2.3  1.7e-02    0.98    1.0    3.0   3.0  3.3e+03  4.3e+04  1.0e+00
-divergent__     0.00      nan    0.00   0.00   0.00  0.00      nan      nan      nan
-energy__         7.8  2.6e-02     1.0    6.8    7.5   9.9  1.7e+03  2.2e+04  1.0e+00
-theta           0.25  2.9e-03    0.12  0.079   0.23  0.46  1.7e+03  2.1e+04  1.0e+00
-
-Samples were drawn using hmc with nuts.
-For each parameter, N_Eff is a crude measure of effective sample size,
-and R_hat is the potential scale reduction factor on split chains (at
-convergence, R_hat=1).
-```
-
-In this example, we conditioned the model on a dataset consisting of the outcomes of
-10 bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95%
-percentile values for `theta` reflect the uncertainty in our estimate, due to the
-small amount of data, given the prior of `beta(1, 1)`
diff --git a/src/cmdstan-guide/stan_csv_apdx.qmd b/src/cmdstan-guide/stan_csv_apdx.qmd
index 0dae8c214..73a534fc2 100644
--- a/src/cmdstan-guide/stan_csv_apdx.qmd
+++ b/src/cmdstan-guide/stan_csv_apdx.qmd
@@ -160,7 +160,7 @@ will also be included in this initial comment header.
 **Column headers**

 The CSV header row lists all sampler parameters, model parameters, transformed parameters, and quantities of interest.
-The sampler parameters are described in detail in the [output file](mcmc_sampling_intro.qmd#mcmc_output_csv) section of the Quickstart Guide chapter on MCMC Sampling.
+The sampler parameters are described in detail in the [output file](mcmc_config.qmd#mcmc_output_csv) section of the MCMC Sampling chapter.
The example model `bernoulli.stan` only contains one parameter `theta`, therefore the CSV file data table consists of 7 sampler parameter columns and one column for the model parameter: ``` From faef9261e0669187c6ef2c27c2f141628cf312a1 Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 4 Apr 2024 12:56:17 -0400 Subject: [PATCH 2/6] Merge optimization --- redirects.txt | 1 + src/_quarto.yml | 7 +- src/cmdstan-guide/_quarto.yml | 7 +- src/cmdstan-guide/optimization_intro.qmd | 132 ------------------- src/cmdstan-guide/optimize_config.qmd | 153 ++++++++++++++++++++--- src/cmdstan-guide/stan_csv_apdx.qmd | 2 +- 6 files changed, 144 insertions(+), 158 deletions(-) diff --git a/redirects.txt b/redirects.txt index 2aff4915f..5eeb9db2d 100644 --- a/redirects.txt +++ b/redirects.txt @@ -592,3 +592,4 @@ https://mc-stan.org/docs/cmdstan-guide/command-line-interface-overview.html http https://mc-stan.org/docs/functions-reference/matrix-arithmetic-operators.html https://mc-stan.org/docs/functions-reference/matrix_operations.html#matrix-arithmetic-operators https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html +https://mc-stan.org/docs/cmdstan-guide/optimization_intro.html https://mc-stan.org/docs/cmdstan-guide/optimize_config.html diff --git a/src/_quarto.yml b/src/_quarto.yml index 83d5dae98..12b47d4d4 100644 --- a/src/_quarto.yml +++ b/src/_quarto.yml @@ -236,16 +236,16 @@ website: contents: - cmdstan-guide/index.qmd - section: "Version {{< env STAN_DOCS_VERSION >}}" - - section: "Quickstart Guide" + - section: "Getting Started" contents: - cmdstan-guide/installation.qmd - cmdstan-guide/example_model_data.qmd - cmdstan-guide/compiling_stan_programs.qmd - - cmdstan-guide/optimization_intro.qmd + - cmdstan-guide/parallelization.qmd - cmdstan-guide/pathfinder_intro.qmd - 
cmdstan-guide/variational_intro.qmd - cmdstan-guide/generate_quantities_intro.qmd - - section: "Reference Manual" + - section: "Running CmdStan" contents: - cmdstan-guide/command_line_options.qmd - cmdstan-guide/mcmc_config.qmd @@ -256,7 +256,6 @@ website: - cmdstan-guide/laplace_sample_config.qmd - cmdstan-guide/log_prob_config.qmd - cmdstan-guide/diagnose_config.qmd - - cmdstan-guide/parallelization.qmd - section: "Tools and Utilities" contents: - cmdstan-guide/stanc.qmd diff --git a/src/cmdstan-guide/_quarto.yml b/src/cmdstan-guide/_quarto.yml index 243b37596..2c1aa79e3 100644 --- a/src/cmdstan-guide/_quarto.yml +++ b/src/cmdstan-guide/_quarto.yml @@ -34,17 +34,17 @@ book: chapters: - index.qmd - - part: "Quickstart Guide" + - part: "Getting Started" chapters: - installation.qmd - example_model_data.qmd - compiling_stan_programs.qmd - - optimization_intro.qmd + - parallelization.qmd - pathfinder_intro.qmd - variational_intro.qmd - generate_quantities_intro.qmd - - part: "Reference Manual" + - part: "Running CmdStan" chapters: - command_line_options.qmd - mcmc_config.qmd @@ -55,7 +55,6 @@ book: - laplace_sample_config.qmd - log_prob_config.qmd - diagnose_config.qmd - - parallelization.qmd - part: "CmdStan Utilities" chapters: diff --git a/src/cmdstan-guide/optimization_intro.qmd b/src/cmdstan-guide/optimization_intro.qmd index 18fb0f823..e69de29bb 100644 --- a/src/cmdstan-guide/optimization_intro.qmd +++ b/src/cmdstan-guide/optimization_intro.qmd @@ -1,132 +0,0 @@ ---- -pagetitle: Optimization ---- - -# Optimization - -The CmdStan executable can run Stan’s optimization algorithms -which provide a deterministic method to find the posterior mode. -If the posterior is not convex, there is no guarantee Stan -will be able to find the global mode as opposed to a local optimum -of log probability. - -The executable does not need to be -recompiled in order to switch from sampling to optimization, and the -data input format is the same. 
-The following is a minimal call to -Stan's optimizer using defaults for everything but the location of the -data file. - -``` -> ./bernoulli optimize data file=bernoulli.data.json -``` - -Executing this command prints both output to the console and -to a csv file. - -The first part of the console output reports on the configuration used. -The above command uses all default configurations, therefore the optimizer -used is the L-BFGS optimizer and its default initial stepsize and -tolerances for monitoring convergence: -``` - ./bernoulli optimize data file=bernoulli.data.json -method = optimize - optimize - algorithm = lbfgs (Default) - lbfgs - init_alpha = 0.001 (Default) - tol_obj = 1e-12 (Default) - tol_rel_obj = 10000 (Default) - tol_grad = 1e-08 (Default) - tol_rel_grad = 10000000 (Default) - tol_param = 1e-08 (Default) - history_size = 5 (Default) - iter = 2000 (Default) - save_iterations = 0 (Default) -id = 0 (Default) -data - file = bernoulli.data.json -init = 2 (Default) -random - seed = 87122538 (Default) -output - file = output.csv (Default) - diagnostic_file = (Default) - refresh = 100 (Default) -``` - -The second part of the -output indicates how well the algorithm fared, here converging and -terminating normally. The numbers reported indicate that it took 5 -iterations and 8 gradient evaluations. -This is, not surprisingly, far fewer iterations than required -for sampling; even fewer iterations would be used with less stringent -user-specified convergence tolerances. -The `alpha` value is for step size used. -In the final state the change in parameters was roughly $0.002$ -and the length of the gradient roughly 3e-05 ($0.00003$). 
- -``` -Initial log joint probability = -6.85653 - Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes - 5 -5.00402 0.00184936 3.35074e-05 1 1 8 -Optimization terminated normally: - Convergence detected: relative gradient magnitude is below tolerance -``` - -The output from optimization is written into the file -`output.csv` by default. The output follows the same pattern as -the output for sampling, first dumping the entire set of parameters -used as comment lines: - -``` -# stan_version_major = 2 -# stan_version_minor = 23 -# stan_version_patch = 0 -# model = bernoulli_model -# method = optimize -# optimize -# algorithm = lbfgs (Default) -... -``` - -Following the config information, are two lines of output: -the CSV headers and the recorded values: - -``` -lp__,theta --5.00402,0.200003 -``` - -Note that everything is a comment other than a line for the header, -and a line for the values. Here, the header indicates the unnormalized -log probability with `lp__` and the model parameter -`theta`. The maximum log probability is -5.0 and the posterior -mode for `theta` is 0.20. The mode exactly matches what we would -expect from the data. -Because the prior was uniform, the result 0.20 represents the maximum -likelihood estimate (MLE) for the very simple Bernoulli model. Note -that no uncertainty is reported. - -All of the optimizers stream per-iteration intermediate approximations to the command line console. -The sub-argument `save_iterations` specifies whether or not to save -the intermediate iterations to the output file. -Allowed values are $0$ or $1$, corresponding to `False` and `True` respectively. -The default value is $0$, i.e., intermediate iterations are not saved to the output file. -Running the optimizer with `save_iterations=1` writes both -the initial log joint probability and values for all iterations to the output CSV file. 
- -Running the example model with option `save_iterations=1`, i.e., the command -``` -> ./bernoulli optimize save_iterations=1 data file=bernoulli.data.json -``` -produces CSV file output rows: -``` -lp__,theta --6.85653,0.493689 --6.10128,0.420936 --5.02953,0.22956 --5.00517,0.206107 --5.00403,0.200299 --5.00402,0.200003 -``` diff --git a/src/cmdstan-guide/optimize_config.qmd b/src/cmdstan-guide/optimize_config.qmd index 8f81a29cf..58a156dae 100644 --- a/src/cmdstan-guide/optimize_config.qmd +++ b/src/cmdstan-guide/optimize_config.qmd @@ -1,23 +1,94 @@ --- -pagetitle: Optimization Configuration +pagetitle: Optimization --- -# Maximum Likelihood Estimation - -The `optimize` method finds the mode of the posterior distribution, assuming that there is one. -If the posterior is not convex, there is no guarantee Stan will be able to find the global mode -as opposed to a local optimum of log probability. -For optimization, the mode is calculated without the Jacobian adjustment -for constrained variables, which shifts the mode due to the change of variables. -Thus modes correspond to modes of the model as written. - -The full set of configuration options available for the `optimize` method is -reported at the beginning of the sampler output file as CSV comments. -When the example model `bernoulli.stan` is run with `method=optimize` -via the command line with all default arguments, -the resulting Stan CSV file header comments show the complete set -of default configuration options: +# Optimization + +The CmdStan executable can run Stan’s optimization algorithms +which provide a deterministic method to find the posterior mode. +If the posterior is not convex, there is no guarantee Stan +will be able to find the global mode as opposed to a local optimum +of log probability. + + +The full set of configuration options available for the `optimize` method +is available by using the `optimize help-all` subcommand. 
The arguments +with their requested values or defaults are also reported at the beginning +of the optimizer console output and in the output CSV file's comments. + +The executable does not need to be +recompiled in order to switch from sampling to optimization, and the +data input format is the same. +The following is a minimal call to +Stan's optimizer using defaults for everything but the location of the +data file. + +``` +> ./bernoulli optimize data file=bernoulli.data.json +``` + +Executing this command prints output both to the console and +to a CSV file. + +The first part of the console output reports on the configuration used. +The above command uses all default configurations; therefore, the optimizer +used is the L-BFGS optimizer and its default initial stepsize and +tolerances for monitoring convergence: +``` + ./bernoulli optimize data file=bernoulli.data.json +method = optimize + optimize + algorithm = lbfgs (Default) + lbfgs + init_alpha = 0.001 (Default) + tol_obj = 1e-12 (Default) + tol_rel_obj = 10000 (Default) + tol_grad = 1e-08 (Default) + tol_rel_grad = 10000000 (Default) + tol_param = 1e-08 (Default) + history_size = 5 (Default) + iter = 2000 (Default) + save_iterations = 0 (Default) +id = 0 (Default) +data + file = bernoulli.data.json +init = 2 (Default) +random + seed = 87122538 (Default) +output + file = output.csv (Default) + diagnostic_file = (Default) + refresh = 100 (Default) ``` + +The second part of the +output indicates how well the algorithm fared, here converging and +terminating normally. The numbers reported indicate that it took 5 +iterations and 8 gradient evaluations. +This is, not surprisingly, far fewer iterations than required +for sampling; even fewer iterations would be used with less stringent +user-specified convergence tolerances. +The `alpha` value is the step size used. +In the final state the change in parameters was roughly $0.002$ +and the length of the gradient roughly 3e-05 ($0.00003$).
+ +``` +Initial log joint probability = -6.85653 + Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes + 5 -5.00402 0.00184936 3.35074e-05 1 1 8 +Optimization terminated normally: + Convergence detected: relative gradient magnitude is below tolerance +``` + +The output from optimization is written into the file +`output.csv` by default. The output follows the [same pattern](stan_csv_apdx.qmd) +as the output for sampling, first dumping the entire set of parameters +used as comment lines: + +``` +# stan_version_major = 2 +# stan_version_minor = 23 +# stan_version_patch = 0 # model = bernoulli_model # method = optimize # optimize @@ -33,21 +104,69 @@ of default configuration options: # jacobian = 0 (Default) # iter = 2000 (Default) # save_iterations = 0 (Default) +... +``` + +Following the config information are two lines of output: +the CSV headers and the recorded values: + +``` +lp__,theta +-5.00402,0.200003 ``` +Note that everything is a comment other than a line for the header, +and a line for the values. Here, the header indicates the unnormalized +log probability with `lp__` and the model parameter +`theta`. The maximum log probability is -5.0 and the posterior +mode for `theta` is 0.20. The mode exactly matches what we would +expect from the data. +Because the prior was uniform, the result 0.20 represents the maximum +likelihood estimate (MLE) for the very simple Bernoulli model. Note +that no uncertainty is reported. + +All of the optimizers stream per-iteration intermediate approximations to the command line console. +The sub-argument `save_iterations` specifies whether or not to save +the intermediate iterations to the output file. +Allowed values are $0$ or $1$, corresponding to `False` and `True`, respectively. +The default value is $0$, i.e., intermediate iterations are not saved to the output file. +Running the optimizer with `save_iterations=1` writes both +the initial log joint probability and values for all iterations to the output CSV file.
+ +Running the example model with option `save_iterations=1`, i.e., the command +``` +> ./bernoulli optimize save_iterations=1 data file=bernoulli.data.json +``` +produces CSV file output rows: +``` +lp__,theta +-6.85653,0.493689 +-6.10128,0.420936 +-5.02953,0.22956 +-5.00517,0.206107 +-5.00403,0.200299 +-5.00402,0.200003 +``` + + ## Jacobian adjustments The `jacobian` argument specifies whether or not the call to the model's log probability function should include -the log absolute Jacobian determinant of inverse parameter transforms. +the log absolute Jacobian determinant of +[inverse parameter transforms](https://mc-stan.org/docs/reference-manual/transforms.html). + Without the Jacobian adjustment, optimization returns the (regularized) maximum likelihood estimate (MLE), $\mathrm{argmax}_{\theta}\ p(y | \theta)$, the value which maximizes the likelihood of the data given the parameters (including prior terms). + Applying the Jacobian adjustment produces the maximum a posteriori estimate (MAP), the maximum value of the posterior distribution, $\mathrm{argmax}_{\theta}\ p(y | \theta)\,p(\theta)$. + + By default this value is `0` (false), meaning the Jacobian adjustment is not included. diff --git a/src/cmdstan-guide/stan_csv_apdx.qmd b/src/cmdstan-guide/stan_csv_apdx.qmd index 73a534fc2..7a686ea59 100644 --- a/src/cmdstan-guide/stan_csv_apdx.qmd +++ b/src/cmdstan-guide/stan_csv_apdx.qmd @@ -160,7 +160,7 @@ will also be included in this initial comment header. **Column headers** The CSV header row lists all sampler parameters, model parameters, transformed parameters, and quantities of interest. -The sampler parameters are described in detail in the [output file](mcmc_sampling_intro.qmd#mcmc_output_csv) section of the Quickstart Guide chapter on MCMC Sampling. +The sampler parameters are described in detail in the [output file](mcmc_config.qmd#mcmc_output_csv) section of the chapter on MCMC Sampling.
The example model `bernoulli.stan` only contains one parameter `theta`, therefore the CSV file data table consists of 7 sampler parameter columns and one column for the model parameter: ``` From bfe357e2a32780532d1f664edce84f006c796f1d Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 4 Apr 2024 14:02:41 -0400 Subject: [PATCH 3/6] Merge Pathfinder --- redirects.txt | 3 +- src/_quarto.yml | 1 - src/cmdstan-guide/_quarto.yml | 1 - src/cmdstan-guide/optimization_intro.qmd | 0 src/cmdstan-guide/pathfinder_config.qmd | 124 +++++++++++++++++++--- src/cmdstan-guide/pathfinder_intro.qmd | 128 ----------------------- 6 files changed, 110 insertions(+), 147 deletions(-) delete mode 100644 src/cmdstan-guide/optimization_intro.qmd delete mode 100644 src/cmdstan-guide/pathfinder_intro.qmd diff --git a/redirects.txt b/redirects.txt index 5eeb9db2d..b102312c5 100644 --- a/redirects.txt +++ b/redirects.txt @@ -529,7 +529,6 @@ https://mc-stan.org/docs/functions-reference/unbounded-discrete-distributions.ht https://mc-stan.org/docs/stan-users-guide/functions-programming.html https://mc-stan.org/docs/stan-users-guide/user-functions.html https://mc-stan.org/docs/functions-reference/removed-functions.html https://mc-stan.org/docs/functions-reference/removed_functions.html https://mc-stan.org/docs/stan-users-guide/custom-probability-functions.html https://mc-stan.org/docs/stan-users-guide/custom-probability.html -https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html https://mc-stan.org/docs/functions-reference/bounded-continuous-distributions.html https://mc-stan.org/docs/functions-reference/bounded_continuous_distributions.html https://mc-stan.org/docs/stan-users-guide/mixture-modeling.html https://mc-stan.org/docs/stan-users-guide/finite-mixtures.html https://mc-stan.org/docs/functions-reference/multivariate-discrete-distributions.html 
https://mc-stan.org/docs/functions-reference/multivariate_discrete_distributions.html @@ -593,3 +592,5 @@ https://mc-stan.org/docs/functions-reference/matrix-arithmetic-operators.html ht https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html https://mc-stan.org/docs/cmdstan-guide/optimization_intro.html https://mc-stan.org/docs/cmdstan-guide/optimize_config.html +https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html +https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html diff --git a/src/_quarto.yml b/src/_quarto.yml index 12b47d4d4..0e5639a23 100644 --- a/src/_quarto.yml +++ b/src/_quarto.yml @@ -242,7 +242,6 @@ website: - cmdstan-guide/example_model_data.qmd - cmdstan-guide/compiling_stan_programs.qmd - cmdstan-guide/parallelization.qmd - - cmdstan-guide/pathfinder_intro.qmd - cmdstan-guide/variational_intro.qmd - cmdstan-guide/generate_quantities_intro.qmd - section: "Running CmdStan" diff --git a/src/cmdstan-guide/_quarto.yml b/src/cmdstan-guide/_quarto.yml index 2c1aa79e3..bf1d36795 100644 --- a/src/cmdstan-guide/_quarto.yml +++ b/src/cmdstan-guide/_quarto.yml @@ -40,7 +40,6 @@ book: - example_model_data.qmd - compiling_stan_programs.qmd - parallelization.qmd - - pathfinder_intro.qmd - variational_intro.qmd - generate_quantities_intro.qmd diff --git a/src/cmdstan-guide/optimization_intro.qmd b/src/cmdstan-guide/optimization_intro.qmd deleted file mode 100644 index e69de29bb..000000000 diff --git a/src/cmdstan-guide/pathfinder_config.qmd b/src/cmdstan-guide/pathfinder_config.qmd index 2bc41efb8..e15921d1c 100644 --- a/src/cmdstan-guide/pathfinder_config.qmd +++ b/src/cmdstan-guide/pathfinder_config.qmd @@ -1,22 +1,39 @@ --- -pagetitle: Pathfinder 
Configuration +pagetitle: Pathfinder Method for Approximate Bayesian Inference --- # Pathfinder Method for Approximate Bayesian Inference {#pathfinder-config} -The Pathfinder algorithm is described in section [Pathfinder overview](pathfinder_intro.qmd). +The CmdStan method `pathfinder` uses the Pathfinder algorithm of +@zhang_pathfinder:2022, which is further described in the +[Stan Reference Manual](https://mc-stan.org/docs/reference-manual/pathfinder.html). + +A single run of the Pathfinder algorithm generates a set of +approximate draws. Inference is improved by running multiple +Pathfinder instances and using Pareto-smoothed importance resampling +(PSIS) of the resulting sets of draws. This better matches non-normal +target densities and also eliminates minor modes. The `pathfinder` method runs multi-path Pathfinder by default, which returns a PSIS sample over the draws from several individual ("single-path") Pathfinder runs. Argument `num_paths` specifies the number of single-path Pathfinders; the default is $4$. If `num_paths` is set to 1, then only one individual Pathfinder is run without the PSIS reweighting of the sample. -The full set of configuration options available for the `pathfinder` method is -reported at the beginning of the pathfinder output file as CSV comments. -When the example model `bernoulli.stan` is run with `method=pathfinder` -via the command line with all default arguments, -the resulting Stan CSV file header comments show the complete set -of default configuration options: +The full set of configuration options available for the `pathfinder` method +is available by using the `pathfinder help-all` subcommand. The arguments +with their requested values or defaults are also reported at the beginning +of the algorithm's console output and in the output CSV file's comments. + +The following is a minimal call to the Pathfinder algorithm using +defaults for everything but the location of the data file.
+ +``` +> ./bernoulli pathfinder data file=bernoulli.data.R +``` + +Executing this command prints output both to the console and to CSV files. + +The first part of the console output reports on the configuration used. ``` method = pathfinder pathfinder @@ -35,6 +52,43 @@ method = pathfinder max_lbfgs_iters = 1000 (Default) num_draws = 1000 (Default) num_elbo_draws = 25 (Default) +id = 1 (Default) +data + file = examples/bernoulli/bernoulli.data.json +init = 2 (Default) +random + seed = 1995513073 (Default) +output + file = output.csv (Default) + diagnostic_file = (Default) + refresh = 100 (Default) + sig_figs = -1 (Default) + profile_file = profile.csv (Default) +num_threads = 1 (Default) +``` +The rest of the output describes the progression of the algorithm. + +By default, the Pathfinder algorithm runs 4 single-path Pathfinders in +parallel, then uses importance resampling on the set of returned draws +to produce the specified number of draws. +``` +Path [1] :Initial log joint density = -11.543343 +Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes + 5 -6.748e+00 1.070e-03 1.707e-05 1.000e+00 1.000e+00 126 -6.220e+00 -6.220e+00 +Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126) +Path [2] :Initial log joint density = -7.443345 +Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes + 5 -6.748e+00 9.936e-05 3.738e-07 1.000e+00 1.000e+00 126 -6.164e+00 -6.164e+00 +Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126) +Path [3] :Initial log joint density = -18.986308 +Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes + 5 -6.748e+00 2.996e-04 4.018e-06 1.000e+00 1.000e+00 126 -6.201e+00 -6.201e+00 +Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126) +Path [4] :Initial log joint density = -8.304453 +Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes + 5 -6.748e+00 2.814e-04 2.034e-06 1.000e+00 1.000e+00 126 -6.221e+00
-6.221e+00 +Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126) +Total log probability function evaluations:8404 ``` @@ -76,7 +130,7 @@ Valid values: $\{0, 1\}$. Default is $1$ (True). ## L-BFGS Configuration Arguments `init_alpha` through `history_size` are the full set of arguments to the L-BFGS optimizer -and have the same defaults for [optimization](optimize_config.qmd). +and have the same defaults as for [optimization](optimize_config.qmd#the-quasi-newton-optimizers). ## Multi-path Pathfinder CSV files{#pathfinder_csv} @@ -88,16 +142,54 @@ The importance resampled draws are output as a [StanCSV file](stan_csv_apdx.qmd) The CSV files have the following structure: -- The full set of configuration options available for the `pathfinder` method is -reported at the beginning of the sampler output file as CSV comments. +The initial CSV comment rows contain the complete set of CmdStan +configuration options. + +``` +... +# method = pathfinder +# pathfinder +# init_alpha = 0.001 (Default) +# tol_obj = 9.9999999999999998e-13 (Default) +# tol_rel_obj = 10000 (Default) +# tol_grad = 1e-08 (Default) +# tol_rel_grad = 10000000 (Default) +# tol_param = 1e-08 (Default) +# history_size = 5 (Default) +# num_psis_draws = 1000 (Default) +# num_paths = 4 (Default) +# psis_resample = 1 (Default) +# calculate_lp = 1 (Default) +# save_single_paths = 0 (Default) +# max_lbfgs_iters = 1000 (Default) +# num_draws = 1000 (Default) +# num_elbo_draws = 25 (Default) +... +``` -- The CSV header row consists of columns `lp_approx__`, `lp__`, and -the Stan model parameters, transformed parameters, and generated quantities -in the order in which they are declared in the Stan program. +Next is the column header line, followed by the +set of approximate draws.
The Pathfinder algorithm first outputs +`lp_approx__`, the log density in the approximating distribution, and +`lp__`, the log density in the target distribution, followed by +estimates of the model parameters, transformed parameters, and +generated quantities. +``` +lp_approx__,lp__,theta +-2.4973, -8.2951, 0.0811852 +-0.87445, -7.06526, 0.160207 +-0.812285, -7.07124, 0.35819 +... +``` -- The data rows contain the draws from the single- or multi-path run. +The final lines are comment lines which give timing information. +``` +# Elapsed Time: 0.016000 seconds (Pathfinders) +# 0.003000 seconds (PSIS) +# 0.019000 seconds (Total) +``` -- Final comments containing timing information. +Pathfinder provides option `save_single_paths` which will save output +from the single-path Pathfinder runs. ## Single-path Pathfinder Outputs. diff --git a/src/cmdstan-guide/pathfinder_intro.qmd b/src/cmdstan-guide/pathfinder_intro.qmd deleted file mode 100644 index 2ac38c89a..000000000 --- a/src/cmdstan-guide/pathfinder_intro.qmd +++ /dev/null @@ -1,128 +0,0 @@ ---- -pagetitle: Pathfinder for Variational Inference ---- - -# Variational Inference using Pathfinder {#pathfinder-intro} - -The CmdStan method `pathfinder` uses the Pathfinder algorithm of -@zhang_pathfinder:2022. Pathfinder is a variational method for -approximately sampling from differentiable log densities. Starting -from a random initialization, Pathfinder locates normal approximations -to the target density along a quasi-Newton optimization path, with -local covariance estimated using the negative inverse Hessian estimates -produced by the L-BFGS optimizer. Pathfinder returns draws from the -Gaussian approximation with the lowest estimated Kullback-Leibler (KL) -divergence to the true posterior. - -Pathfinder differs from the ADVI method in that it uses quasi-Newton -optimization on the log posterior instead of stochastic gradient -descent (SGD) on the Monte Carlo computation of the evidence lower -bound (ELBO). 
Pathfinder's approach is both faster and more stable -than that of ADVI. Compared to ADVI and short dynamic HMC runs, -Pathfinder requires one to two orders of magnitude fewer log density -and gradient evaluations, with greater reductions for more challenging -posteriors. - -A single run of the Pathfinder algorithm generates a set of -approximate draws. Inference is improved by running multiple -Pathfinder instances and using Pareto-smoothed importance resampling -(PSIS) of the resulting sets of draws. This better matches non-normal -target densities and also eliminates minor modes. By default, the -`pathfinder` method uses 4 independent Pathfinder runs, each of which -produces 1000 approximate draws, which are then importance resampled -down to 1000 final draws. - -The following is a minimal call the Pathfinder algorithm using -defaults for everything but the location of the data file. - -``` -> ./bernoulli pathfinder data file=bernoulli.data.R -``` - -Executing this command prints both output to the console and csv files. - -The first part of the console output reports on the configuration used. 
-``` -method = pathfinder - pathfinder - init_alpha = 0.001 (Default) - tol_obj = 9.9999999999999998e-13 (Default) - tol_rel_obj = 10000 (Default) - tol_grad = 1e-08 (Default) - tol_rel_grad = 10000000 (Default) - tol_param = 1e-08 (Default) - history_size = 5 (Default) - num_psis_draws = 1000 (Default) - num_paths = 4 (Default) - psis_resample = 1 (Default) - calculate_lp = 1 (Default) - save_single_paths = 0 (Default) - max_lbfgs_iters = 1000 (Default) - num_draws = 1000 (Default) - num_elbo_draws = 25 (Default) -id = 1 (Default) -data - file = examples/bernoulli/bernoulli.data.json -init = 2 (Default) -random - seed = 1995513073 (Default) -output - file = output.csv (Default) - diagnostic_file = (Default) - refresh = 100 (Default) - sig_figs = -1 (Default) - profile_file = profile.csv (Default) -num_threads = 1 (Default) -``` -The rest of the output describes the progression of the algorithm. - -By default, the Pathfinder algorithm runs 4 single-path Pathfinders in -parallel, the uses importance resampling on the set of returned draws -to produce the specified number of draws. 
-``` -Path [1] :Initial log joint density = -11.543343 -Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes - 5 -6.748e+00 1.070e-03 1.707e-05 1.000e+00 1.000e+00 126 -6.220e+00 -6.220e+00 -Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126) -Path [2] :Initial log joint density = -7.443345 -Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes - 5 -6.748e+00 9.936e-05 3.738e-07 1.000e+00 1.000e+00 126 -6.164e+00 -6.164e+00 -Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126) -Path [3] :Initial log joint density = -18.986308 -Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes - 5 -6.748e+00 2.996e-04 4.018e-06 1.000e+00 1.000e+00 126 -6.201e+00 -6.201e+00 -Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126) -Path [4] :Initial log joint density = -8.304453 -Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes - 5 -6.748e+00 2.814e-04 2.034e-06 1.000e+00 1.000e+00 126 -6.221e+00 -6.221e+00 -Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126) -Total log probability function evaluations:8404 -``` - -Pathfinder outputs a [StanCSV file](pathfinder_config.qmd#pathfinder_csv) file which -contains the importance resampled draws from multi-path Pathfinder. -The initial CSV comment rows contain the complete set of CmdStan -configuration options. Next is the column header line, followed the -set of approximate draws. The Pathfinder algorithm first outputs -`lp_approx__`, the log density in the approximating distribution, and -`lp__`, the log density in the target distribution, followed by -estimates of the model parameters, transformed parameters, and -generated quantities. -``` -lp_approx__,lp__,theta --2.4973, -8.2951, 0.0811852 --0.87445, -7.06526, 0.160207 --0.812285, -7.07124, 0.35819 -... -``` - -The final lines are comment lines which give timing information. 
-``` -# Elapsed Time: 0.016000 seconds (Pathfinders) -# 0.003000 seconds (PSIS) -# 0.019000 seconds (Total) -``` - -Pathfinder provides option `save_single_paths` which will save output -from the single-path Pathfinder runs. -See section [Pathfinder Method](pathfinder_config.qmd) for details. From 0d6e3ebd358f29b1347fc42663cf00835da35f80 Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 4 Apr 2024 14:16:11 -0400 Subject: [PATCH 4/6] Merge ADVI --- redirects.txt | 3 +- src/_quarto.yml | 1 - src/cmdstan-guide/_quarto.yml | 1 - src/cmdstan-guide/variational_config.qmd | 144 ++++++++++++++++----- src/cmdstan-guide/variational_intro.qmd | 154 ----------------------- 5 files changed, 114 insertions(+), 189 deletions(-) delete mode 100644 src/cmdstan-guide/variational_intro.qmd diff --git a/redirects.txt b/redirects.txt index b102312c5..04ad62804 100644 --- a/redirects.txt +++ b/redirects.txt @@ -499,7 +499,6 @@ https://mc-stan.org/docs/reference-manual/diagnostic-algorithms.html https://mc- https://mc-stan.org/docs/functions-reference/covariance-matrix-distributions.html https://mc-stan.org/docs/functions-reference/covariance_matrix_distributions.html https://mc-stan.org/docs/cmdstan-guide/standalone-generate-quantities.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html https://mc-stan.org/docs/functions-reference/math-functions.html https://mc-stan.org/docs/functions-reference/mathematical_functions.html -https://mc-stan.org/docs/cmdstan-guide/variational-inference-using-advi.html https://mc-stan.org/docs/cmdstan-guide/variational_intro.html https://mc-stan.org/docs/stan-users-guide/dae-solver.html https://mc-stan.org/docs/stan-users-guide/dae.html https://mc-stan.org/docs/reference-manual/hmc.html https://mc-stan.org/docs/reference-manual/mcmc.html https://mc-stan.org/docs/cmdstan-guide/gc-intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_intro.html @@ -594,3 +593,5 @@ 
https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/ https://mc-stan.org/docs/cmdstan-guide/optimization_intro.html https://mc-stan.org/docs/cmdstan-guide/optimize_config.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html +https://mc-stan.org/docs/cmdstan-guide/variational-inference-using-advi.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html +https://mc-stan.org/docs/cmdstan-guide/variational_intro.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html diff --git a/src/_quarto.yml b/src/_quarto.yml index 0e5639a23..693f43245 100644 --- a/src/_quarto.yml +++ b/src/_quarto.yml @@ -242,7 +242,6 @@ website: - cmdstan-guide/example_model_data.qmd - cmdstan-guide/compiling_stan_programs.qmd - cmdstan-guide/parallelization.qmd - - cmdstan-guide/variational_intro.qmd - cmdstan-guide/generate_quantities_intro.qmd - section: "Running CmdStan" contents: diff --git a/src/cmdstan-guide/_quarto.yml b/src/cmdstan-guide/_quarto.yml index bf1d36795..34eca346c 100644 --- a/src/cmdstan-guide/_quarto.yml +++ b/src/cmdstan-guide/_quarto.yml @@ -40,7 +40,6 @@ book: - example_model_data.qmd - compiling_stan_programs.qmd - parallelization.qmd - - variational_intro.qmd - generate_quantities_intro.qmd - part: "Running CmdStan" diff --git a/src/cmdstan-guide/variational_config.qmd b/src/cmdstan-guide/variational_config.qmd index 4e4e17479..27fbe094d 100644 --- a/src/cmdstan-guide/variational_config.qmd +++ b/src/cmdstan-guide/variational_config.qmd @@ -1,11 +1,8 @@ --- -pagetitle: ADVI Configuration +pagetitle: ADVI for Variational Inference --- -# Variational Inference Algorithm: ADVI - -CmdStan can approximate the posterior distribution using variational inference. -The approximation is a Gaussian in the unconstrained variable space. 
+# Variational Inference using ADVI

 Stan implements an automatic variational inference algorithm,
 called Automatic Differentiation Variational Inference (ADVI)
 @kucukelbir:2017.
@@ -25,39 +22,110 @@ which makes it challenging to assess convergence.
 The algorithm runs until the mean change in ELBO
 drops below the specified tolerance.

-The full set of configuration options available for the `variational` method is
-reported at the beginning of the sampler output file as CSV comments.
-When the example model `bernoulli.stan` is run with `method=variational`
-via the command line with all default arguments,
-the resulting Stan CSV file header comments show the complete set
-of default configuration options:
+The full set of configuration options available for the `variational` method
+is available by using the `variational help-all` subcommand. The arguments
+with their requested values or defaults are also reported at the beginning
+of the algorithm's console output and in the output CSV file's comments.
+
+The following is a minimal call to Stan's variational inference
+algorithm using defaults for everything but the location of the data
+file.
+
 ```
-# method = variational
-#   variational
-#     algorithm = meanfield (Default)
-#       meanfield
-#     iter = 10000 (Default)
-#     grad_samples = 1 (Default)
-#     elbo_samples = 100 (Default)
-#     eta = 1 (Default)
-#     adapt
-#       engaged = 1 (Default)
-#       iter = 50 (Default)
-#     tol_rel_obj = 0.01 (Default)
-#     eval_elbo = 100 (Default)
-#     output_samples = 1000 (Default)
+> ./bernoulli variational data file=bernoulli.data.json
 ```
-The console output includes a notice that this algorithm is considered to be experimental:
-```
+Executing this command prints output both to the console and
+to a CSV file.
+
+The first part of the console output reports on the configuration used:
+the default option `algorithm=meanfield` and the default
+tolerances for monitoring the algorithm's convergence.
+``` +method = variational + variational + algorithm = meanfield (Default) + meanfield + iter = 10000 (Default) + grad_samples = 1 (Default) + elbo_samples = 100 (Default) + eta = 1 (Default) + adapt + engaged = 1 (Default) + iter = 50 (Default) + tol_rel_obj = 0.01 (Default) + eval_elbo = 100 (Default) + output_samples = 1000 (Default) +id = 0 (Default) +data + file = bernoulli.data.json +init = 2 (Default) +random + seed = 3323783840 (Default) +output + file = output.csv (Default) + diagnostic_file = (Default) + refresh = 100 (Default) +``` +After the configuration has been displayed, informational and +timing messages are output: +``` +------------------------------------------------------------ EXPERIMENTAL ALGORITHM: This procedure has not been thoroughly tested and may be unstable or buggy. The interface is subject to change. +------------------------------------------------------------ + +Gradient evaluation took 2.1e-05 seconds +1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. +Adjust your expectations accordingly! +``` +The rest of the output describes the progression of the algorithm. +An adaptation phase finds a good value for the step size scaling +parameter `eta`. The evidence lower bound (ELBO) is the variational +objective function and is evaluated based on a Monte Carlo estimate. +The variational inference algorithm in Stan is stochastic, which makes +it challenging to assess convergence. That is, while the algorithm +appears to have converged in $\sim$ 250 iterations, the algorithm runs +for another few thousand iterations until mean change in ELBO drops +below the default tolerance of 0.01. +``` +Begin eta adaptation. +Iteration: 1 / 250 [ 0%] (Adaptation) +Iteration: 50 / 250 [ 20%] (Adaptation) +Iteration: 100 / 250 [ 40%] (Adaptation) +Iteration: 150 / 250 [ 60%] (Adaptation) +Iteration: 200 / 250 [ 80%] (Adaptation) +Success! Found best value [eta = 1] earlier than expected. + +Begin stochastic gradient ascent. 
+ iter ELBO delta_ELBO_mean delta_ELBO_med notes + 100 -6.131 1.000 1.000 + 200 -6.458 0.525 1.000 + 300 -6.300 0.359 0.051 + 400 -6.137 0.276 0.051 + 500 -6.243 0.224 0.027 + 600 -6.305 0.188 0.027 + 700 -6.289 0.162 0.025 + 800 -6.402 0.144 0.025 + 900 -6.103 0.133 0.025 + 1000 -6.314 0.123 0.027 + 1100 -6.348 0.024 0.025 + 1200 -6.244 0.020 0.018 + 1300 -6.293 0.019 0.017 + 1400 -6.250 0.017 0.017 + 1500 -6.241 0.015 0.010 MEDIAN ELBO CONVERGED + +Drawing a sample of size 1000 from the approximate posterior... +COMPLETED. ``` + ## Variational algorithms Stan implements two variational algorithms. +They differ in the approximating distribution used in the unconstrained variable space. +By default, ADVI uses option `algorithm=meanfield`. The `algorithm` argument specifies the variational algorithm. - `algorithm=meanfield` - Use a fully factorized Gaussian for the approximation. @@ -73,7 +141,6 @@ for the approximation. - `grad_samples=` Number of samples for Monte Carlo estimate of gradients. Must be $> 0$. Default is $1$. - - `elbo_samples=` Number of samples for Monte Carlo estimate of ELBO (objective function). Must be $> 0$. Default is $100$. - `eta=` Stepsize weighting parameter for adaptive stepsize sequence. Must be $> 0$. Default is $1.0$. @@ -111,7 +178,10 @@ To illustrate, we call Stan's variational inference on the example model and dat ``` By default, the output file is `output.csv`. -Lines 1 - 28 contain configuration information: + +The output follows the same pattern as the output for +sampling, first dumping the entire set of parameters used +as CSV comments: ``` # stan_version_major = 2 # stan_version_minor = 23 @@ -133,21 +203,27 @@ Lines 1 - 28 contain configuration information: # output_samples = 1000 (Default) ... 
 ```
-The column header row is:
+Next, the column header row:
 ```
 lp__,log_p__,log_g__,theta
 ```
-The stepsize adaptation information is:
+Additional comments provide stepsize adaptation information:
 ```
 # Stepsize adaptation complete.
 # eta = 1
 ```
-The reported mean variational approximations information is:
+
+These are followed by the data rows. The first line is special:
+it is the mean of the variational approximation.
+
 ```
 0,0,0,0.214911
 ```
 That is, the estimate for `theta` given the data is `0.2`.
+The rest of the output contains `output_samples` draws
+from the variational approximation.
+
 The following is a sample based on this approximation:
 ```
 0,-14.0252,-5.21718,0.770397
@@ -155,3 +231,7 @@ The following is a sample based on this approximation:
 0,-6.75031,-0.0191099,0.241606
 ...
 ```
+
+The header indicates the unnormalized log probability with `lp__`.
+This is a legacy feature that we do not use for variational inference.
+The ELBO is not stored unless a diagnostic option is given.
diff --git a/src/cmdstan-guide/variational_intro.qmd b/src/cmdstan-guide/variational_intro.qmd
deleted file mode 100644
index 6aa22ff4a..000000000
--- a/src/cmdstan-guide/variational_intro.qmd
+++ /dev/null
@@ -1,154 +0,0 @@
----
-pagetitle: ADVI for Variational Inference
----
-
-# Variational Inference using ADVI
-
-The CmdStan method `variational` uses the Automatic Differentiation Variational Inference (ADVI) algorithm of @kucukelbir:2017
-to provide an approximate posterior distribution of the model conditioned on the data.
-The approximating distribution it uses is a Gaussian in the unconstrained variable space,
-either a fully factorized Gaussian approximation,
-specified by argument `algorithm=meanfield` option, or a Gaussian approximation using a
-full-rank covariance matrix, specified by argument `algorithm=fullrank`.
-By default, ADVI uses option `algorithm=meanfield`.
- -The following is a minimal call to Stan's variational inference -algorithm using defaults for everything but the location of the data -file. - -``` -> ./bernoulli variational data file=bernoulli.data.R -``` - -Executing this command prints both output to the console and -to a csv file. - -The first part of the console output reports on the configuration used: -the default option `algorithm=meanfield` and the default -tolerances for monitoring the algorithm's convergence. -``` -method = variational - variational - algorithm = meanfield (Default) - meanfield - iter = 10000 (Default) - grad_samples = 1 (Default) - elbo_samples = 100 (Default) - eta = 1 (Default) - adapt - engaged = 1 (Default) - iter = 50 (Default) - tol_rel_obj = 0.01 (Default) - eval_elbo = 100 (Default) - output_samples = 1000 (Default) -id = 0 (Default) -data - file = bernoulli.data.json -init = 2 (Default) -random - seed = 3323783840 (Default) -output - file = output.csv (Default) - diagnostic_file = (Default) - refresh = 100 (Default) -``` -After the configuration has been displayed, informational and -timing messages are output: -``` ------------------------------------------------------------- -EXPERIMENTAL ALGORITHM: - This procedure has not been thoroughly tested and may be unstable - or buggy. The interface is subject to change. ------------------------------------------------------------- - -Gradient evaluation took 2.1e-05 seconds -1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. -Adjust your expectations accordingly! -``` -The rest of the output describes the progression of the algorithm. -An adaptation phase finds a good value for the step size scaling -parameter `eta`. The evidence lower bound (ELBO) is the variational -objective function and is evaluated based on a Monte Carlo estimate. -The variational inference algorithm in Stan is stochastic, which makes -it challenging to assess convergence. 
That is, while the algorithm -appears to have converged in $\sim$ 250 iterations, the algorithm runs -for another few thousand iterations until mean change in ELBO drops -below the default tolerance of 0.01. -``` -Begin eta adaptation. -Iteration: 1 / 250 [ 0%] (Adaptation) -Iteration: 50 / 250 [ 20%] (Adaptation) -Iteration: 100 / 250 [ 40%] (Adaptation) -Iteration: 150 / 250 [ 60%] (Adaptation) -Iteration: 200 / 250 [ 80%] (Adaptation) -Success! Found best value [eta = 1] earlier than expected. - -Begin stochastic gradient ascent. - iter ELBO delta_ELBO_mean delta_ELBO_med notes - 100 -6.131 1.000 1.000 - 200 -6.458 0.525 1.000 - 300 -6.300 0.359 0.051 - 400 -6.137 0.276 0.051 - 500 -6.243 0.224 0.027 - 600 -6.305 0.188 0.027 - 700 -6.289 0.162 0.025 - 800 -6.402 0.144 0.025 - 900 -6.103 0.133 0.025 - 1000 -6.314 0.123 0.027 - 1100 -6.348 0.024 0.025 - 1200 -6.244 0.020 0.018 - 1300 -6.293 0.019 0.017 - 1400 -6.250 0.017 0.017 - 1500 -6.241 0.015 0.010 MEDIAN ELBO CONVERGED - -Drawing a sample of size 1000 from the approximate posterior... -COMPLETED. -``` - -The output from variational is written into the file `output.csv` -by default. The output follows the same pattern as the output for -sampling, first dumping the entire set of parameters used -as CSV comments: -``` -# stan_version_major = 2 -# stan_version_minor = 23 -# stan_version_patch = 0 -# model = bernoulli_model -# method = variational -# variational -# algorithm = meanfield (Default) -# meanfield -# iter = 10000 (Default) -# grad_samples = 1 (Default) -# elbo_samples = 100 (Default) -# eta = 1 (Default) -# adapt -# engaged = 1 (Default) -# iter = 50 (Default) -# tol_rel_obj = 0.01 (Default) -# eval_elbo = 100 (Default) -# output_samples = 1000 (Default) -... -``` - -Next is the column header line, followed more CSV comments reporting the -adapted value for the stepsize, followed by the values. -The first line is special: it is the mean of the variational approximation. 
-The rest of the output contains `output_samples` number of samples -drawn from the variational approximation. - -``` -lp__,log_p__,log_g__,theta -# Stepsize adaptation complete. -# eta = 1 -0,0,0,0.236261 -0,-6.82318,-0.0929121,0.300415 -0,-6.89701,-0.158687,0.321982 -0,-6.99391,-0.23916,0.343643 -0,-7.35801,-0.51787,0.401554 -0,-7.4668,-0.539473,0.123081 -... -``` -The header indicates the unnormalized log probability with `lp__`. -This is a legacy feature that we do not use for variational inference. -The ELBO is not stored unless a diagnostic option is given. From 1f430f7ad702a93206f00b9c08a79d597fa10200 Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 4 Apr 2024 14:26:08 -0400 Subject: [PATCH 5/6] Merge GQ --- redirects.txt | 3 +- src/_quarto.yml | 1 - src/cmdstan-guide/_quarto.yml | 1 - .../generate_quantities_config.qmd | 144 ++++++++++++++++-- .../generate_quantities_intro.qmd | 138 ----------------- src/cmdstan-guide/laplace_sample_config.qmd | 2 +- 6 files changed, 132 insertions(+), 157 deletions(-) delete mode 100644 src/cmdstan-guide/generate_quantities_intro.qmd diff --git a/redirects.txt b/redirects.txt index 04ad62804..d7bb31b51 100644 --- a/redirects.txt +++ b/redirects.txt @@ -501,7 +501,6 @@ https://mc-stan.org/docs/cmdstan-guide/standalone-generate-quantities.html https https://mc-stan.org/docs/functions-reference/math-functions.html https://mc-stan.org/docs/functions-reference/mathematical_functions.html https://mc-stan.org/docs/stan-users-guide/dae-solver.html https://mc-stan.org/docs/stan-users-guide/dae.html https://mc-stan.org/docs/reference-manual/hmc.html https://mc-stan.org/docs/reference-manual/mcmc.html -https://mc-stan.org/docs/cmdstan-guide/gc-intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_intro.html https://mc-stan.org/docs/functions-reference/positive-continuous-distributions.html https://mc-stan.org/docs/functions-reference/positive_continuous_distributions.html 
https://mc-stan.org/docs/stan-users-guide/truncated-or-censored-data.html https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html https://mc-stan.org/docs/functions-reference/sparse-matrices.html https://mc-stan.org/docs/functions-reference/sparse_matrix_operations.html @@ -595,3 +594,5 @@ https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html https://mc-stan.org https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html https://mc-stan.org/docs/cmdstan-guide/variational-inference-using-advi.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html https://mc-stan.org/docs/cmdstan-guide/variational_intro.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html +https://mc-stan.org/docs/cmdstan-guide/gc-intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html +https://mc-stan.org/docs/cmdstan-guide/generate_quantities_intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html diff --git a/src/_quarto.yml b/src/_quarto.yml index 693f43245..cd17428ae 100644 --- a/src/_quarto.yml +++ b/src/_quarto.yml @@ -242,7 +242,6 @@ website: - cmdstan-guide/example_model_data.qmd - cmdstan-guide/compiling_stan_programs.qmd - cmdstan-guide/parallelization.qmd - - cmdstan-guide/generate_quantities_intro.qmd - section: "Running CmdStan" contents: - cmdstan-guide/command_line_options.qmd diff --git a/src/cmdstan-guide/_quarto.yml b/src/cmdstan-guide/_quarto.yml index 34eca346c..e6d1e9f56 100644 --- a/src/cmdstan-guide/_quarto.yml +++ b/src/cmdstan-guide/_quarto.yml @@ -40,7 +40,6 @@ book: - example_model_data.qmd - compiling_stan_programs.qmd - parallelization.qmd - - generate_quantities_intro.qmd - part: "Running CmdStan" chapters: diff --git a/src/cmdstan-guide/generate_quantities_config.qmd b/src/cmdstan-guide/generate_quantities_config.qmd index 2bf2a577b..9cfd5fd61 100644 --- a/src/cmdstan-guide/generate_quantities_config.qmd +++ 
b/src/cmdstan-guide/generate_quantities_config.qmd
@@ -1,21 +1,75 @@
 ---
-pagetitle: Standalone Generate Quantities
+pagetitle: Generating Quantities of Interest from a Fitted Model
 ---

-# Standalone Generate Quantities
+# Generating Quantities of Interest from a Fitted Model {#gc-intro}

 The `generate_quantities` method allows you to generate additional quantities
 of interest from a fitted model without re-running the sampler.
-For an overview of the uses of this feature, see the
-[QuickStart Guide section](generate_quantities_intro.qmd)
-and the Stan User's Guide section on
-[Stand-alone generated quantities and ongoing prediction](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#stand-alone-generated-quantities-and-ongoing-prediction).
+Instead, you write a modified version of the original Stan program
+and add a generated quantities block or modify the existing one
+that specifies how to compute the new quantities of interest.
+Running the `generate_quantities` method on the new program
+together with sampler outputs (i.e., a set of draws)
+from the fitted model runs the generated quantities block
+of the new program using the existing sample by plugging
+in the per-draw parameter estimates for the computations in
+the generated quantities block.

 This method requires sub-argument `fitted_params` which takes as its value
-an existing Stan CSV file that contains a sample from an equivalent model,
-i.e., a model with the same parameters, transformed parameters, and model blocks,
+an existing [Stan CSV](stan_csv_apdx.qmd) file that contains the parameter values
+from an equivalent model, i.e., a model with the same parameters and transformed parameters blocks,
 conditioned on the same data.

+The [generated quantities block](https://mc-stan.org/docs/reference-manual/blocks.html#generated-quantities)
+computes *quantities of interest* (QOIs) based on the data,
+transformed data, parameters, and transformed parameters.
+It can be used to:
+
+- generate simulated data for model testing by forward sampling
+- generate predictions for new data
+- calculate posterior event probabilities, including multiple
+  comparisons, sign tests, etc.
+- calculating posterior expectations
+- transform parameters for reporting
+- apply full Bayesian decision theory
+- calculate log likelihoods, deviances, etc. for model comparison
+
+
+For an overview of the uses of this feature, see the Stan User's Guide section on
+[Stand-alone generated quantities and ongoing prediction](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#stand-alone-generated-quantities-and-ongoing-prediction).
+
+
+## Example
+
+To illustrate how this works, we use the `generate_quantities` method
+to do posterior predictive checks using the estimate of `theta` given
+the example bernoulli model and data, following the
+[posterior predictive simulation](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#posterior-predictive-simulation-in-stan)
+procedure in the Stan User's Guide.
+
+We write a program `bernoulli_ppc.stan` which contains
+the following generated quantities block, with comments
+to explain the procedure:
+```stan
+generated quantities {
+  real theta_rep;
+  array[N] int y_sim;
+  // use current estimate of theta to generate new sample
+  for (n in 1:N) {
+    y_sim[n] = bernoulli_rng(theta);
+  }
+  // estimate theta_rep from new sample
+  theta_rep = sum(y_sim) * 1.0 / N;
+}
+```
+The rest of the program is the same as in `bernoulli.stan`.
+
+The `generate_quantities` method requires the sub-argument `fitted_params`
+which takes as its value the name of a Stan CSV file.
+The per-draw parameter estimates from the `fitted_params` file will
+be used to run the generated quantities block.
+
 If we run the `bernoulli.stan` program for a single chain to
 generate a sample in file `bernoulli_fit.csv`:
@@ -32,7 +86,70 @@ checks:
     output file=bernoulli_ppc.csv
 ```
-The `fitted_params` file must be a Stan CSV file; attempts to use a regular CSV file
+The output file `bernoulli_ppc.csv` consists of just the values for the variables declared in the generated quantities block, i.e., `theta_rep` and the elements of `y_sim`:
+
+```
+# model = bernoulli_ppc_model
+# method = generate_quantities
+#   generate_quantities
+#     fitted_params = bernoulli_fit.csv
+# id = 0 (Default)
+# data
+#   file = bernoulli.data.json
+# init = 2 (Default)
+# random
+#   seed = 2135140492 (Default)
+# output
+#   file = bernoulli_ppc.csv
+#   diagnostic_file =  (Default)
+#   refresh = 100 (Default)
+theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
+0.2,0,0,1,0,0,0,0,0,1,0
+0.3,1,0,0,1,0,1,0,0,0,0
+0.8,1,0,1,1,1,1,1,1,1,0
+0.1,0,0,0,0,0,1,0,0,0,0
+0.3,0,0,0,0,0,0,1,1,1,0
+```
+
+_Note_: the only relevant analysis of the resulting CSV output is computing per-column statistics; this can easily be done in Python, R, Excel or similar,
+or you can use the CmdStanPy and CmdStanR interfaces which provide a
+better user experience for this workflow.
+
+Given the current implementation, to see the fitted parameter values for each draw,
+create a copy variable in the generated quantities block, e.g.:
+
+```stan
+generated quantities {
+  real theta_cp = theta;
+  real theta_rep;
+  array[N] int y_sim;
+  // use current estimate of theta to generate new sample
+  for (n in 1:N) {
+    y_sim[n] = bernoulli_rng(theta);
+  }
+  // estimate theta_rep from new sample
+  theta_rep = sum(y_sim) * 1.0 / N;
+}
+```
+
+Now the output is slightly more interpretable: `theta_cp` is the same as the `theta`
+used to generate the values `y_sim[1]` through `y_sim[10]`.
+Comparing columns `theta_cp` and `theta_rep` allows us to see how the
+uncertainty in our estimate of `theta` is carried forward
+into our predictions:
+
+```
+theta_cp,theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
+0.102391,0,0,0,0,0,0,0,0,0,0,0
+0.519567,0.2,0,1,0,0,1,0,0,0,0,0
+0.544634,0.6,1,0,0,0,0,1,1,1,1,1
+0.167651,0,0,0,0,0,0,0,0,0,0,0
+0.167651,0.1,1,0,0,0,0,0,0,0,0,0
+```
+
+## Errors
+
+The `fitted_params` file must be a [Stan CSV](stan_csv_apdx.qmd) file; attempts to use a regular CSV file
 will result in an error message of the form:
 ```
@@ -50,15 +167,12 @@ Error reading fitted param names from sample csv file
 The parameter values of the `fitted_params` are on the constrained
 scale and must obey all constraints.
-For example, if we modify the contencts of the first
+For example, if we modify the contents of the first
 reported draw in `bernoulli_fit.csv` so that the value of
 `theta` is outside the declared bounds `real<lower=0, upper=1>`, the program will
 return the following error message:
 ```
-Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] (in 'bernoulli_ppc.stan', line 5, column 2 to column 30)
+Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
+(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)
 ```
-
-
-
-
diff --git a/src/cmdstan-guide/generate_quantities_intro.qmd b/src/cmdstan-guide/generate_quantities_intro.qmd
deleted file mode 100644
index b2933ec95..000000000
--- a/src/cmdstan-guide/generate_quantities_intro.qmd
+++ /dev/null
@@ -1,138 +0,0 @@
----
-pagetitle: Generating Quantities of Interest from a Fitted Model
----
-
-# Generating Quantities of Interest from a Fitted Model {#gc-intro}
-
-The [generated quantities block](https://mc-stan.org/docs/reference-manual/blocks.html#generated-quantities)
-computes *quantities of interest* (QOIs) based on the data,
-transformed data, parameters, and transformed parameters.
-It can be used to: - -- generate simulated data for model testing by forward sampling -- generate predictions for new data -- calculate posterior event probabilities, including multiple - comparisons, sign tests, etc. -- calculating posterior expectations -- transform parameters for reporting -- apply full Bayesian decision theory -- calculate log likelihoods, deviances, etc. for model comparison - -The `generate_quantities` method allows you to generate additional quantities -of interest from a fitted model without re-running the sampler. -Instead, you write a modified version of the original Stan program -and add a generated quantities block or modify the existing one -which specifies how to compute the new quantities of interest. -Running the `generate_quantities` method on the new program -together with sampler outputs (i.e., a set of draws) -from the fitted model runs the generated quantities block -of the new program using the the existing sample by plugging -in the per-draw parameter estimates for the computations in -the generated quantities block. -See the Stan User's Guide section -[Stand-alone generated quantities and ongoing prediction](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#stand-alone-generated-quantities-and-ongoing-prediction). - -To illustrate how this works we use the `generate_quantities` method -to do posterior predictive checks using the estimate of `theta` given -the example bernoulli model and data, following the -[posterior predictive simulation](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#posterior-predictive-simulation-in-stan) -procedure in the Stan User's Guide. 
- -We write a program `bernoulli_ppc.stan` which contains -the following generated quantities block, with comments -to explain the procedure: -```stan -generated quantities { - real theta_rep; - array[N] int y_sim; - // use current estimate of theta to generate new sample - for (n in 1:N) { - y_sim[n] = bernoulli_rng(theta); - } - // estimate theta_rep from new sample - theta_rep = sum(y_sim) * 1.0 / N; -} -``` -The rest of the program is the same as in `bernoulli.stan`. - -The `generate_method` requires the sub-argument `fitted_params` -which takes as its value the name of a Stan CSV file. -The per-draw parameter estimates from the `fitted_params` file will -be used to run the generated quantities block. - -If we run the `bernoulli.stan` program for a single chain to -generate a sample in file `bernoulli_fit.csv`: - -``` -> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv -``` - -Then we can run the `bernoulli_ppc.stan` to carry out the posterior predictive -checks: - -``` -> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \ - data file=bernoulli.data.json \ - output file=bernoulli_ppc.csv -``` - -The output file `bernoulli_ppc.csv` consists of just the values for the variables declared in the generated quantities block, i.e., `theta_rep` and the elements of `y_sim`: - -``` -# model = bernoulli_ppc_model -# method = generate_quantities -# generate_quantities -# fitted_params = bernoulli_fit.csv -# id = 0 (Default) -# data -# file = bernoulli.data.json -# init = 2 (Default) -# random -# seed = 2135140492 (Default) -# output -# file = bernoulli_ppc.csv -# diagnostic_file = (Default) -# refresh = 100 (Default) -theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10 -0.2,0,0,1,0,0,0,0,0,1,0 -0.3,1,0,0,1,0,1,0,0,0,0 -0.8,1,0,1,1,1,1,1,1,1,0 -0.1,0,0,0,0,0,1,0,0,0,0 -0.3,0,0,0,0,0,0,1,1,1,0 -``` - -_Note_: the only relevant analysis of the resulting CSV output is computing per-column 
statistics; this can easily be done in Python, R, Excel or similar, -or you can use the CmdStanPy and CmdStanR interfaces which provide a -better user experience for this workflow. - -Given the current implementation, to see the fitted parameter values for each draw, -create a copy variable in the generated quantities block, e.g.: - -```stan -generated quantities { - real theta_cp = theta; - real theta_rep; - array[N] int y_sim; - // use current estimate of theta to generate new sample - for (n in 1:N) { - y_sim[n] = bernoulli_rng(theta); - } - // estimate theta_rep from new sample - theta_rep = sum(y_sim) * 1.0 / N; -} -``` - -Now the output is slightly more interpretable: `theta_cp` is the same as the `theta` -used to generate the values `y_sim[1]` through `y_sim[1]`. -Comparing columns `theta_cp` and `theta_rep` allows us to see how the -uncertainty in our estimate of `theta` is carried forward -into our predictions: - -``` -theta_cp,theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10 -0.102391,0,0,0,0,0,0,0,0,0,0,0 -0.519567,0.2,0,1,0,0,1,0,0,0,0,0 -0.544634,0.6,1,0,0,0,0,1,1,1,1,1 -0.167651,0,0,0,0,0,0,0,0,0,0,0 -0.167651,0.1,1,0,0,0,0,0,0,0,0,0 -``` diff --git a/src/cmdstan-guide/laplace_sample_config.qmd b/src/cmdstan-guide/laplace_sample_config.qmd index f1bf8f198..557ff3385 100644 --- a/src/cmdstan-guide/laplace_sample_config.qmd +++ b/src/cmdstan-guide/laplace_sample_config.qmd @@ -5,7 +5,7 @@ pagetitle: Laplace sampling # Laplace sampling The `laplace` method produces a sample from a normal approximation -centered at the mode of a distribution in the unconstrained space. +centered at the [mode](optimize_config.qmd) of a distribution in the unconstrained space. If the mode is a maximum a posteriori (MAP) estimate, the samples provide an estimate of the mean and standard deviation of the posterior distribution. 
From eb5bbd15682a662c4aabcfbbceabdda169dbf485 Mon Sep 17 00:00:00 2001
From: Brian Ward
Date: Thu, 4 Apr 2024 16:10:12 -0400
Subject: [PATCH 6/6] Fixes per review

---
 .../generate_quantities_config.qmd | 58 +++++++++----------
 src/cmdstan-guide/mcmc_config.qmd | 20 +++----
 src/cmdstan-guide/optimize_config.qmd | 17 +++---
 src/cmdstan-guide/pathfinder_config.qmd | 2 +-
 src/cmdstan-guide/variational_config.qmd | 2 +-
 5 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/src/cmdstan-guide/generate_quantities_config.qmd b/src/cmdstan-guide/generate_quantities_config.qmd
index 9cfd5fd61..51bfd55a0 100644
--- a/src/cmdstan-guide/generate_quantities_config.qmd
+++ b/src/cmdstan-guide/generate_quantities_config.qmd
@@ -18,7 +18,7 @@ the generated quantities block.

 This method requires sub-argument `fitted_params` which takes as its value
 an existing [Stan CSV](stan_csv_apdx.qmd) file that contains the parameter values
-from an equivalent model, i.e., a model with the same parameters and transformed parameters blocks,
+from an equivalent model, i.e., a model with the same parameters block,
 conditioned on the same data.

 The [generated quantities block](https://mc-stan.org/docs/reference-manual/blocks.html#generated-quantities)
@@ -30,7 +30,7 @@ It can be used to:
 - generate predictions for new data
 - calculate posterior event probabilities, including multiple
   comparisons, sign tests, etc.
-- calculating posterior expectations
+- calculate posterior expectations
 - transform parameters for reporting
 - apply full Bayesian decision theory
 - calculate log likelihoods, deviances, etc.
   for model comparison
@@ -53,21 +53,20 @@ the following generated quantities block, with comments
 to explain the procedure:
 
 ```stan
 generated quantities {
-  real theta_rep;
   array[N] int y_sim;
   // use current estimate of theta to generate new sample
   for (n in 1:N) {
     y_sim[n] = bernoulli_rng(theta);
   }
   // estimate theta_rep from new sample
-  theta_rep = sum(y_sim) * 1.0 / N;
+  real theta_rep = sum(y_sim) * 1.0 / N;
 }
 ```
 
 The rest of the program is the same as in `bernoulli.stan`.
 
 The `generate_quantities` method requires the sub-argument `fitted_params`
 which takes as its value the name of a Stan CSV file.
-The per-draw parameter estimates from the `fitted_params` file will
+The per-draw parameter values from the `fitted_params` file will
 be used to run the generated quantities block.
 
 If we run the `bernoulli.stan` program for a single chain to
@@ -86,49 +85,48 @@ checks:
    output file=bernoulli_ppc.csv
 ```
 
-The output file `bernoulli_ppc.csv` consists of just the values for the variables declared in the generated quantities block, i.e., `theta_rep` and the elements of `y_sim`:
+The output file `bernoulli_ppc.csv` contains only the values for the variables declared in the
+`generated quantities` block, i.e., `theta_rep` and the elements of `y_sim`:
 
 ```
 # model = bernoulli_ppc_model
 # method = generate_quantities
 #   generate_quantities
 #     fitted_params = bernoulli_fit.csv
-# id = 0 (Default)
+# id = 1 (Default)
 # data
 #   file = bernoulli.data.json
 # init = 2 (Default)
 # random
-#   seed = 2135140492 (Default)
+#   seed = 2983956445 (Default)
 # output
-#   file = bernoulli_ppc.csv
-#   diagnostic_file = (Default)
-#   refresh = 100 (Default)
-theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
-0.2,0,0,1,0,0,0,0,0,1,0
-0.3,1,0,0,1,0,1,0,0,0,0
-0.8,1,0,1,1,1,1,1,1,1,0
-0.1,0,0,0,0,0,1,0,0,0,0
-0.3,0,0,0,0,0,0,1,1,1,0
+#   file = output.csv (Default)
+y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
+1,1,1,0,0,0,1,1,0,1,0.6
+1,1,0,1,0,0,1,0,1,0,0.5
+1,0,1,1,1,1,1,1,0,1,0.8
+0,1,0,1,0,1,0,1,0,0,0.4
+1,0,0,0,0,0,0,0,0,0,0.1
+0,0,0,0,0,1,1,1,0,0,0.3
+0,0,1,0,1,0,0,0,0,0,0.2
+1,0,1,0,1,1,0,1,1,0,0.6
+...
 ```
 
-_Note_: the only relevant analysis of the resulting CSV output is computing per-column statistics; this can easily be done in Python, R, Excel or similar,
-or you can use the CmdStanPy and CmdStanR interfaces which provide a
-better user experience for this workflow.
 
 Given the current implementation, to see the fitted parameter values for each draw,
 create a copy variable in the generated quantities block, e.g.:
 
 ```stan
 generated quantities {
-  real theta_cp = theta;
-  real theta_rep;
   array[N] int y_sim;
   // use current estimate of theta to generate new sample
   for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
   }
+  real theta_cp = theta;
   // estimate theta_rep from new sample
-  theta_rep = sum(y_sim) * 1.0 / N;
+  real theta_rep = sum(y_sim) * 1.0 / N;
 }
 ```
 
@@ -139,12 +137,14 @@ uncertainty in our estimate of `theta` is carried forward
 into our predictions:
 
 ```
-theta_cp,theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
-0.102391,0,0,0,0,0,0,0,0,0,0,0
-0.519567,0.2,0,1,0,0,1,0,0,0,0,0
-0.544634,0.6,1,0,0,0,0,1,1,1,1,1
-0.167651,0,0,0,0,0,0,0,0,0,0,0
-0.167651,0.1,1,0,0,0,0,0,0,0,0,0
+y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
+0,1,1,0,1,0,0,1,1,0,0.545679,0.5
+1,1,1,1,1,1,0,1,1,0,0.527164,0.8
+1,1,1,1,0,1,1,1,1,0,0.529116,0.8
+1,0,1,1,1,1,0,0,1,0,0.478844,0.6
+0,1,0,0,0,0,1,0,1,0,0.238793,0.3
+0,0,0,0,0,1,1,0,0,0,0.258294,0.2
+1,1,1,0,0,0,0,0,0,0,0.258465,0.3
 ```
 
 ## Errors
diff --git a/src/cmdstan-guide/mcmc_config.qmd b/src/cmdstan-guide/mcmc_config.qmd
index d1961ff8f..d616b1183 100644
--- a/src/cmdstan-guide/mcmc_config.qmd
+++ b/src/cmdstan-guide/mcmc_config.qmd
@@ -5,19 +5,19 @@ pagetitle: MCMC Sampling
 
 # MCMC Sampling using Hamiltonian Monte Carlo
{#mcmc-config}
 
-The `sample` method provides Bayesian inference over the model conditioned on data
-using Hamiltonian Monte Carlo (HMC) sampling. By default, the inference engine used is
-the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo sampling.
+The `sample` method provides Bayesian inference over the model
+conditioned on data using Hamiltonian Monte Carlo (HMC) sampling.
+By default, the inference engine used is the No-U-Turn sampler (NUTS),
+an adaptive form of Hamiltonian Monte Carlo sampling.
 For details on HMC and NUTS, see the Stan Reference Manual chapter on
 [MCMC Sampling](https://mc-stan.org/docs/reference-manual/mcmc.html).
 
 ## Running the sampler
 
-To generate a sample from the posterior distribution of
-the model conditioned on the data,
-we run the executable program with the argument `sample` or `method=sample`
-together with the input data.
+To generate a sample from the posterior distribution of the model conditioned
+on the data, we run the executable program with the argument `sample` or
+`method=sample` together with the input data.
 The executable can be run from any directory.
 
 The full set of configuration options available for the `sample` method
@@ -491,8 +491,8 @@ By default, no auxiliary output file is produced.
 
 A Markov chain generates samples from the target distribution only after it has converged to equilibrium.
 In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound.
 In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available.
-One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior
-to other randomly initialized chains.
+One way to monitor whether a chain has approximately converged to the equilibrium distribution is to
+compare its behavior to other randomly initialized chains.
 For robust diagnostics, we recommend running 4 chains.
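The between-chain comparison recommended in the hunk above is quantified by the split R_hat statistic, the "potential scale reduction factor on split chains" mentioned later in this file. A simplified sketch of the basic computation, assuming equal-length chains and omitting refinements such as rank normalization that production implementations may add:

```python
import random
import statistics

# Simplified sketch of split R_hat: split each chain in half, then compare
# the between-half variance B with the within-half variance W.  Values
# near 1 suggest the chains agree; this is the basic formula only, not
# necessarily the exact variant stansummary implements.

def split_rhat(chains):
    halves = []
    for chain in chains:
        m = len(chain) // 2
        halves.append(chain[:m])
        halves.append(chain[m:2 * m])
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    w = statistics.fmean(statistics.variance(h) for h in halves)
    b = n * statistics.variance(means)
    var_plus = (n - 1) / n * w + b / n   # pooled variance estimate
    return (var_plus / w) ** 0.5

rng = random.Random(0)
# four well-mixed chains drawn from the same distribution: R_hat near 1
good = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# two chains stuck in different regions: R_hat well above 1
bad = [[rng.gauss(0, 1) for _ in range(1000)],
       [rng.gauss(3, 1) for _ in range(1000)]]
print(f"good: {split_rhat(good):.2f}  bad: {split_rhat(bad):.2f}")
```

Splitting each chain in half is what lets the statistic also catch a single chain that drifts over time, not just disagreement between chains.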
 The preferred way of using multiple chains is to run them all from the same executable using
@@ -598,7 +598,7 @@ and R_hat is the potential scale reduction factor on split chains (at
 convergence, R_hat=1).
 ```
 
-In this example, we conditioned the model on a dataset consisting of the outcomes of
+In this example, we conditioned the model on data consisting of the outcomes of
 10 Bernoulli trials, where only 2 trials reported success.
 The 5%, 50%, and 95% percentile values for `theta` reflect the uncertainty
 in our estimate, due to the small amount of data, given the prior of `beta(1, 1)`
diff --git a/src/cmdstan-guide/optimize_config.qmd b/src/cmdstan-guide/optimize_config.qmd
index 58a156dae..ea53da626 100644
--- a/src/cmdstan-guide/optimize_config.qmd
+++ b/src/cmdstan-guide/optimize_config.qmd
@@ -4,10 +4,10 @@ pagetitle: Optimization
 
 # Optimization
 
-The CmdStan executable can run Stan’s optimization algorithms
+The CmdStan executable can run Stan’s optimization algorithms,
 which provide a deterministic method to find the posterior mode.
 If the posterior is not convex, there is no guarantee Stan
-will be able to find the global mode as opposed to a local optimum
+will be able to find the global optimum as opposed to a local optimum
 of log probability.
 
@@ -16,12 +16,10 @@ is available by using the `optimize help-all` subcommand.
 The arguments with their requested values or defaults are also reported at
 the beginning of the optimizer console output and in the output CSV file's comments.
 
-The executable does not need to be
-recompiled in order to switch from sampling to optimization, and the
-data input format is the same.
-The following is a minimal call to
-Stan's optimizer using defaults for everything but the location of the
-data file.
+The executable does not need to be recompiled in order to switch from
+sampling to optimization, and the data input format is the same.
+The following is a minimal call to Stan's optimizer using defaults for
+everything but the location of the data file.
 
 ```
 > ./bernoulli optimize data file=bernoulli.data.json
 ```
@@ -107,7 +105,7 @@ used as comment lines:
 ...
 ```
 
-Following the config information, are two lines of output:
+Following the config information are two lines of output, the CSV headers and the recorded values:
 
 ```
@@ -166,7 +164,6 @@ Applying the Jacobian adjustment produces the maximum a posteriori estimate (MAP),
 the value that maximizes the posterior density,
 $\mathrm{argmax}_{\theta}\ p(y | \theta)\,p(\theta)$.
-
 
 By default this value is `0` (false), i.e., do not include
 the Jacobian adjustment.
diff --git a/src/cmdstan-guide/pathfinder_config.qmd b/src/cmdstan-guide/pathfinder_config.qmd
index e15921d1c..22b57dc44 100644
--- a/src/cmdstan-guide/pathfinder_config.qmd
+++ b/src/cmdstan-guide/pathfinder_config.qmd
@@ -69,7 +69,7 @@ num_threads = 1 (Default)
 
 The rest of the output describes the progression of the algorithm.
 By default, the Pathfinder algorithm runs 4 single-path Pathfinders in
-parallel, the uses importance resampling on the set of returned draws
+parallel, then uses importance resampling on the set of returned draws
 to produce the specified number of draws.
 ```
 Path [1] :Initial log joint density = -11.543343
diff --git a/src/cmdstan-guide/variational_config.qmd b/src/cmdstan-guide/variational_config.qmd
index 27fbe094d..6d38c551b 100644
--- a/src/cmdstan-guide/variational_config.qmd
+++ b/src/cmdstan-guide/variational_config.qmd
@@ -213,7 +213,7 @@ Additional comments provide stepsize adaptation information:
 # eta = 1
 ```
 
-Followed by the data rows. The first line is special:
+Followed by the data rows. The first line is special --- it is the mean of the variational approximation.
 
 ```