Merge pull request #752 from stan-dev/consolidate-cmdstan-guide
Deduplicate pages in CmdStan guide
bob-carpenter authored Apr 4, 2024
2 parents 2b655aa + eb5bbd1 commit 262f049
Showing 15 changed files with 813 additions and 1,003 deletions.
13 changes: 9 additions & 4 deletions redirects.txt
@@ -173,7 +173,6 @@ https://mc-stan.org/docs/functions-reference/sort-functions.html https://mc-stan
https://mc-stan.org/docs/cmdstan-guide/extracting-log-probabilities-and-gradients-for-diagnostics.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html#pathfinder-configuration
https://mc-stan.org/docs/functions-reference/negative-binomial-distribution.html https://mc-stan.org/docs/functions-reference/unbounded_discrete_distributions.html#negative-binomial-distribution
https://mc-stan.org/docs/reference-manual/sundials-license.html https://mc-stan.org/docs/reference-manual/licenses.html#sundials-license
https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html#running-the-sampler
https://mc-stan.org/docs/functions-reference/ordered-probit-distribution.html https://mc-stan.org/docs/functions-reference/bounded_discrete_distributions.html#ordered-probit-distribution
https://mc-stan.org/docs/functions-reference/multivariate-gaussian-process-distribution-cholesky-parameterization.html https://mc-stan.org/docs/functions-reference/distributions_over_unbounded_vectors.html#multivariate-gaussian-process-distribution-cholesky-parameterization
https://mc-stan.org/docs/stan-users-guide/bayesian-poststratification.html https://mc-stan.org/docs/stan-users-guide/poststratification.html#bayesian-poststratification
@@ -500,10 +499,8 @@ https://mc-stan.org/docs/reference-manual/diagnostic-algorithms.html https://mc-
https://mc-stan.org/docs/functions-reference/covariance-matrix-distributions.html https://mc-stan.org/docs/functions-reference/covariance_matrix_distributions.html
https://mc-stan.org/docs/cmdstan-guide/standalone-generate-quantities.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html
https://mc-stan.org/docs/functions-reference/math-functions.html https://mc-stan.org/docs/functions-reference/mathematical_functions.html
https://mc-stan.org/docs/cmdstan-guide/variational-inference-using-advi.html https://mc-stan.org/docs/cmdstan-guide/variational_intro.html
https://mc-stan.org/docs/stan-users-guide/dae-solver.html https://mc-stan.org/docs/stan-users-guide/dae.html
https://mc-stan.org/docs/reference-manual/hmc.html https://mc-stan.org/docs/reference-manual/mcmc.html
https://mc-stan.org/docs/cmdstan-guide/gc-intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_intro.html
https://mc-stan.org/docs/functions-reference/positive-continuous-distributions.html https://mc-stan.org/docs/functions-reference/positive_continuous_distributions.html
https://mc-stan.org/docs/stan-users-guide/truncated-or-censored-data.html https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html
https://mc-stan.org/docs/functions-reference/sparse-matrices.html https://mc-stan.org/docs/functions-reference/sparse_matrix_operations.html
@@ -530,7 +527,6 @@ https://mc-stan.org/docs/functions-reference/unbounded-discrete-distributions.ht
https://mc-stan.org/docs/stan-users-guide/functions-programming.html https://mc-stan.org/docs/stan-users-guide/user-functions.html
https://mc-stan.org/docs/functions-reference/removed-functions.html https://mc-stan.org/docs/functions-reference/removed_functions.html
https://mc-stan.org/docs/stan-users-guide/custom-probability-functions.html https://mc-stan.org/docs/stan-users-guide/custom-probability.html
https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html
https://mc-stan.org/docs/functions-reference/bounded-continuous-distributions.html https://mc-stan.org/docs/functions-reference/bounded_continuous_distributions.html
https://mc-stan.org/docs/stan-users-guide/mixture-modeling.html https://mc-stan.org/docs/stan-users-guide/finite-mixtures.html
https://mc-stan.org/docs/functions-reference/multivariate-discrete-distributions.html https://mc-stan.org/docs/functions-reference/multivariate_discrete_distributions.html
@@ -591,3 +587,12 @@ https://mc-stan.org/docs/cmdstan-guide/cmdstan-tools.html https://mc-stan.org/do
https://mc-stan.org/docs/cmdstan-guide/print-deprecated-mcmc-output-analysis.html https://mc-stan.org/docs/cmdstan-guide/print.html
https://mc-stan.org/docs/cmdstan-guide/command-line-interface-overview.html https://mc-stan.org/docs/cmdstan-guide/command_line_options.html
https://mc-stan.org/docs/functions-reference/matrix-arithmetic-operators.html https://mc-stan.org/docs/functions-reference/matrix_operations.html#matrix-arithmetic-operators
https://mc-stan.org/docs/cmdstan-guide/mcmc_sampling_intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html
https://mc-stan.org/docs/cmdstan-guide/mcmc-intro.html https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html
https://mc-stan.org/docs/cmdstan-guide/optimization_intro.html https://mc-stan.org/docs/cmdstan-guide/optimize_config.html
https://mc-stan.org/docs/cmdstan-guide/pathfinder_intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html
https://mc-stan.org/docs/cmdstan-guide/pathfinder-intro.html https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html
https://mc-stan.org/docs/cmdstan-guide/variational-inference-using-advi.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html
https://mc-stan.org/docs/cmdstan-guide/variational_intro.html https://mc-stan.org/docs/cmdstan-guide/variational_config.html
https://mc-stan.org/docs/cmdstan-guide/gc-intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html
https://mc-stan.org/docs/cmdstan-guide/generate_quantities_intro.html https://mc-stan.org/docs/cmdstan-guide/generate_quantities_config.html
11 changes: 3 additions & 8 deletions src/_quarto.yml
@@ -236,17 +236,13 @@ website:
contents:
- cmdstan-guide/index.qmd
- section: "Version {{< env STAN_DOCS_VERSION >}}"
- section: "Quickstart Guide"
- section: "Getting Started"
contents:
- cmdstan-guide/installation.qmd
- cmdstan-guide/example_model_data.qmd
- cmdstan-guide/compiling_stan_programs.qmd
- cmdstan-guide/mcmc_sampling_intro.qmd
- cmdstan-guide/optimization_intro.qmd
- cmdstan-guide/pathfinder_intro.qmd
- cmdstan-guide/variational_intro.qmd
- cmdstan-guide/generate_quantities_intro.qmd
- section: "Reference Manual"
- cmdstan-guide/parallelization.qmd
- section: "Running CmdStan"
contents:
- cmdstan-guide/command_line_options.qmd
- cmdstan-guide/mcmc_config.qmd
@@ -257,7 +253,6 @@ website:
- cmdstan-guide/laplace_sample_config.qmd
- cmdstan-guide/log_prob_config.qmd
- cmdstan-guide/diagnose_config.qmd
- cmdstan-guide/parallelization.qmd
- section: "Tools and Utilities"
contents:
- cmdstan-guide/stanc.qmd
11 changes: 3 additions & 8 deletions src/cmdstan-guide/_quarto.yml
@@ -34,18 +34,14 @@ book:

chapters:
- index.qmd
- part: "Quickstart Guide"
- part: "Getting Started"
chapters:
- installation.qmd
- example_model_data.qmd
- compiling_stan_programs.qmd
- mcmc_sampling_intro.qmd
- optimization_intro.qmd
- pathfinder_intro.qmd
- variational_intro.qmd
- generate_quantities_intro.qmd
- parallelization.qmd

- part: "Reference Manual"
- part: "Running CmdStan"
chapters:
- command_line_options.qmd
- mcmc_config.qmd
@@ -56,7 +52,6 @@ book:
- laplace_sample_config.qmd
- log_prob_config.qmd
- diagnose_config.qmd
- parallelization.qmd

- part: "CmdStan Utilities"
chapters:
144 changes: 129 additions & 15 deletions src/cmdstan-guide/generate_quantities_config.qmd
@@ -1,21 +1,74 @@
---
pagetitle: Standalone Generate Quantities
pagetitle: Generating Quantities of Interest from a Fitted Model
---

# Standalone Generate Quantities
# Generating Quantities of Interest from a Fitted Model {#gc-intro}

The `generate_quantities` method allows you to generate additional
quantities of interest from a fitted model without re-running the sampler.
For an overview of the uses of this feature, see the
[QuickStart Guide section](generate_quantities_intro.qmd)
and the Stan User's Guide section on
[Stand-alone generated quantities and ongoing prediction](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#stand-alone-generated-quantities-and-ongoing-prediction).
Instead, you write a modified version of the original Stan program
and add a generated quantities block or modify the existing one
which specifies how to compute the new quantities of interest.
Running the `generate_quantities` method on the new program
together with sampler outputs (i.e., a set of draws)
from the fitted model runs the generated quantities block
of the new program using the existing sample by plugging
in the per-draw parameter estimates for the computations in
the generated quantities block.

This method requires sub-argument `fitted_params` which takes as its value
an existing Stan CSV file that contains a sample from an equivalent model,
i.e., a model with the same parameters, transformed parameters, and model blocks,
an existing [Stan CSV](stan_csv_apdx.qmd) file that contains parameter values
from an equivalent model, i.e., a model with the same parameters block,
conditioned on the same data.

The [generated quantities block](https://mc-stan.org/docs/reference-manual/blocks.html#generated-quantities)
computes *quantities of interest* (QOIs) based on the data,
transformed data, parameters, and transformed parameters.
It can be used to:

- generate simulated data for model testing by forward sampling
- generate predictions for new data
- calculate posterior event probabilities, including multiple
comparisons, sign tests, etc.
- calculate posterior expectations
- transform parameters for reporting
- apply full Bayesian decision theory
- calculate log likelihoods, deviances, etc. for model comparison


For an overview of the uses of this feature, see the Stan User's Guide section on
[Stand-alone generated quantities and ongoing prediction](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#stand-alone-generated-quantities-and-ongoing-prediction).


## Example

To illustrate how this works we use the `generate_quantities` method
to do posterior predictive checks using the estimate of `theta` given
the example bernoulli model and data, following the
[posterior predictive simulation](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#posterior-predictive-simulation-in-stan)
procedure in the Stan User's Guide.

We write a program `bernoulli_ppc.stan` which contains
the following generated quantities block, with comments
to explain the procedure:
```stan
generated quantities {
array[N] int y_sim;
// use current estimate of theta to generate new sample
for (n in 1:N) {
y_sim[n] = bernoulli_rng(theta);
}
// estimate theta_rep from new sample
real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}
```
The rest of the program is the same as in `bernoulli.stan`.
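
As a rough illustration of what this block computes per draw, here is a minimal Python sketch, not CmdStan's implementation. It assumes a hypothetical helper `simulate_ppc` and made-up `theta` values; for each draw it simulates `N` Bernoulli outcomes and takes their mean as `theta_rep`.

```python
import random


def simulate_ppc(theta_draws, n_obs, seed=0):
    """Mimic the generated quantities block for each draw of theta.

    Hypothetical helper for illustration only: returns a list of
    (y_sim, theta_rep) pairs, one per draw.
    """
    rng = random.Random(seed)
    results = []
    for theta in theta_draws:
        # analogue of y_sim[n] = bernoulli_rng(theta)
        y_sim = [1 if rng.random() < theta else 0 for _ in range(n_obs)]
        # analogue of theta_rep = sum(y_sim) * 1.0 / N
        theta_rep = sum(y_sim) / n_obs
        results.append((y_sim, theta_rep))
    return results


# Made-up theta draws; a real run would read them from the fitted_params file.
theta_draws = [0.55, 0.48, 0.24]
ppc = simulate_ppc(theta_draws, n_obs=10, seed=1)
for theta, (y_sim, theta_rep) in zip(theta_draws, ppc):
    print(theta, y_sim, theta_rep)
```

Each simulated `theta_rep` is a proportion of `N` outcomes, so with `N = 10` it can only take values in steps of 0.1.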

The `generate_quantities` method requires the sub-argument `fitted_params`
which takes as its value the name of a Stan CSV file.
The per-draw parameter values from the `fitted_params` file will
be used to run the generated quantities block.

If we run the `bernoulli.stan` program for a single chain to
generate a sample in file `bernoulli_fit.csv`:

@@ -32,7 +85,71 @@ checks:
output file=bernoulli_ppc.csv
```

The `fitted_params` file must be a Stan CSV file; attempts to use a regular CSV file
The output file `bernoulli_ppc.csv` contains only the values for the variables declared in the
`generated quantities` block, i.e., `theta_rep` and the elements of `y_sim`:

```
# model = bernoulli_ppc_model
# method = generate_quantities
# generate_quantities
# fitted_params = bernoulli_fit.csv
# id = 1 (Default)
# data
# file = bernoulli.data.json
# init = 2 (Default)
# random
# seed = 2983956445 (Default)
# output
# file = output.csv (Default)
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
0,1,0,1,0,1,0,1,0,0,0.4
1,0,0,0,0,0,0,0,0,0,0.1
0,0,0,0,0,1,1,1,0,0,0.3
0,0,1,0,1,0,0,0,0,0,0.2
1,0,1,0,1,1,0,1,1,0,0.6
...
```
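
Output in this layout can be read with general-purpose tools by skipping the `#` comment lines. The following Python sketch uses only the standard library; the helper name `read_stan_csv` is an assumption made for this example, and the embedded text is an excerpt of the output shown above.

```python
import csv
import io


def read_stan_csv(text):
    """Parse Stan CSV text into a list of {column: float} rows.

    Hypothetical helper: drops blank lines and '#' comment lines,
    then treats the first remaining line as the header.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    return [{name: float(value) for name, value in row.items()} for row in reader]


sample = """\
# model = bernoulli_ppc_model
# method = generate_quantities
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
"""

rows = read_stan_csv(sample)
for row in rows:
    # theta_rep should equal the proportion of successes in y_sim
    y_mean = sum(row[f"y_sim.{k}"] for k in range(1, 11)) / 10
    print(row["theta_rep"], y_mean)
```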


Given the current implementation, to see the fitted parameter values for each draw,
create a copy variable in the generated quantities block, e.g.:

```stan
generated quantities {
array[N] int y_sim;
// use current estimate of theta to generate new sample
for (n in 1:N) {
y_sim[n] = bernoulli_rng(theta);
}
real<lower=0, upper=1> theta_cp = theta;
// estimate theta_rep from new sample
real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}
```

Now the output is slightly more interpretable: `theta_cp` is the same as the `theta`
used to generate the values `y_sim[1]` through `y_sim[N]`.
Comparing columns `theta_cp` and `theta_rep` allows us to see how the
uncertainty in our estimate of `theta` is carried forward
into our predictions:

```
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
0,1,1,0,1,0,0,1,1,0,0.545679,0.5
1,1,1,1,1,1,0,1,1,0,0.527164,0.8
1,1,1,1,0,1,1,1,1,0,0.529116,0.8
1,0,1,1,1,1,0,0,1,0,0.478844,0.6
0,1,0,0,0,0,1,0,1,0,0.238793,0.3
0,0,0,0,0,1,1,0,0,0,0.258294,0.2
1,1,1,0,0,0,0,0,0,0,0.258465,0.3
```

## Errors

The `fitted_params` file must be a [Stan CSV](stan_csv_apdx.qmd) file; attempts to use a regular CSV file
will result in an error message of the form:

```
@@ -50,15 +167,12 @@ Error reading fitted param names from sample csv file <filename.csv>

The parameter values of the `fitted_params` are on the
constrained scale and must obey all constraints.
For example, if we modify the contencts of the first
For example, if we modify the contents of the first
reported draw in `bernoulli_fit.csv` so that the value
of `theta` is outside the declared bounds `real<lower=0, upper=1>`,
the program will return the following error message:

```
Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] (in 'bernoulli_ppc.stan', line 5, column 2 to column 30)
Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)
```
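
One way to catch such draws before running `generate_quantities` is to check the parameter column against its declared bounds. This is a minimal Python sketch under stated assumptions: the helper `out_of_bounds` is hypothetical, the draws are assumed to be already parsed into dictionaries, and the first value is the out-of-range one from the error message above.

```python
def out_of_bounds(rows, name, lower, upper):
    """Return (row_index, value) for each draw of `name` outside [lower, upper].

    Hypothetical pre-check, not part of CmdStan.
    """
    return [
        (i, row[name])
        for i, row in enumerate(rows)
        if not (lower <= row[name] <= upper)
    ]


# First draw edited to violate real<lower=0, upper=1>; the others are valid.
draws = [{"theta": 1.21397}, {"theta": 0.527164}, {"theta": 0.529116}]
bad = out_of_bounds(draws, "theta", 0.0, 1.0)
print(bad)
```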



