Obtain data and parameter names from the model + checks for data and init arguments #430

Closed
avehtari opened this issue Jan 7, 2021 · 4 comments · Fixed by #541

avehtari commented Jan 7, 2021

There is a cmdstan issue (stan-dev/cmdstan#887) requesting functionality to get the list of data and parameter names.

When that functionality is available, it would be useful to have:

  1. methods for obtaining lists of data and parameter names
    • these would help to make the workflow more robust in general
    • for example, methods for obtaining list of parameter names can be used to make it easier to generate inits without forgetting some parameters
    • for example, methods for obtaining list of parameter names can be used to report summaries just for parameters and not for generated quantities
  2. checks that data and inits match the model data and parameter names
    • the data is eventually checked when the model is instantiated, but currently there is no message about whether the inits match
    • for example, it would be good to compare the names provided in the init argument with the actual parameters and provide an informational message, e.g. "3 of 5 parameters initialized via init argument and 2 of 5 parameters initialized with random values."
@avehtari avehtari added the feature New feature or request label Jan 7, 2021
@rok-cesnovar

Thanks!

Other uses of this are:

  • enables earlier checks that all data variables are present with a more informative message than what cmdstan currently outputs
  • enables solving "Regex parameter selection" (#377), either in cmdstan or by users on their own
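
As a rough sketch of the user-side variant of regex selection: once the variable names are available as a character vector, base R's grep() can filter them. The helper name select_variables and the example names below are hypothetical and not part of cmdstanr.

```r
# Hypothetical helper: keep the variable names that match a regular expression.
select_variables <- function(var_names, pattern) {
  grep(pattern, var_names, value = TRUE)
}

# Assumed example names; in practice they would come from the compiled model.
var_names <- c("alpha", "beta", "sigma_beta", "y_rep")

# Select everything that starts with "alpha" or "beta".
selected <- select_variables(var_names, "^alpha|^beta")
print(selected)
```

The selected names could then be passed on to whatever summary or extraction call accepts a vector of variable names.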

rok-cesnovar commented Feb 15, 2021

The required stanc3 code is now in develop. Running stanc3 with --info will output:

{ "inputs": { "a": { "type": "int", "dimensions": 0},
              "b": { "type": "real", "dimensions": 0},
              "c": { "type": "real", "dimensions": 1},
              "d": { "type": "real", "dimensions": 1},
              "e": { "type": "real", "dimensions": 2},
              "f": { "type": "int", "dimensions": 1},
              "g": { "type": "real", "dimensions": 1},
              "h": { "type": "real", "dimensions": 2},
              "i": { "type": "real", "dimensions": 3},
              "j": { "type": "int", "dimensions": 3} },
  "parameters": { "l": { "type": "real", "dimensions": 1},
                  "m": { "type": "real", "dimensions": 1},
                  "n": { "type": "real", "dimensions": 1},
                  "o": { "type": "real", "dimensions": 1},
                  "p": { "type": "real", "dimensions": 2},
                  "q": { "type": "real", "dimensions": 2},
                  "r": { "type": "real", "dimensions": 2},
                  "s": { "type": "real", "dimensions": 2},
                  "y": { "type": "real", "dimensions": 0} },
  "transformed parameters": { "t": { "type": "real", "dimensions": 2} },
  "generated quantities": { "u": { "type": "real", "dimensions": 0} },

Will close this issue once we make a "prepare stuff for 2.27" issue.

@rok-cesnovar rok-cesnovar added this to the after cmdstan 2.27 milestone Mar 17, 2021

rok-cesnovar commented Aug 25, 2021

Update for this issue.

Once you compile a model (the example model used here is https://github.com/stan-dev/cmdstanr/blob/master/inst/logistic.stan):

library(cmdstanr)

stan_file <- file.path(path.package("cmdstanr"), "logistic.stan")
data_file <- file.path(path.package("cmdstanr"), "logistic.data.json")

mod <- cmdstan_model(stan_file)

you can get the list of parameters, transformed parameters and generated quantities as well as data with mod$variables().

for example, methods for obtaining list of parameter names can be used to make it easier to generate inits without forgetting some parameters

> names(mod$variables()$parameters)
[1] "alpha" "beta" 

for example, methods for obtaining list of parameter names can be used to report summaries just for parameters and not for generated quantities

> fit$summary(names(mod$variables()$parameters))
# A tibble: 4 × 10
  variable   mean median    sd   mad      q5     q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 alpha     0.380  0.379 0.218 0.215  0.0266  0.739   1.00    4422.    3049.
2 beta[1]  -0.665 -0.662 0.241 0.242 -1.06   -0.273   1.00    3502.    3056.
3 beta[2]  -0.279 -0.274 0.224 0.227 -0.654   0.0894  1.00    4122.    2992.
4 beta[3]   0.680  0.676 0.272 0.267  0.231   1.15    1.00    3749.    2856.

enables earlier checks that all data variables are present with a more informative message than what cmdstan currently outputs

This is now improved: instead of starting the cmdstan executable, cmdstanr stops earlier with what is, at least to me, a more informative message.

> fit <- mod$sample(data = list("N" = 100, "K" = 3))
 Error: Missing input data for the following data variables: y, X. 
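
A minimal sketch of how such an early check could work, assuming the names of the model's data variables are already known (they would presumably come from mod$variables()). The helper check_data_names is hypothetical and is not the actual cmdstanr implementation.

```r
# Hypothetical helper: stop early if any declared data variable is missing.
check_data_names <- function(data, data_vars) {
  missing_vars <- setdiff(data_vars, names(data))
  if (length(missing_vars) > 0) {
    stop(
      "Missing input data for the following data variables: ",
      paste(missing_vars, collapse = ", "), ".",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Assumed data variable names for the logistic example.
data_vars <- c("N", "K", "y", "X")

# Capture the error message instead of aborting, so the sketch runs end to end.
result <- tryCatch(
  check_data_names(list(N = 100, K = 3), data_vars),
  error = function(e) conditionMessage(e)
)
print(result)
```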

@rok-cesnovar

The only thing missing before we close this issue is adding checks to inits:

the data is eventually checked when the model is instantiated, but currently there is no message about whether the inits match
For example, it would be good to compare the names provided in the init argument with the actual parameters and provide an informational message, e.g. "3 of 5 parameters initialized via init argument and 2 of 5 parameters initialized with random values."
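
The remaining init check could look roughly like the sketch below. It assumes the parameter names are available as a character vector (for example from names(mod$variables()$parameters)) and that init is a named list. The helper check_init_names is hypothetical and is not what cmdstanr implements.

```r
# Hypothetical helper: report how many model parameters the init list covers.
check_init_names <- function(init, param_names) {
  provided <- intersect(param_names, names(init))
  missing_pars <- setdiff(param_names, names(init))
  unknown <- setdiff(names(init), param_names)  # names that are not parameters
  msg <- sprintf(
    paste(
      "%d of %d parameters initialized via init argument and",
      "%d of %d parameters initialized with random values."
    ),
    length(provided), length(param_names),
    length(missing_pars), length(param_names)
  )
  list(message = msg, missing = missing_pars, unknown = unknown)
}

# Parameter names of the logistic example; "gamma" is a deliberately wrong name.
param_names <- c("alpha", "beta")
res <- check_init_names(list(alpha = 0, gamma = 1), param_names)
message(res$message)
```

Returning the unmatched names as well would let a caller warn about typos in the init list, not only about parameters that fall back to random values.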
