Add `$variables()` #519

rok-cesnovar · 2021-06-18T18:41:21Z

Summary

This PR will attempt to add $variables(). First we need to settle what will be the args (if any) and what it will return.

Current suggestion is:

library(cmdstanr)

code <- "
data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0,upper=1> y[N];
  matrix[N, K] X;
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(beta | 0, 1);
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) log_lik[n] = bernoulli_logit_lpmf(y[n] | alpha + X[n] * beta);
}
"

mod <- cmdstan_model(write_stan_file(code))

mod$variables()
$data
$data$N
$data$N$type
[1] "int"

$data$N$dimensions
[1] 0


$data$K
$data$K$type
[1] "int"

$data$K$dimensions
[1] 0


$data$y
$data$y$type
[1] "int"

$data$y$dimensions
[1] 1


$data$X
$data$X$type
[1] "real"

$data$X$dimensions
[1] 2



$parameters
$parameters$alpha
$parameters$alpha$type
[1] "real"

$parameters$alpha$dimensions
[1] 0


$parameters$beta
$parameters$beta$type
[1] "real"

$parameters$beta$dimensions
[1] 1



$transformed_parameters
NULL

$generated_quantities
$generated_quantities$log_lik
$generated_quantities$log_lik$type
[1] "real"

$generated_quantities$log_lik$dimensions
[1] 1

> names(mod$variables())
[1] "data"                   "parameters"             "transformed_parameters" "generated_quantities"  
> names(mod$variables()$data)
[1] "N" "K" "y" "X"
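For readers who want to work with this nested list programmatically, here is a minimal sketch that flattens it into a data frame with one row per variable. The list `vars` is a hand-built stand-in that copies part of the output above, so the example runs without CmdStan. `summarise_variables()` is a hypothetical helper and not part of CmdStanR.

```r
# Hand-built stand-in for part of the list printed above (assumption: same
# nesting of block -> variable -> type/dimensions as in the suggestion).
vars <- list(
  data = list(
    N = list(type = "int", dimensions = 0),
    y = list(type = "int", dimensions = 1),
    X = list(type = "real", dimensions = 2)
  ),
  parameters = list(
    alpha = list(type = "real", dimensions = 0),
    beta = list(type = "real", dimensions = 1)
  ),
  transformed_parameters = NULL,
  generated_quantities = list(
    log_lik = list(type = "real", dimensions = 1)
  )
)

# Hypothetical helper: one row per variable, with the block it belongs to.
summarise_variables <- function(vars) {
  rows <- lapply(names(vars), function(block) {
    block_vars <- vars[[block]]
    if (length(block_vars) == 0) {
      return(NULL)  # empty blocks contribute no rows
    }
    data.frame(
      block = block,
      name = names(block_vars),
      type = unname(vapply(block_vars, function(v) v$type, character(1))),
      dimensions = unname(vapply(block_vars, function(v) v$dimensions, numeric(1))),
      stringsAsFactors = FALSE
    )
  })
  # Drop the NULL entries before binding the per-block data frames together.
  do.call(rbind, Filter(Negate(is.null), rows))
}

variable_table <- summarise_variables(vars)
print(variable_table)
```

A flat table like this is easier to filter (for example, all variables with `dimensions == 0`) than the nested list, at the cost of losing the block-wise `$` access.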

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Rok Češnovar

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

codecov-commenter · 2021-06-18T19:02:28Z

Codecov Report

Merging #519 (e6646f8) into master (0872512) will decrease coverage by 1.12%.
The diff coverage is 96.00%.

❗ Current head e6646f8 differs from pull request most recent head fa98252. Consider uploading reports for the commit fa98252 to get more accurate results

@@            Coverage Diff             @@
##           master     #519      +/-   ##
==========================================
- Coverage   92.77%   91.65%   -1.13%     
==========================================
  Files          12       12              
  Lines        3047     3031      -16     
==========================================
- Hits         2827     2778      -49     
- Misses        220      253      +33

Impacted Files	Coverage Δ
R/model.R	`91.20% <95.83%> (-0.54%)`	⬇️
R/csv.R	`98.21% <100.00%> (-0.45%)`	⬇️
R/install.R	`63.27% <0.00%> (-5.56%)`	⬇️
R/run.R	`94.07% <0.00%> (-1.65%)`	⬇️
R/utils.R	`90.71% <0.00%> (-1.43%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0872512...fa98252.

jgabry · 2021-06-18T20:01:02Z

> First we need to settle what will be the args (if any) and what it will return.

I like your current suggestion. Just a few thoughts:

- Will the use of the name "dimensions" here be confusing? For example, if we have vector[100] x then fit$metadata()$stan_variable_dims$x would say 100 but variables() would say "dimensions" is 1. Both make sense, but they're using "dimensions" (or "dims" in the case of metadata()) to mean two different things.
- I don't think we absolutely need any arguments, but we could add an argument block. That way we could do mod$variables("parameters") instead of mod$variables()$parameters. But that doesn't really matter too much.

And a few questions more related to CmdStan and stanc3 than CmdStanR:

Is the "--info" option documented anywhere for stanc3? I couldn't find it, so I opened an issue: "CmdStan: New command line option in 2.27 is not documented" (docs#372).
Is there any way to get more detailed type information from stanc3? That is, if I have a simplex or correlation matrix, is it possible to get it to give me those types instead of just "real"?
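The two meanings of "dimensions" raised in the first point can be illustrated in base R. This is only a sketch: `n_dims()` and `sizes()` are hypothetical helper names, and plain R objects stand in for the Stan declarations `vector[100] x` and `matrix[4, 3] X`.

```r
# Plain R stand-ins for Stan declarations (assumption, for illustration only).
x <- numeric(100)                   # vector[100] x
X <- matrix(0, nrow = 4, ncol = 3)  # matrix[4, 3] X

# Number of dimensions: what the proposed variables() output calls "dimensions".
n_dims <- function(v) if (is.null(dim(v))) 1L else length(dim(v))

# Size along each dimension: what metadata()$stan_variable_dims reports.
sizes <- function(v) if (is.null(dim(v))) length(v) else dim(v)

n_dims(x)  # number of dimensions of the vector
sizes(x)   # its length
n_dims(X)  # number of dimensions of the matrix
sizes(X)   # rows and columns
```

Base R has no separate scalar type, so this sketch would report one dimension for a single number, whereas a Stan `real` is reported with 0 dimensions.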

rok-cesnovar · 2021-06-18T20:12:57Z

> Will the use of the name "dimensions" here be confusing

Could be, yes. Good call. Any other suggestions? dimension_length? dims_length? Anything else?

> we could add an argument block

I like it!

> Is the "--info" option documented anywhere for stanc3?

That is missing, yes. Thanks for the issue.

> Is there any way to get more detailed type information from stanc3?

Not at the moment, but we can definitely add to what --info returns.

jgabry · 2021-06-18T20:18:37Z

> Could be, yes. Good call. Any other suggestions? dimension_length? dims_length? Anything else?

Maybe n_dims or num_dims?

Alternatively, we could keep dimensions here and instead change stan_variable_dims to stan_variable_sizes or stan_variable_elements?

rok-cesnovar · 2021-07-10T09:41:29Z

This is finally ready for review. Changes:

- removed the block arg
- added basic docs
- changed stan_variable_dims to stan_variable_sizes
- reorganized the return value a bit

rok-cesnovar · 2021-07-11T10:40:47Z

R/model.R

+  )
+  variables <- jsonlite::read_json(out_file, na = "null")
+  variables$data <- variables$inputs
+  variables$inputs <- NULL


I think data is a better name here than inputs.

rok-cesnovar · 2021-07-11T10:41:28Z

tests/testthat/test-model-variables.R

+  expect_equal(mod$variables()$parameters$theta$type, "real")
+  expect_equal(mod$variables()$parameters$theta$dimensions, 0)
+  expect_equal(length(mod$variables()[["transformed parameters"]]), 0)
+  expect_equal(length(mod$variables()[["generated quantities"]]), 0)


Should we rename these two to a name with an underscore?
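A minimal base R sketch of both renames discussed in these review comments. The list below is a hand-built stand-in for the parsed output: the keys `inputs`, `transformed parameters` and `generated quantities` are taken from the diff and tests quoted above, while the variable entries are illustrative. This is not the code that was merged.

```r
# Stand-in for the parsed list before any renaming (illustrative entries).
variables <- list(
  inputs = list(N = list(type = "int", dimensions = 0)),
  parameters = list(theta = list(type = "real", dimensions = 0)),
  "transformed parameters" = list(),
  "generated quantities" = list()
)

# Rename "inputs" to "data" without changing the order of the elements.
names(variables)[names(variables) == "inputs"] <- "data"

# Replace spaces with underscores so every block works with `$`.
names(variables) <- gsub(" ", "_", names(variables), fixed = TRUE)

names(variables)
length(variables$transformed_parameters)
```

With underscores, `variables$generated_quantities` works directly. With spaces, the same element needs backticks or `[["generated quantities"]]`.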

mitzimorris · 2021-07-14T12:01:18Z

Regarding names: CmdStanPy and CmdStanR have somewhat diverged on the methods for the CmdStanMCMC object w/r/t the functions which return the draws for the sampler output variables.

I would love to come up with a good and consistent set of names for these functions on both the CmdStanModel and CmdStanMCMC objects. The difference in dimensionality may be confusing to users: a Stan variable as declared in the model has N dimensions, but in the sampler output it has N+1 dimensions because it is an array of length draws where each array element has N dimensions. That's as concisely as I can put it, but is it concise or confusing?
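The N versus N+1 point can be made concrete with a small base R array. This is only an illustration: the sizes are made up, and a plain `array()` stands in for sampler output rather than any CmdStanR or CmdStanPy object.

```r
# A variable declared as matrix[2, 3] has N = 2 dimensions in the model.
declared_dims <- c(2, 3)

# Stand-in for sampler output: one matrix per draw, so a leading draws dimension.
n_draws <- 5
draws <- array(0, dim = c(n_draws, declared_dims))

length(declared_dims)  # N
length(dim(draws))     # N + 1

# Each single draw has the declared shape again.
dim(draws[1, , ])
```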

Also, the CmdStanMCMC object in CmdStanPy makes a distinction between sampler_vars, corresponding to lp__ and friends, and stan_vars, which are only the model variables output by the write_array function, i.e., variables in the parameters, transformed parameters, and generated quantities blocks. OTOH, from the model, stan_variables returns data and transformed data variables as well. Getting the names and dimensions of the data variables is useful. Getting the names and dimensions of the transformed data variables is not particularly useful. Should we make a distinction between data variables and output variables?

rok-cesnovar · 2021-07-14T12:09:53Z

> that's as concisely as I can put it, but is it concise or confusing?

I think it's concise.

> Also, CmdStanMCMC object in CmdStanPy makes a distinction between sampler_vars corresponding to lp__ and friends, and stan_vars which are only the model variables which are output by the write_array function, i.e., variables in the parameters, transformed parameters, and generated quantities block.

Yes, CmdStanMCMC in CmdStanR does this similarly, just with $draws(), which returns draws for all the parameters, transformed parameters and generated quantities but also includes lp__. The rest of the MCMC diagnostics are returned by sampler_diagnostics. At the time, lp__ was considered to be more of a regular variable than a sampler diagnostic, which I tend to agree with.

> Should we make a distinction between data variables and output variables?

I agree that transformed data names are not relevant, as they occur in neither the input nor the output. They are not present here.

The purpose of $variables() in CmdStanModel is to return the information on all the input or output model variables. It does not return any values (that would be part of CmdStanMCMC); it only returns the names, dimensionalities and types of the variables (whether they require integers or reals).

mitzimorris · 2021-07-14T12:32:56Z

> The purpose of $variables() in CmdStanModel is to return the information on all the input or output model variables. So this does not return any values, that would be part of CmdStanMCMC, this is just to return the names, dimensionalities and types of the variables (whether they require integers or reals).

Right, this is good and useful. As it's a method on the CmdStanModel object, the name model_variables would be both long and redundant. Nonetheless, variables seems vague. What about splitting this into data_vars and output_vars or something like that?

rok-cesnovar · 2021-07-14T12:37:28Z

Sure, no problem with splitting. I think we discussed this in the issue as well.

Options:
a) $data_variables(), $output_variables()
b) $input_variables(), $output_variables()
c) $data_vars(), $output_vars()
d) $input_vars(), $output_vars()

Not a huge fan of the vars shorthand, but it seems that is where CmdStanPy is headed, so maybe we should go that way as well. I don't have a strong preference though. @jgabry might have one?

mitzimorris · 2021-07-14T12:46:39Z

I'm happy to go with whatever consensus we can find.
I have no objection to changing vars to variables and actually prefer the latter.
Similarly, I can see arguments for either data or input.
The inference methods all have an argument data and the block is named data, but inasmuch as everything's data, calling it input data also makes sense.

discuss on forums?

until we get to 1.0, we can change names - but only if it's for the better.

jgabry · 2021-08-05T19:49:48Z

> Sure, no issue in splitting, we discussed this in the issue I think as well.
>
> Options:
> a) $data_variables(), $output_variables()
> b) $input_variables(), $output_variables()
> c) $data_vars(), $output_vars()
> d) $input_vars(), $output_vars()
>
> Not a huge fan of the vars shorthand but seems that is where cmdstanpy is headed, so maybe we should as well. I don't have a strong preference though. @jgabry might have one?

Sorry for the delay on this! I totally missed these comments earlier.

I think I prefer variables over vars. And I slightly favor data instead of inputs.

jgabry · 2021-08-05T19:50:17Z

> I think I prefer variables over vars. And I slightly favor data instead of inputs.

But neither preference is super super strong

mitzimorris · 2021-08-05T20:14:07Z

CmdStanPy has stan_variables and method_variables. The latter is for things like lp__, and stan_variables are the model variables that are written to the Stan CSV file, i.e., parameters, transformed parameters, and generated quantities.

mitzimorris · 2021-08-05T20:17:51Z

are data_variables always available? in CmdStanPy, if you reconstitute an object from a Stan CSV file, you don't have access to the input data.

jgabry · 2021-08-05T20:23:07Z

> are data_variables always available? in CmdStanPy, if you reconstitute an object from a Stan CSV file, you don't have access to the input data.

Good point. I think this is true for CmdStanR too

rok-cesnovar · 2021-08-06T08:08:14Z

> are data_variables always available?

mod$data_variables() and mod$output_variables() will only return the information on all the input and output model variables, not the actual values. So this will return names, dimensionalities and types. Given that it's part of the model class, it's not possible to return anything else.

Not sure why availability is the question here? Maybe I am misunderstanding, which is always an option :)

mitzimorris · 2021-08-06T11:21:15Z

OK, now I understand: these are methods on the model class. I think this is potentially confusing. It conflates the Stan language with the current behavior of the Stan platform I/O. Also, given that the model code is available, why are these methods needed? But if we report Stan program variables, it should be in terms of the Stan block names, not `input` and `output`.

jgabry · 2021-08-06T17:26:07Z

> Not sure why availability is the question here? Maybe I am misunderstanding, which is always an option :)

No you're right. In my haste I got confused, which tends to happen :)

jgabry · 2021-08-06T17:28:45Z

> given that the model code is available, why are these methods needed?

I'm not sure it's absolutely needed, but it can be helpful. For example, it would allow us to solve this: #513
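To illustrate why the declared number of dimensions could help with #513 (length-one vectors read as single values), here is a hedged base R sketch. How #513 was actually resolved is not shown in this thread. `shape_like_declared()` is a hypothetical helper, and its `dimensions` argument is assumed to be the per-variable count returned by `$variables()`.

```r
# Hypothetical helper: use the declared number of dimensions to decide whether
# a value read back as a bare number should keep a vector-like shape.
shape_like_declared <- function(value, dimensions) {
  if (dimensions == 0) {
    return(value)  # declared scalar: leave as is
  }
  if (dimensions == 1 && is.null(dim(value))) {
    # Declared vector or 1-d array: keep a dim attribute even at length one,
    # so it can be told apart from a scalar later.
    return(array(value, dim = length(value)))
  }
  value  # higher-dimensional values are left untouched in this sketch
}

as_scalar <- shape_like_declared(1.5, dimensions = 0)
as_vector <- shape_like_declared(1.5, dimensions = 1)

dim(as_scalar)  # no dim attribute
dim(as_vector)  # a dim attribute of length one
```

Without the declared dimension count, the two cases are indistinguishable in R, because a scalar is already a length-one vector.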

mitzimorris · 2021-08-06T17:39:07Z

> I'm not sure it's absolutely needed, but it can be helpful. For example, it would allow us to solve this: #513

~~if that's the case, then you should add methods. data_variables, parameter_variables, and generated_quantities variables. and maybe transformed_data and transformed_params too.~~

how about variables as function name, and arg block allowing folks to examine just the data block, etc.

rok-cesnovar · 2021-08-11T12:35:03Z

> how about variables as function name, and arg block allowing folks to examine just the data block, etc.

Yes, this is basically what is currently implemented. I had the block arg initially, but I felt it was redundant, as one can simply do mod$variables()$data or mod$variables()[["data"]].
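For comparison, a block argument could be layered on top of the returned list with a few lines of base R. This is a sketch only: `variables_in_block()` is a hypothetical helper, and `vars` is a hand-built stand-in for the list returned by `mod$variables()`.

```r
# Hand-built stand-in for the list returned by mod$variables() (assumption).
vars <- list(
  data = list(N = list(type = "int", dimensions = 0)),
  parameters = list(alpha = list(type = "real", dimensions = 0)),
  transformed_parameters = list(),
  generated_quantities = list(log_lik = list(type = "real", dimensions = 1))
)

# Hypothetical helper: return everything, or just one block when asked.
variables_in_block <- function(vars, block = NULL) {
  if (is.null(block)) {
    return(vars)
  }
  # match.arg() raises an error for names that match none of the blocks.
  block <- match.arg(block, names(vars))
  vars[[block]]
}

names(variables_in_block(vars))
names(variables_in_block(vars, "data"))
```

The helper adds validation of the block name but otherwise returns the same thing as `vars$data`, which is the redundancy mentioned above.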

jgabry

Looks good, just the one comment about the missing transformed parameters.

jgabry · 2021-08-16T16:26:11Z

vignettes/cmdstanr-internals.Rmd

+```{r variable-type-dims}
+variables$data$J
+variables$data$sigma
+variables$transformed_parameters$theta


@rok-cesnovar When I run this I get NULL for the transformed parameters even though it should have theta. Do you get that too?

rok-cesnovar · 2021-08-16

Yeah, sorry, should have used

variables$`transformed parameters`
variables$`generated quantities`
variables[["generated quantities"]]

rok-cesnovar · 2021-08-16

This was left over from an earlier version in which I replaced the space with an underscore. I am not sure which one is better. A space is more similar to the actual block name in the Stan model; an underscore is easier to use with $. I am fine with either.

jgabry · 2021-08-16

> underscore is easier to use with $. I am fine with either.

I think I prefer the underscore if that's ok with you. Sorry for the extra work!

rok-cesnovar · 2021-08-17

Not a problem at all. Done.

jgabry

Thanks! Changes look good. I'll go ahead and merge now.

rok-cesnovar added 3 commits June 5, 2021 11:57

add variables() and tests

94af0c2

add vignette section with example

4d75a1b

Merge branch 'master' into add_model_variables

f5dd06a

rok-cesnovar added 8 commits June 19, 2021 10:33

Merge branch 'master' into add_model_variables

b66bb07

Merge branch 'master' into add_model_variables

9815d29

added NEWS.md entry

bd2a737

change stan_variable_dims to stan_variable_sizes

394ac4c

add checking blocks

5d72368

Merge branch 'master' into add_model_variables

cd8f2eb

# Conflicts: # NEWS.md

fix sytnax

19fd348

add docs

b94aa0c

rok-cesnovar added 2 commits July 11, 2021 12:12

move to model_variables functi

a0dfd05

fixes

7c1f8fd

rok-cesnovar commented Jul 11, 2021

rok-cesnovar mentioned this pull request Jul 19, 2021

Length-one vectors are incorrectly read as single values #513

Closed

minor edit to vignette

ea3b67a

jgabry requested changes Aug 16, 2021

rok-cesnovar added 5 commits August 16, 2021 19:51

Merge branch 'master' into add_model_variables

d72a004

rebuild vignette

2721d15

rebuild again

fceae59

changed space to underscore in variable return list names

f28ce5e

update tests

fa98252

jgabry approved these changes Aug 17, 2021

jgabry merged commit 58f0980 into master Aug 17, 2021

jgabry deleted the add_model_variables branch August 17, 2021 18:00
