More divergences with composed warmup-then-sample than combined warmup-and-sample #966
Actually, here's some data and a model that seem to show the difference more reliably. And here's a script to loop over lots of seeds, showing that with nearly every one the number of divergences in the composed approach is far higher than in the standard approach:
And here are the functions I used to confirm that the inputs in the CSV files are as expected (implying that this is not a cmdstanr problem):

```r
check_inputs = function(chain){
  # step size: compare the value recorded in the sampling CSV against the
  # step size adaptation reported by the warmup run
  sampling_step_size = as.numeric(strsplit(readLines(samples$output_files()[chain])[42], '=')[[1]][2])
  warmup_step_size = warmup$metadata()$step_size_adaptation[chain]
  step_size_ok = dplyr::near(
    sampling_step_size
    , warmup_step_size
  )
  # inverse metric: same comparison for the inverse metric line in the CSV
  temp = readLines(samples$output_files()[chain])[44]
  temp = stringr::str_replace(temp, '#', '')
  sampling_inv_metric = as.numeric(unlist(strsplit(temp, ',')))
  warmup_inv_metric = unlist(warmup$inv_metric(matrix = F)[chain])
  inv_metric_ok = all(dplyr::near(
    sampling_inv_metric
    , warmup_inv_metric
  ))
  # inits: parse the init file passed to the sampling run and compare it
  # against the inits derived from the final warmup draw
  samples_inits =
    (
      tibble::tibble(
        v = readLines(samples$metadata()$init[chain])
      )
      %>% dplyr::slice(2:(n() - 1))
      %>% tidyr::separate(
        v
        , into = c('par', 'value')
        , sep = ':'
      )
      %>% dplyr::mutate(
        value = as.numeric(stringr::str_replace(value, ',', ''))
      )
      %>% dplyr::pull(value)
    )
  warmup_inits = as.numeric(get_sampling_inits_from_warmup(chain))
  inits_ok = all(dplyr::near(samples_inits, warmup_inits))
  return(tibble::tibble(chain, step_size_ok, inv_metric_ok, inits_ok))
}
purrr::map_dfr(1:parallel_chains, check_inputs)
```
Yes, this happens. It's not a bug - it's an artifact of the sampler trying to take the specified initial parameter values and meet model constraints. Also, even when specifying the seed, the RNG is in a different state.
Reopening - @bbbales2 et al. can investigate.
Here's a gist containing the Stan file, the code to generate data, and the code to loop over seeds, generating new data and attempting to sample in the two ways for each seed. The data and model are a standard hierarchical model where there are 10 subjects, each observed with common Gaussian measurement noise 10 times in each of 2 conditions, and the subjects' intercept and condition effect are modelled as multivariate normal.
Huh, running that just crashed my RStudio, but only once it was about halfway done. I'll try again and see if it continues to occur...

Ran it a few times now and no more crashes, so possibly the crash was unrelated (though I didn't have much else happening on my computer at the time of the crash; plenty of free RAM and CPU).
Note: confirmed to occur with
Should we extract the cmd used for calling cmdstan? In cmdstanpy:
There's a slight complication because the command is composed on the fly with the chain id -
@mitzimorris @jgabry @rok-cesnovar Tagging you as core interface devs. @bbbales2 investigated this using my hierarchical model, where I observe the most reliably increased divergence rates. He confirmed that the interfaces are creating an init JSON with the proper info, but found that if he tweaked the structure of these JSONs the increased-divergences behaviour goes away. Here's what both interfaces produce for the inits JSON:

```json
{
  "chol_corr[1,1]": 1,
  "chol_corr[2,1]": -0.469573042026972,
  "chol_corr[1,2]": 0,
  "chol_corr[2,2]": 0.882893627908559,
  "mean_coef_[1]": -0.0305049133852781,
  "mean_coef_[2]": 0.164671727959854,
  "sd_coef_[1]": 0.00665194669853791,
  "sd_coef_[2]": 0.148205452770937,
  "multi_normal_helper[1,1]": -0.0216987316133846,
  "multi_normal_helper[2,1]": 0.724235080770058,
  "multi_normal_helper[1,2]": 0.43239986307413,
  "multi_normal_helper[2,2]": 1.75514903811538,
  "multi_normal_helper[1,3]": -0.691867572248652,
  "multi_normal_helper[2,3]": -2.66409204621652,
  "multi_normal_helper[1,4]": -0.395599105436074,
  "multi_normal_helper[2,4]": -0.231732460746632,
  "multi_normal_helper[1,5]": 0.439963477396995,
  "multi_normal_helper[2,5]": -0.460156658379562,
  "multi_normal_helper[1,6]": 1.46417143002824,
  "multi_normal_helper[2,6]": -1.2165899374329,
  "multi_normal_helper[1,7]": 1.54988112300058,
  "multi_normal_helper[2,7]": -0.90213563970599,
  "multi_normal_helper[1,8]": 0.740397448247888,
  "multi_normal_helper[2,8]": 1.32996651536118,
  "multi_normal_helper[1,9]": 0.456738742343979,
  "multi_normal_helper[2,9]": -0.306703562089387,
  "multi_normal_helper[1,10]": -0.530873450669662,
  "multi_normal_helper[2,10]": 1.49202297709515,
  "noise_": 0.981719532981492
}
```

And here's the alternative structure that he observes as solving the issue:

```json
{
  "chol_corr": [[1, 0], [-0.469573042026972, 0.882893627908559]],
  "mean_coef_": [-0.0305049133852781, 0.164671727959854],
  "sd_coef_": [0.00665194669853791, 0.148205452770937],
  "multi_normal_helper": [[-0.0216987316133846, 0.43239986307413, -0.691867572248652, -0.395599105436074, 0.439963477396995, 1.46417143002824, 1.54988112300058, 0.740397448247888, 0.456738742343979, -0.530873450669662],
                          [0.724235080770058, 1.75514903811538, -2.66409204621652, -0.231732460746632, -0.460156658379562, -1.2165899374329, -0.90213563970599, 1.32996651536118, -0.306703562089387, 1.49202297709515]],
  "noise_": 0.981719532981492
}
```

So is this something that needs to be addressed in the interfaces, or in cmdstan?
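The difference between the two layouts is purely mechanical: the first keys each scalar element separately, while CmdStan reads one container per variable. A hypothetical helper (not part of cmdstanr or cmdstanpy; the function name and behaviour are illustrative only) that regroups flattened keys like `"chol_corr[1,2]"` into the container-per-variable structure might look like this, sketched in Python:

```python
import re
from collections import defaultdict

def regroup_inits(flat):
    """Regroup a flattened inits dict (one key per scalar element, 1-based
    Stan-style indices) into one list/matrix entry per variable."""
    grouped = {}
    cells = defaultdict(dict)
    for key, value in flat.items():
        m = re.match(r"^(\w+)\[([\d,]+)\]$", key)
        if m is None:
            grouped[key] = value  # scalar parameter, keep as-is
        else:
            name = m.group(1)
            idx = tuple(int(i) for i in m.group(2).split(","))
            cells[name][idx] = value
    for name, entries in cells.items():
        ndim = len(next(iter(entries)))
        if ndim == 1:
            size = max(i for (i,) in entries)
            grouped[name] = [entries[(i,)] for i in range(1, size + 1)]
        else:  # 2-D case: nested lists, rows outermost
            nrow = max(i for i, _ in entries)
            ncol = max(j for _, j in entries)
            grouped[name] = [
                [entries[(i, j)] for j in range(1, ncol + 1)]
                for i in range(1, nrow + 1)
            ]
    return grouped
```

For the example above, this would turn the four `chol_corr[i,j]` entries into `"chol_corr": [[1, 0], [-0.469..., 0.882...]]`, matching the structure that resolves the issue.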
In that sense it's a bug in the interfaces. The example below is how it should be.
At least CmdStanPy doesn't create multidimensional objects yet.
If we change this in CmdStan, that will not be backwards compatible.
Maybe we could print warnings about unused elements of the JSON in cmdstan? That way the first JSON would at least emit a ton of messages and someone would know something is up. These initial conditions have always been a bit annoying -- there's no part of the process that gives you feedback and says you set your inits right, which always leaves me wondering.

Is this difficulty from parsing output draws and then trying to format them back into multidimensional objects again?
We need to warn that only part of a parameter is being initialized from supplied data. But we can only do that in cmdstan, and it probably isn't easy.
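The check being asked for could, in principle, live wherever the model's declared parameter names are known. The key point is that CmdStan matches init entries by whole variable name, so a flattened key like `"chol_corr[1,1]"` is simply an unknown name and is silently ignored. A minimal sketch, assuming the set of parameter names is available (this function is illustrative, not an existing API in any interface):

```python
import warnings

def warn_unused_inits(inits, parameter_names):
    """Return (and warn about) init keys that match no model parameter.

    `parameter_names` is assumed to hold the model's top-level parameter
    names, e.g. {"chol_corr", "noise_"}; where it comes from is up to the
    interface.
    """
    unused = [key for key in inits if key not in parameter_names]
    for key in unused:
        warnings.warn(
            f"init entry '{key}' matches no model parameter and will be ignored"
        )
    return unused
```

Run against the first JSON above, every bracketed key would be flagged, which is exactly the "ton of messages" that would have made this bug obvious.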
Yes, essentially the extract functionality of rstan is missing. If you pass a manually constructed matrix as an init in a sample call, it will work fine. I am going to close this and we can figure it out in cmdstanr/py; we already have those issues open for it, they just might need a bigger priority.
Could @bbbales2 or @rok-cesnovar explain exactly what the bug in the interfaces is here?
I don't think it's a bug so much as we should probably give some better feedback about what is happening.

cmdstan silently accepts it but still keeps random initialization. If you supply this instead, it will work:
And the reason I think we should give better feedback is that it took us a lot of time to figure out what was going wrong here, even if we were doing something that was, by the book, simply wrong. Hard to debug.
Certainly I think that if inits are supplied but found to be in an unexpected format, a warning should be emitted rather than silently falling back to random inits. FYI, here's a super kludgey update to my `get_inits` function:

```r
get_inits = function(chain_id){
  warmup_draws = warmup$draws(inc_warmup = T)
  # final warmup iteration for this chain, dropping the lp__ column
  final_warmup_value = warmup_draws[iter_warmup, chain_id, 2:(dim(warmup_draws)[3])]
  (
    final_warmup_value
    %>% tibble::as_tibble(
      .name_repair = function(names){
        dimnames(final_warmup_value)$variable
      }
    )
    %>% tidyr::pivot_longer(cols = dplyr::everything())
    # split "var[i,j]" names into a variable part and an index part
    %>% tidyr::separate(
      name
      , into = c('variable', 'index')
      , sep = "\\["
      , fill = 'right'
    )
    %>% dplyr::group_split(variable)
    %>% purrr::map(
      .f = function(x){
        (
          x
          %>% dplyr::mutate(
            index = stringr::str_replace(index, ']', '')
          )
          %>% tidyr::separate(
            index
            , into = c('first', 'second')
            , sep = ","
            , fill = 'right'
            , convert = T
          )
          %>% dplyr::arrange(first, second)
          # rebuild each variable as a vector or matrix as appropriate
          %>% (function(x){
            out = list()
            if(all(is.na(x$second))){
              out[[1]] = x$value
            }else{
              out[[1]] = matrix(
                x$value
                , nrow = max(x$first)
                , ncol = max(x$second)
              )
            }
            names(out) = x$variable[1]
            return(out)
          })
        )
      }
    )
    %>% unlist(recursive = F)
    %>% return()
  )
}
```
Many thanks, now I understand. The best way to address this problem is better documentation - I think that's what we're suggesting here, correct?
Wow, that combination of #2 and #3 is a problem. Ouch ouch ouch. I'd like some sort of more active fix than docs (since it tricked me). The best I can think of off the cuff is having cmdstanr/cmdstanpy error if a variable name created in the inits isn't a valid Stan variable name. For instance:
Brilliant and easy!
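The suggested interface-side check is straightforward to sketch. A bracketed key like `"chol_corr[1,1]"` is not a syntactically valid Stan identifier, so rejecting non-identifier names would catch the flattened layout before CmdStan silently ignores it. This is an illustrative sketch, not code from either interface, and it only checks basic identifier syntax (Stan additionally reserves names ending in a double underscore, which this does not enforce):

```python
import re

# Valid Stan identifiers: a letter, then letters, digits, or underscores.
STAN_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def validate_init_names(inits):
    """Raise if any supplied init key is not a valid Stan variable name."""
    bad = [name for name in inits if not STAN_IDENTIFIER.match(name)]
    if bad:
        raise ValueError(f"invalid Stan variable names in inits: {bad}")
```

With this in place, the problematic flattened JSON above would fail fast with an explicit error instead of producing a warmup-then-sample run that silently falls back to random inits.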
Take the 8schools_ncp example:
Which yields no divergences. Now an attempt at doing warmup and sampling separately, using the adapted info from the former in the latter:

And now we get divergences from the sampling run. I have more rigorous testing of this detailed here, but the gist is that the warmup-then-sample approach somehow generates more divergences than the combined warmup-and-sample approach, despite all the inputs appearing as expected on examination. I don't believe this is an issue with cmdstanr's passing of the pertinent info to cmdstan, because I've checked the init JSONs and output CSVs and they have all the expected content.