
More divergences with composed warmup-then-sample than combined warmup-and-sample #966

Closed
mike-lawrence opened this issue Jan 2, 2021 · 23 comments

Comments

@mike-lawrence

mike-lawrence commented Jan 2, 2021

Take the 8schools_ncp example:

library(cmdstanr)
example_program = 'schools_ncp.stan'
example_data = 'schools_ncp.data.json'
tmp <- file.path(tempdir(), example_program)
if (!file.exists(tmp)) {
  file.copy(system.file(example_program, package = "cmdstanr"), tmp)
}
mod <- cmdstan_model(tmp)
data_file <- system.file(example_data, package = "cmdstanr")

seed = 1
iter_warmup = 1e3
iter_sample = 1e3
parallel_chains = 4

#first the typical combined warmup-and-sample run:
both = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = 0
  , show_messages = FALSE
  , seed = seed
  , iter_warmup = iter_warmup
  , iter_sampling = iter_sample
)

Which yields no divergences. Now an attempt at doing warmup and sampling separately, using the adapted info from the former in the latter:

warmup = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = 0
  , show_messages = FALSE
  , seed = seed
  , iter_warmup = iter_warmup
  , save_warmup = TRUE # for inits
  , sig_figs = 18
  , iter_sampling = 0
)

get_sampling_inits_from_warmup = function(chain_id){
	warmup_draws = warmup$draws(inc_warmup = TRUE)
	final_warmup_value = warmup_draws[iter_warmup,chain_id,]
	init_list = as.list(final_warmup_value)
	names(init_list) = dimnames(final_warmup_value)[[3]]
	init_list = init_list[names(init_list)!='lp__']
	return(init_list)
}

samples = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = 0
  , show_messages = FALSE
  , seed = seed
  , iter_warmup = 0
  , adapt_engaged = FALSE
  , inv_metric = warmup$inv_metric(matrix = FALSE)
  , step_size = warmup$metadata()$step_size_adaptation
  , iter_sampling = iter_sample
  , init = get_sampling_inits_from_warmup
)

And now we get divergences from the sampling run. I have more rigorous testing of this detailed here, but the gist is that the warmup-then-sample approach somehow generates more divergences than the combined warmup-and-sample approach, despite all the inputs being, as far as I can examine, as expected. I don't believe this is an issue with cmdstanr's passing of the pertinent info to cmdstan, because I've checked the init jsons and output csvs and they have all the expected content.

@mike-lawrence
Author

mike-lawrence commented Jan 3, 2021

Actually, here's some data and a model that seem to show the difference more reliably:
divergence_demo.zip

And here's a script to loop over lots of seeds, showing that with nearly every one the number of divergences in the composed approach is far higher than those in the standard approach:
composed_divergences.zip

@mike-lawrence
Author

mike-lawrence commented Jan 3, 2021

And here are the functions I used to confirm that the inputs are as expected in the csv files (implying that this is not a cmdstanr problem):

check_inputs = function(chain){
	sampling_step_size = as.numeric(strsplit(readLines(samples$output_files()[chain])[42],'=')[[1]][2])
	warmup_step_size = warmup$metadata()$step_size_adaptation[chain]
	step_size_ok = dplyr::near(
		sampling_step_size
		, warmup_step_size
	)
	
	temp = readLines(samples$output_files()[chain])[44]
	temp = stringr::str_replace(temp,'#','')
	sampling_inv_metric = as.numeric(unlist(strsplit(temp,',')))
	warmup_inv_metric = unlist(warmup$inv_metric(matrix = FALSE)[chain])
	inv_metric_ok = all(dplyr::near(
		sampling_inv_metric
		, warmup_inv_metric
	))
	
	samples_inits = 
		(
			tibble::tibble(
				v = readLines(samples$metadata()$init[chain])
			)
			%>% dplyr::slice(2:(n()-1))
			%>% tidyr::separate(
				v
				, into = c('par','value')
				, sep = ':'
			)
			%>% dplyr::mutate(
				value = as.numeric(stringr::str_replace(value,',',''))
			)
			%>% dplyr::pull(value)
		)
	warmup_inits = as.numeric(get_sampling_inits_from_warmup(chain))
	inits_ok = all(dplyr::near(samples_inits,warmup_inits))
	return(tibble::tibble(chain,step_size_ok,inv_metric_ok,inits_ok))
}

purrr::map_dfr(1:parallel_chains,check_inputs)

@mitzimorris
Member

yes, this happens. it's not a bug - it's an artifact of the sampler trying to take the specified initial parameter values and meet model constraints. also, even when specifying the seed, the RNG is in a different state.

@mitzimorris
Member

reopening - @bbbales2 et al can investigate.

@mitzimorris mitzimorris reopened this Jan 3, 2021
@mike-lawrence
Author

Here's a gist containing the stan file, the code to generate data and the code to loop over seeds, generating new data and attempting to sample in the two ways for each seed.

Data and model are a standard hierarchical setup: 10 subjects, each observed with common Gaussian measurement noise 10 times in each of 2 conditions, with the subjects' intercept and condition effect modelled as multivariate normal.

@mike-lawrence
Author

Huh, running that just crashed my rstudio, but only once it was about halfway done. I'll try again and see if it continues to occur...

@mike-lawrence
Author

Ran it a few times now and no more crashes, so possibly the crash was unrelated (though I didn't have much else happening on my computer at the time of the crash; plenty of free RAM and CPU).

@mike-lawrence
Author

Note: confirmed to occur with cmdstanpy as well as cmdstanr, reinforcing that this is core to cmdstan itself. I also confirmed it persists when using the development version of cmdstan.

@ahartikainen
Contributor

Should we extract the cmd used for calling cmdstan?

In cmdstanpy

fit.runset.cmd

@mitzimorris
Member

there's a slight complication because the command is composed on the fly w/ the chain id - cmdstan_call(chain_id)
or something like that

@mike-lawrence
Author

@mitzimorris @jgabry @rok-cesnovar Tagging you as core interface devs. @bbbales2 investigated this, using my hierarchical model, where I observe the most reliably increased divergence rates. He confirmed that the interfaces are creating an init json with the proper info, but found that if he tweaked the structure of these jsons, the increased-divergences behaviour goes away.

Here's what both interfaces produce for the inits json:

{
  "chol_corr[1,1]": 1,
  "chol_corr[2,1]": -0.469573042026972,
  "chol_corr[1,2]": 0,
  "chol_corr[2,2]": 0.882893627908559,
  "mean_coef_[1]": -0.0305049133852781,
  "mean_coef_[2]": 0.164671727959854,
  "sd_coef_[1]": 0.00665194669853791,
  "sd_coef_[2]": 0.148205452770937,
  "multi_normal_helper[1,1]": -0.0216987316133846,
  "multi_normal_helper[2,1]": 0.724235080770058,
  "multi_normal_helper[1,2]": 0.43239986307413,
  "multi_normal_helper[2,2]": 1.75514903811538,
  "multi_normal_helper[1,3]": -0.691867572248652,
  "multi_normal_helper[2,3]": -2.66409204621652,
  "multi_normal_helper[1,4]": -0.395599105436074,
  "multi_normal_helper[2,4]": -0.231732460746632,
  "multi_normal_helper[1,5]": 0.439963477396995,
  "multi_normal_helper[2,5]": -0.460156658379562,
  "multi_normal_helper[1,6]": 1.46417143002824,
  "multi_normal_helper[2,6]": -1.2165899374329,
  "multi_normal_helper[1,7]": 1.54988112300058,
  "multi_normal_helper[2,7]": -0.90213563970599,
  "multi_normal_helper[1,8]": 0.740397448247888,
  "multi_normal_helper[2,8]": 1.32996651536118,
  "multi_normal_helper[1,9]": 0.456738742343979,
  "multi_normal_helper[2,9]": -0.306703562089387,
  "multi_normal_helper[1,10]": -0.530873450669662,
  "multi_normal_helper[2,10]": 1.49202297709515,
  "noise_": 0.981719532981492
}

And here's the alternative structure that he observes as solving the issue:

{
    "chol_corr": [[1, 0], [-0.469573042026972, 0.882893627908559]],
    "mean_coef_": [-0.0305049133852781, 0.164671727959854],
    "sd_coef_": [0.00665194669853791, 0.148205452770937],
    "multi_normal_helper": [[-0.0216987316133846, 0.43239986307413, -0.691867572248652, -0.395599105436074, 0.439963477396995, 1.46417143002824, 1.54988112300058, 0.740397448247888, 0.456738742343979, -0.530873450669662],
			    [0.724235080770058, 1.75514903811538, -2.66409204621652, -0.231732460746632, -0.460156658379562, -1.2165899374329, -0.90213563970599, 1.32996651536118, -0.306703562089387, 1.49202297709515]],
    "noise_": 0.981719532981492
}

So is this something that needs to be addressed in the interfaces, or in cmdstan?
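For reference, the regrouping from the flat per-element form to the nested form can be sketched in plain Python (the helper name `regroup_inits` is mine, not part of either interface; it assumes complete 1-based `name[row,col]` keys and only handles the scalar, vector, and matrix shapes appearing in the json above):

```python
import re
from collections import defaultdict

def regroup_inits(flat):
    """Regroup flat keys like 'chol_corr[2,1]' into nested lists."""
    grouped = defaultdict(dict)  # name -> {index tuple -> value}
    out = {}
    for key, value in flat.items():
        m = re.fullmatch(r"(\w+)\[([\d,]+)\]", key)
        if m is None:  # plain scalar, e.g. 'noise_'
            out[key] = value
            continue
        idx = tuple(int(i) for i in m.group(2).split(","))
        grouped[m.group(1)][idx] = value
    for name, entries in grouped.items():
        ndim = len(next(iter(entries)))
        dims = tuple(max(ix[d] for ix in entries) for d in range(ndim))
        if ndim == 1:  # vector
            out[name] = [entries[(i + 1,)] for i in range(dims[0])]
        else:          # matrix, row-major
            out[name] = [[entries[(r + 1, c + 1)] for c in range(dims[1])]
                         for r in range(dims[0])]
    return out

flat = {"chol_corr[1,1]": 1, "chol_corr[2,1]": -0.47,
        "chol_corr[1,2]": 0, "chol_corr[2,2]": 0.88,
        "mean_coef_[1]": -0.03, "mean_coef_[2]": 0.16,
        "noise_": 0.98}
print(regroup_inits(flat))
```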

@rok-cesnovar
Member

In that case it's a bug in the interfaces. The second example is how it should be.

@ahartikainen
Contributor

At least CmdStanPy doesn't create multidimensional objects yet.
Is there a possibility to fix this on the CmdStan side?

@rok-cesnovar
Member

rok-cesnovar commented Jan 13, 2021

If we change this in CmdStan, that will not be backwards compatible.

@bbbales2
Member

Maybe we could print warnings about unused elements of the JSON in cmdstan? That way the first json would at least emit a ton of messages and someone would know something is up.

These initial conditions have always been a bit annoying -- there's no part of the process that gives you feedback and says you set your inits right, which always leaves me wondering.

At least CmdStanPy doesn't create multidimensional objects yet.

Is this difficulty from parsing output draws and then trying to format them back into multidimensional objects again?

@rok-cesnovar
Member

Maybe we could print warnings about unused elements of the JSON in cmdstan?

We need to warn that only a part of a parameter is being initialized from supplied data. But we can only do that in cmdstan and it probably isn't easy.

Is this difficulty from parsing output draws and then trying to format them back into multidimensional objects again?

Yes, essentially the extract functionality of rstan is missing. If you pass a manually constructed matrix as init in a sample call, it will work fine. I am going to close this and we can figure it out in cmdstanr/py; we already have issues open for it, they just might need a bigger priority.

@mitzimorris
Member

In that case it's a bug in the interfaces. The second example is how it should be.

could @bbbales2 or @rok-cesnovar explain exactly what the bug in the interfaces is here?
what elements of JSON are unused and why?

@bbbales2
Member

I don't think it's a bug so much as we should probably give some better feedback about what is happening.
If someone is trying to initialize the parameter chol_corr in their model and they supply this:

{
  "chol_corr[1,1]": 1,
  "chol_corr[2,1]": -0.469573042026972,
  "chol_corr[1,2]": 0,
  "chol_corr[2,2]": 0.882893627908559
}

cmdstan silently accepts it but still keeps random initialization for chol_corr (it doesn't recognize that the user is actually trying to initialize chol_corr).

If you supply this instead it will work:

{
    "chol_corr": [[1, 0], [-0.469573042026972, 0.882893627908559]]
}

@bbbales2
Member

And the reason I think we should give better feedback is it took us a lot of time to figure out what was going wrong here, even if we were doing something that, by the book, was simply wrong. Hard to debug.

@mike-lawrence
Author

Certainly I think that if inits are supplied but are found to be in an unexpected format, a warning should be emitted rather than silently using new random inits.

FYI, here's a super kludgey update to my get_inits() function that at least makes things work for vectors and matrices (so far as I've tested, which admittedly hasn't been much yet):

get_inits = function(chain_id){
  warmup_draws = warmup$draws(inc_warmup = TRUE)
  final_warmup_value = warmup_draws[iter_warmup,chain_id,2:(dim(warmup_draws)[3])]
  (
    final_warmup_value
    %>% tibble::as_tibble(
      .name_repair = function(names){
        dimnames(final_warmup_value)$variable
      }
    )
    %>% tidyr::pivot_longer(cols=dplyr::everything())
    %>% tidyr::separate(
      name
      , into = c('variable','index')
      , sep = "\\["
      , fill = 'right'
    )
    %>% dplyr::group_split(variable)
    %>% purrr::map(
      .f = function(x){
        (
          x
          %>% dplyr::mutate(
            index = stringr::str_replace(index,']','')
          )
          %>% tidyr::separate(
            index
            , into = c('first','second')
            , sep = ","
            , fill = 'right'
      , convert = TRUE
          )
          %>% dplyr::arrange(first,second)
          %>% (function(x){
            out = list()
            if(all(is.na(x$second))){
              out[[1]] = x$value
            }else{
              out[[1]] = matrix(
                x$value
                , nrow = max(x$first)
                , ncol = max(x$second)
              )
            }
            names(out) = x$variable[1]
            return(out)
          })
        )         
      }
    )
    %>% unlist(recursive=F)
    %>% return()
  )
}

@mitzimorris
Member

many thanks, now I understand.
this is a fundamental problem for the following reasons:

  1. there is no requirement that the user supply initial parameter values for all parameters
  2. the stan::io::var_context object can contain extra variables
  3. the CmdStan interface doesn't provide a way to interact with the instantiated model directly - if it did, then it would be possible to follow the approach taken by standalone generated quantities to identify the model parameters.

the best way to address this problem is better documentation - I think that's what we're suggesting here, correct?

@bbbales2
Member

Wow, that combination of #2 and #3 is a problem. Ouch ouch ouch. I'd like some sort of more active fix than docs (since it tricked me).

Best I can think of off the cuff is cmdstanr/cmdstanpy error if a variable name created in the inits isn't a valid Stan variable name. For instance a[5] can't be a Stan variable name.
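A rough sketch of that check in Python (the helper name `check_init_names` is hypothetical; it only tests the basic identifier shape, a letter followed by letters, digits, and underscores, which is enough to catch keys like a[5]):

```python
import re

# Basic shape of a Stan identifier: a letter, then letters, digits, and
# underscores. (Stan additionally forbids names ending in '__' and reserved
# words; this sketch checks only the shape.)
STAN_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

def check_init_names(inits):
    """Return the init keys that cannot possibly be Stan variable names."""
    return [name for name in inits if not STAN_IDENTIFIER.match(name)]

print(check_init_names({"chol_corr[1,1]": 1.0, "noise_": 0.98}))
```

An interface could error (or at least warn) whenever this returns a non-empty list, before cmdstan silently ignores those keys.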

@mitzimorris
Member

Best I can think of off the cuff is cmdstanr/cmdstanpy error if a variable name created in the inits isn't a valid Stan variable name. For instance a[5] can't be a Stan variable name.

brilliant and easy!
