Expose diagnostic check functions #205

rok-cesnovar · 2020-06-16T19:57:17Z

check_sampler_transitions_treedepth and check_divergences should be accessible to the user in some way.

Either separately or as part of fit$diagnose() or a similar method, possibly alongside other checks.

CmdStan's diagnose utility also checks E-BFMI, which is simple (it only requires energy__, I think), plus two checks that require reading all the draws (ESS and R-hat).
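To illustrate that E-BFMI needs nothing beyond energy__, here is a minimal R sketch. It assumes the post-warmup energy__ values are already in a numeric matrix with one column per chain; the helper name `ebfmi_per_chain()` is hypothetical and not part of cmdstanr. It uses one common form of the estimator (mean squared change in energy between consecutive iterations divided by the energy's variance); other implementations may normalize slightly differently.

```r
# Hypothetical helper (not cmdstanr's API): E-BFMI for each chain, computed
# only from the energy__ sampler diagnostic.
# `energy` is assumed to be a numeric matrix with one row per post-warmup
# iteration and one column per chain.
ebfmi_per_chain <- function(energy) {
  stopifnot(is.matrix(energy), nrow(energy) > 2)
  apply(energy, 2, function(e) {
    # mean squared change in energy between consecutive iterations,
    # relative to the variance of the energy within the chain
    (sum(diff(e)^2) / length(e)) / stats::var(e)
  })
}

set.seed(1)
energy <- cbind(
  chain1 = rnorm(1000),                  # energies that decorrelate quickly
  chain2 = cumsum(rnorm(1000, sd = 0.1)) # a slowly drifting random walk
)
ebfmi <- ebfmi_per_chain(energy)
ebfmi < 0.2  # flag chains below the 0.2 threshold used in the warnings
```

The random-walk chain moves very little between iterations relative to its overall spread, which is exactly the pattern a low E-BFMI is meant to flag.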

The text was updated successfully, but these errors were encountered:

jgabry · 2020-06-16T21:13:13Z

Yeah I agree we should expose these. I think we should also change the name of check_sampler_transitions_treedepth to just check_treedepth.

Maybe a method fit$diagnose(diagnostic = NULL), where if diagnostic=NULL it checks all of them but the user can also specify which ones, e.g. fit$diagnose(c("divergence", "ebfmi"))?

Also, I think it would be useful if these methods returned the values of the diagnostics to the user instead of only printing messages (they can also print messages).
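The `diagnostic = NULL` idea (NULL runs everything, otherwise only the named checks) could be resolved roughly as below. This is a sketch under assumptions: `resolve_diagnostics()` and the check names are illustrative, not an existing cmdstanr function. Partial matching via base R's `pmatch()` lets the abbreviated `"divergence"` from the example above resolve to `"divergences"`, while unknown names raise an error.

```r
# Hypothetical sketch of the proposed `diagnostic = NULL` semantics:
# NULL runs every check; otherwise only the named checks run.
all_diagnostics <- c("divergences", "treedepth", "ebfmi")

resolve_diagnostics <- function(diagnostics = NULL) {
  if (is.null(diagnostics)) {
    return(all_diagnostics)
  }
  # pmatch() allows unambiguous abbreviations such as "divergence" and
  # returns NA for anything it cannot match
  idx <- pmatch(diagnostics, all_diagnostics, duplicates.ok = TRUE)
  if (anyNA(idx)) {
    stop("Unknown diagnostic(s): ",
         paste(diagnostics[is.na(idx)], collapse = ", "), call. = FALSE)
  }
  unique(all_diagnostics[idx])
}

resolve_diagnostics()                          # all three checks
resolve_diagnostics(c("divergence", "ebfmi"))  # the example from the comment
```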

rok-cesnovar · 2020-06-19T17:09:58Z

If stan-dev/posterior will have these checks (see stan-dev/posterior#77), we should just use those.

The divergences, treedepth and ebfmi checks use sampler diagnostics, so those are Stan-specific and will be done by our helper functions.

cc @paul-buerkner

jgabry · 2020-06-19T19:14:32Z

Yeah I agree

rok-cesnovar · 2020-06-30T18:39:44Z

What should we do here for now:

add fit$diagnose() that runs the treedepth, divergence and ebfmi checks, without rhat/ess
add fit$diagnose() that runs the treedepth, divergence and ebfmi checks and also implements warnings for rhat/ess (at least the latter would later get replaced by posterior)

This one should be simple.

jgabry · 2020-06-30T19:53:46Z

I think posterior is going to end up including warnings for HMC/NUTS too, so all of them could end up being replaced by posterior. Do you think we should just wait for posterior, or implement them now and later replace them with posterior's versions?

rok-cesnovar · 2020-06-30T20:40:12Z

Let's just wait for posterior then, and move the milestone if you agree.

jgabry · 2020-06-30T21:57:41Z

Sounds good

rok-cesnovar · 2021-05-15T17:11:07Z

So what I think the API for this should be:

add a diagnostics = c("divergences", "treedepth", "bfmi") argument to $sample()
deprecate validate_csv. I think the name was my idea; in hindsight it's a bad name, not informative at all. validate_csv = FALSE is equivalent to diagnostics = NULL or ""
add fit$diagnose(diagnostics = c("divergences", "treedepth", "bfmi"))
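The proposed deprecation could map the old argument onto the new one along these lines. This is a sketch, not cmdstanr's implementation: `resolve_sample_diagnostics()` is a hypothetical helper, the default check names follow the proposal above, and `validate_csv = FALSE` is treated exactly like `diagnostics = NULL` or `""`.

```r
# Hypothetical argument handling for $sample(): the deprecated `validate_csv`
# is mapped onto the proposed `diagnostics` argument.
resolve_sample_diagnostics <- function(diagnostics = c("divergences", "treedepth", "bfmi"),
                                       validate_csv = NULL) {
  if (!is.null(validate_csv)) {
    warning("'validate_csv' is deprecated. Please use 'diagnostics' instead.",
            call. = FALSE)
    if (isFALSE(validate_csv)) {
      diagnostics <- NULL  # validate_csv = FALSE means "run no checks"
    }
  }
  # NULL and "" both mean that no diagnostic checks are run
  if (is.null(diagnostics) || identical(diagnostics, "")) {
    return(character(0))
  }
  diagnostics
}
```

Returning `character(0)` for "no checks" keeps the downstream code simple: it can always loop over the returned vector.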

I think we should just go with it now that we have the three sampler diagnostics checks. The rhat/eff checks can be added later.

jgabry · 2021-05-15T17:22:04Z

I like this idea. We should think carefully about what object fit$diagnose() should return (aside from what it prints/warns about). When posterior eventually implements these hmc diagnostics we should use them internally in fit$diagnose() but ideally not change what it returns so we don't have to deprecate it in favor of a new method that uses the posterior stuff.

martinmodrak · 2021-08-02T10:44:35Z

Just stumbled on this (I'm calling CmdStan in a wrapper, so the user never runs summary(), but I don't want to swallow diagnostic warnings). May I suggest that we add a new method "$check_diagnostics()" that returns nothing but shows warnings as a side effect, and leave $diagnose() (which IMHO is useful on its own) untouched? That would feel cleanest to me (but I don't insist on it, it's just a different approach).
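A warnings-only check as suggested here might look like the following sketch. `check_diagnostics()` is hypothetical, its inputs are assumed to be per-iteration matrices (rows are post-warmup iterations, columns are chains), and the message wording is illustrative. Because it signals ordinary R warnings, a wrapper can either let them through or collect them with `withCallingHandlers()`.

```r
# Hypothetical sketch of a warnings-only check: problems are signalled as R
# warnings, and the function returns invisible(NULL).
# `divergent` and `treedepth` are assumed to be matrices with one row per
# post-warmup iteration and one column per chain.
check_diagnostics <- function(divergent, treedepth, max_treedepth = 10) {
  n_div <- sum(divergent)
  n_max <- sum(treedepth >= max_treedepth)
  if (n_div > 0) {
    warning(n_div, " of ", length(divergent),
            " transitions ended with a divergence.", call. = FALSE)
  }
  if (n_max > 0) {
    warning(n_max, " of ", length(treedepth),
            " transitions hit the maximum treedepth limit of ", max_treedepth, ".",
            call. = FALSE)
  }
  invisible(NULL)
}

# A wrapper can collect the warnings instead of letting them print
collected <- character(0)
res <- withCallingHandlers(
  check_diagnostics(divergent = matrix(c(1, 0, 0, 0), nrow = 2),
                    treedepth = matrix(c(10, 3, 4, 5), nrow = 2)),
  warning = function(w) {
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```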

jgabry · 2021-11-03T17:57:54Z

@rok-cesnovar and anyone else interested, I have some time to work on this, but I wanted to get a bit of feedback first.

I have an idea for a method diagnose_sampler() that both prints the warning messages about the diagnostics (if argument quiet=FALSE) and returns their values as a list. Right now it would include divergences, max_treedepths, and ebfmi (I'm planning to branch off of the branch from @jsocolar's ebfmi PR so that we have that diagnostic available).

What do you think about these examples?

# here diagnose_sampler() prints the messages and returns a list 
# containing the values of the diagnostics 
# it could also have a `quiet` argument to turn off printing the messages. 
> fit$diagnose_sampler()

Warning: 283 of 4000 (7.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

Warning: 1 of 4 chains had energy-based Bayesian fraction of missing information (E-BFMI) less than 0.2.
This may indicate poor exploration of the posterior.

$divergences
[1] 283

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229

# using quiet=TRUE
> fit$diagnose_sampler(quiet = TRUE)
$divergences
[1] 283

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229

# selecting diagnostics
> fit$diagnose_sampler(diagnostics = c("treedepth", "ebfmi"))
Warning: 1 of 4 chains had energy-based Bayesian fraction of missing information (E-BFMI) less than 0.2.
This may indicate poor exploration of the posterior.

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229

We could also do what I think Martin was suggesting and have two separate methods for this, but I'm not sure we want the user to have to call two separate methods to get the warnings and the values. But I'm happy to be convinced otherwise.

rok-cesnovar · 2021-11-03T18:09:55Z

I like it. Any reason to return max_treedepths and not just a vector of all tree depths? Same with ebfmi. I know there are users that compute both by reading sampler diagnostics as they need them to diagnose different runs. For example, when working towards getting those treedepths not to get maxed out, that info is crucial.

jgabry · 2021-11-03T18:19:41Z

Any reason to return max_treedepths and not just a vector of all tree depths?

Since tree depths are different for each chain and each iteration I was thinking they would just use fit$sampler_diagnostics() if they want all the individual values and they would use fit$diagnose_sampler() (or whatever we call it) to get the values that triggered the warnings (i.e., how many times max was hit). Opinions?

Same with ebfmi.

Yeah I can change it to include the ebfmi for each chain instead of just the min.

rok-cesnovar · 2021-11-03T18:42:39Z

Yeah, maybe a single max_treedepth count over all chains is fine. Maybe for ebfmi, splitting by chain is better? I don't really have a strong opinion, just bouncing ideas :)

jgabry · 2021-11-03T18:46:05Z

So maybe the returned list looks like this? It has the total number of divergences, the total number of times hitting max treedepth, and the ebfmi for each chain:

$divergences
[1] 335

$max_treedepths
[1] 0

$ebfmi
[1] 0.2566534 0.4253768 0.3219593 0.1972186

jsocolar · 2021-11-03T18:46:17Z

Just double-checking: am I correct that CmdStan has no way to report whether the max treedepth prematurely halted the trajectory? That is, my understanding is that on iterations where max treedepth is reached, the current approach gives a treedepth warning even if the NUTS criterion was satisfied during the final doubling of the tree. And it has to be this way because CmdStan itself does not track whether we exceed the max treedepth (as opposed to merely reaching it), and such a feature is unlikely to arrive anytime soon because it would change the CSV output and potentially break a bunch of stuff.

Just want to double check that this is all accurate; otherwise I think it would be wise to distinguish reaching max treedepth and terminating via the NUTS criterion versus premature termination due to max treedepth. As things are, a user with poor mixing might decide to increase treedepth even if doing so has absolutely no effect on the exploration.

rok-cesnovar · 2021-11-03T18:48:36Z

Yeah, I think you are correct. @jsocolar does rstan have this function?

jsocolar · 2021-11-03T18:49:36Z

Not that I know of.

jgabry · 2021-11-03T18:51:08Z

Yeah I also think you're right that currently if it says it hit max treedepth it's not clear whether the trajectory was halted prematurely. Unfortunately I think we're stuck with that for now. I guess we can document this caveat in the doc for this new method. @jsocolar Suggestions for how to word it?

rok-cesnovar · 2021-11-03T18:57:42Z

So maybe the returned list looks like this?

Looks great to me.

jsocolar · 2021-11-03T18:57:55Z

@jgabry I'm a bit behind the times perhaps on what the actual recommendation is for these treedepth issues. If I'm on the default max_treedepth = 10, and I see warnings (and poor convergence diagnostics) is our suggestion to increase max_treedepth or is it to try to tame the difficult geometry?

And in the latter case, would we want to just warn about hitting max treedepth or do we also want to warn about all treedepths in excess of N, where N is some appropriate number (possibly greater than 10).
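Whatever threshold is chosen, the counting itself is straightforward. A sketch, assuming treedepth__ is available as a matrix with one column per chain; `count_treedepth_at_least()` and the threshold of 8 are purely illustrative, since the thread leaves the choice of N open.

```r
# Hypothetical helper: per-chain counts of iterations whose treedepth__ is at
# least `n`. With n equal to max_treedepth this is the usual "hit the limit"
# count; a smaller n gives the softer warning being asked about.
count_treedepth_at_least <- function(treedepth, n) {
  # treedepth: matrix, one row per post-warmup iteration, one column per chain
  colSums(treedepth >= n)
}

treedepth <- cbind(chain1 = c(8, 9, 10, 10), chain2 = c(6, 7, 7, 8))
count_treedepth_at_least(treedepth, n = 10)  # chain1: 2, chain2: 0
count_treedepth_at_least(treedepth, n = 8)   # chain1: 4, chain2: 1
```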

jgabry · 2021-11-03T19:13:59Z

Actually, I just remembered that these pieces of advice or caveats about interpretation should go on the website discussed in #505. The idea is to put all of that in one place so we don't have to write it out separately for each interface, and so we can make updates in one place instead of having to change warning messages. So I don't think we need to include this info with this new method, other than to point to that website.

But since you asked,

is our suggestion to increase max_treedepth or is it to try to tame the difficult geometry?

this is tricky because on the one hand if increasing max_treedepth by 1 or 2 is sufficient then that seems preferable to rewriting the model, but often taming the geometry is really what's necessary. So it's hard to say because we don't know how far beyond max_treedepth it would have gone. For models that run pretty fast I guess it can't hurt to try bumping up max_treedepth a little, but for models that take a lot longer maybe it makes sense to recommend working on taming the geometry since it would be a burden to keep checking different values of max_treedepth (and ultimately taming the geometry could help in other ways too beyond just treedepth issues).

And in the latter case, would we want to just warn about hitting max treedepth or do we also want to warn about all treedepths in excess of N, where N is some appropriate number (possibly greater than 10).

That's a great question. I have not thought about this enough to have a good answer.

Anyway, these are all good things for us to think more about. If you're up for it I would raise these questions when we ask for comments on the website mentioned in #505. It sounds like we're close to doing that.

jsocolar · 2021-11-03T19:23:36Z

Last thing:
I think counts of max treedepths split by chain are quite useful for diagnosing bad warmup. For example, if one chain has all the max treedepths, then the first thing to do should be to lengthen warmup and/or the term buffer, rather than to increase max treedepth or reparameterize. My personal preference would be to output a vector of the number of times max_treedepth is reached per chain, especially if we're doing that for ebfmi anyway.
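The "one chain has all the max treedepths" situation described here is easy to detect from per-chain counts. A rough sketch of that rule of thumb; `chain_with_all_hits()` is hypothetical and the counts below are made-up example inputs, not output from a real fit.

```r
# Hypothetical heuristic based on the comment above: if all max-treedepth hits
# come from one chain, that chain's warmup is the first suspect.
# `hits_per_chain` is a named numeric vector of per-chain counts.
chain_with_all_hits <- function(hits_per_chain) {
  total <- sum(hits_per_chain)
  if (total == 0) {
    return(character(0))  # no hits, nothing to flag
  }
  # one name if a single chain accounts for every hit, else character(0)
  names(hits_per_chain)[hits_per_chain == total]
}

chain_with_all_hits(c(chain1 = 0, chain2 = 0, chain3 = 57, chain4 = 0))  # "chain3"
chain_with_all_hits(c(chain1 = 3, chain2 = 5, chain3 = 4, chain4 = 6))   # character(0)
```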

jgabry · 2021-11-03T19:30:08Z

That seems like a good idea. I think the warning message should probably report the total (summing over chains) but we can return a separate number for each chain. @rok-cesnovar Does that sound ok to you?

rok-cesnovar · 2021-11-03T19:45:31Z

Yes, that sounds great!

jgabry · 2021-11-03T22:19:41Z

I guess if we're going to do it by chain for treedepth and ebfmi we should also do it for divergences too for consistency, right? The warning messages can stay the same though.

jgabry · 2021-11-03T23:16:22Z

Ok I've updated so that the output now looks like this (in this case only divergences warranted a warning):

> fit$diagnose_sampler()

Warning: 80 of 4000 (2.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

$divergences
[1]  2 21 22 35

$max_treedepths
[1] 0 0 0 0

$ebfmi
[1] 0.2023454 0.3034505 0.2355953 0.3790196
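Returning per-chain counts while the warning reports the total is a small change. A sketch, with the hypothetical `divergence_message()` reproducing the warning text above from the per-chain divergence counts (2, 21, 22, 35); the 1000 iterations per chain is inferred from the 4000 total and is an assumption.

```r
# Hypothetical sketch: return per-chain counts, but report the total summed
# over chains in the warning text. Assumes every chain has the same number of
# post-warmup iterations.
divergence_message <- function(num_divergent, iter_per_chain) {
  total <- sum(num_divergent)
  n <- iter_per_chain * length(num_divergent)
  sprintf("%d of %d (%.1f%%) transitions ended with a divergence.",
          as.integer(total), as.integer(n), 100 * total / n)
}

# Per-chain counts from the output above; 1000 iterations per chain is inferred
# from the 4000 total reported in the warning.
divergence_message(c(2, 21, 22, 35), iter_per_chain = 1000)
# "80 of 4000 (2.0%) transitions ended with a divergence."
```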

jgabry · 2021-11-03T23:17:36Z

@rok-cesnovar @jsocolar and anyone else who's interested, a few questions:

What should the names of the returned list be? Currently the names are divergences, max_treedepths, and ebfmi. For consistency it should be ebfmis (plural), but that seems weird for some reason. Alternatively all could be singular but then divergence instead of divergences kind of feels weird. Preferences?
What should this method be called? Right now it's tentatively called diagnose_sampler(), but we also already have a sampler_diagnostics() method that returns the diagnostics for each iteration as a draws object from posterior. It feels weird to have both of those names. So maybe this new method should be called diagnostic_summary() to indicate that it's summarizing the info from sampler_diagnostics()?

jsocolar · 2021-11-04T02:36:27Z

I agree that ebfmis is awkward. It's also a tad awkward to use max_treedepth(s), which could just as well mean the maximum treedepth reached by each chain (suppose one sees $max_treedepths of [1] 8 7 8 9; one might well think that these are the maximum depths reached by each chain rather than the number of times each chain reached treedepth 10).

The following suggestions are awkward in their own way, but here they are for consideration:
num_divergent, num_max_treedepth, ebfmi.

martinmodrak · 2021-11-04T09:23:48Z

In my code doing similar things I use n_divergent or num_divergent, so I'd be quite happy with that. I'll note that posterior uses a different convention with no underscore after n (ndraws, nvariables), so it would be ndivergent and nmax_treedepth if you want to be consistent with posterior.

jgabry · 2021-11-04T15:03:33Z

Thanks for the suggestions. I think using the n_ or num_ prefix is a good idea.

jgabry · 2021-11-04T17:58:22Z

Ok, I've updated it to use num_divergent, num_max_treedepth, ebfmi. I'm going to open a draft PR and ask for some more feedback there.
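Putting the pieces of this discussion together, the agreed return value could be assembled as in the following sketch. It is illustrative only, not cmdstanr's code: `summarize_sampler()` is hypothetical, its inputs stand in for the divergent__, treedepth__ and energy__ sampler diagnostics as per-iteration matrices with one column per chain, and messages are emitted with `message()` and suppressed by `quiet = TRUE`.

```r
# Hypothetical sketch of the agreed return value (not cmdstanr's actual code).
# Inputs: matrices with one row per post-warmup iteration, one column per chain.
summarize_sampler <- function(divergent, treedepth, energy,
                              max_treedepth = 10, quiet = FALSE) {
  out <- list(
    num_divergent = unname(colSums(divergent)),
    num_max_treedepth = unname(colSums(treedepth >= max_treedepth)),
    ebfmi = unname(apply(energy, 2, function(e) {
      (sum(diff(e)^2) / length(e)) / stats::var(e)
    }))
  )
  if (!quiet) {
    # messages report totals over chains; the returned values stay per chain
    if (sum(out$num_divergent) > 0) {
      message(sum(out$num_divergent), " of ", length(divergent),
              " transitions ended with a divergence.")
    }
    if (sum(out$num_max_treedepth) > 0) {
      message(sum(out$num_max_treedepth), " of ", length(treedepth),
              " transitions hit the maximum treedepth limit of ", max_treedepth, ".")
    }
    n_low <- sum(out$ebfmi < 0.2)
    if (n_low > 0) {
      message(n_low, " of ", length(out$ebfmi),
              " chains had an E-BFMI less than 0.2.")
    }
  }
  out
}

set.seed(3)
divergent <- matrix(0, nrow = 100, ncol = 4)
divergent[1:5, 2] <- 1
treedepth <- matrix(sample(3:10, 400, replace = TRUE), nrow = 100)
energy <- matrix(rnorm(400), nrow = 100)
res <- summarize_sampler(divergent, treedepth, energy, quiet = TRUE)
str(res)
```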

rok-cesnovar added this to the beta-release milestone Jun 16, 2020

jgabry added the feature New feature or request label Jun 16, 2020

jgabry modified the milestones: beta-release, v1.0.0 - release Jun 30, 2020

rok-cesnovar mentioned this issue Feb 24, 2021

cmdstan_diagnose: Return diagnostics as numerics #459 (closed)

rok-cesnovar mentioned this issue May 15, 2021

Add check_bfmi function #500 (merged)

jgabry closed this as completed May 15, 2021

jgabry reopened this May 15, 2021

rok-cesnovar self-assigned this Aug 30, 2021

jgabry mentioned this issue Nov 4, 2021

New method summarizing sampler diagnostics and warnings #585 (merged)

jgabry closed this as completed in #585 Mar 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose diagnostic check functions #205

Expose diagnostic check functions #205

rok-cesnovar commented Jun 16, 2020

jgabry commented Jun 16, 2020

rok-cesnovar commented Jun 19, 2020 •

edited

Loading

jgabry commented Jun 19, 2020

rok-cesnovar commented Jun 30, 2020

jgabry commented Jun 30, 2020

rok-cesnovar commented Jun 30, 2020

jgabry commented Jun 30, 2020

rok-cesnovar commented May 15, 2021

jgabry commented May 15, 2021 •

edited

Loading

martinmodrak commented Aug 2, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jgabry commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 4, 2021

martinmodrak commented Nov 4, 2021

jgabry commented Nov 4, 2021

jgabry commented Nov 4, 2021

Expose diagnostic check functions #205

Expose diagnostic check functions #205

Comments

rok-cesnovar commented Jun 16, 2020

jgabry commented Jun 16, 2020

rok-cesnovar commented Jun 19, 2020 • edited Loading

jgabry commented Jun 19, 2020

rok-cesnovar commented Jun 30, 2020

jgabry commented Jun 30, 2020

rok-cesnovar commented Jun 30, 2020

jgabry commented Jun 30, 2020

rok-cesnovar commented May 15, 2021

jgabry commented May 15, 2021 • edited Loading

martinmodrak commented Aug 2, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 3, 2021

jgabry commented Nov 3, 2021

rok-cesnovar commented Nov 3, 2021

jgabry commented Nov 3, 2021

jgabry commented Nov 3, 2021

jgabry commented Nov 3, 2021

jsocolar commented Nov 4, 2021

martinmodrak commented Nov 4, 2021

jgabry commented Nov 4, 2021

jgabry commented Nov 4, 2021

rok-cesnovar commented Jun 19, 2020 •

edited

Loading

jgabry commented May 15, 2021 •

edited

Loading