Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose diagnostic check functions #205

Closed
rok-cesnovar opened this issue Jun 16, 2020 · 32 comments · Fixed by #585
Closed

Expose diagnostic check functions #205

rok-cesnovar opened this issue Jun 16, 2020 · 32 comments · Fixed by #585
Assignees
Labels
feature New feature or request

Comments

@rok-cesnovar
Copy link
Member

check_sampler_transitions_treedepth and check_divergences should be accessible to the user in some way.

Either separately or as part of fit$diagnose() or something with possibly other checks.

Cmdstan's diagnostics checks E-BMFI which is simple (only requires energy__ I think) and two that require reading all data (ESS and RHAT checks).

@rok-cesnovar rok-cesnovar added this to the beta-release milestone Jun 16, 2020
@jgabry jgabry added the feature New feature or request label Jun 16, 2020
@jgabry
Copy link
Member

jgabry commented Jun 16, 2020

Yeah I agree we should expose these. I think we should also change the name of check_sampler_transitions_treedepth to just check_treedepth.

Maybe a method fit$diagnose(diagnostic = NULL), where if diagnostic=NULL it checks all of them but the user can also specify which ones, e.g. fit$diagnose(c("divergence", "ebfmi"))?

Also I think it would be useful if these methods returned the value of diagnostic to the user instead of only printing messages (it can also print messages).

@rok-cesnovar
Copy link
Member Author

rok-cesnovar commented Jun 19, 2020

If stan-dev/posterior will have these checks (see stan-dev/posterior#77), we should just use those.

The divergences, treedepth and ebfmi use sampler diagnostics so those are Stan specific and will be done by our helper functions.

cc @paul-buerkner

@jgabry
Copy link
Member

jgabry commented Jun 19, 2020

Yeah I agree

@rok-cesnovar
Copy link
Member Author

What should we do here for now:

  • add fit$diagnose() that runs treedepth, divergence and ebmfi without rhat/ess
  • add fit$diagnose() that runs treedepth, divergence and ebmfi and also implement warnings for rhat/ess (at least the latter would get replaced by posterior)

This one should be simple.

@jgabry
Copy link
Member

jgabry commented Jun 30, 2020

I think posterior is going to end up including warnings also for HMC/NUTS, so I think all of them could end up being replace by posterior. Do you think we should just wait for posterior or implement them and then replace them with posterior's versions?

@rok-cesnovar
Copy link
Member Author

Lets just wait for posterior then and move the milestone if you agree.

@jgabry
Copy link
Member

jgabry commented Jun 30, 2020

Sounds good

@rok-cesnovar
Copy link
Member Author

So what I think the API for this should be:

  • add a diagnostics = c("divergences", "treedepth", "bfmi") argument to $sample()
  • deprecate validate_csv. I think the name was my idea. In hindsight its a bad name, not informative at all. validate_csv = FALSE is the same as diagnostics = NULL or ""
  • add fit$diagnose(diagnostics = c("divergences", "treedepth", "bfmi"))

I think we should just go with it now that we have the three sampler diagnostics checks. The rhat/eff checks can be added later.

@jgabry
Copy link
Member

jgabry commented May 15, 2021

I like this idea. We should think carefully about what object fit$diagnose() should return (aside from what it prints/warns about). When posterior eventually implements these hmc diagnostics we should use them internally in fit$diagnose() but ideally not change what it returns so we don't have to deprecate it in favor of a new method that uses the posterior stuff.

@jgabry jgabry closed this as completed May 15, 2021
@jgabry jgabry reopened this May 15, 2021
@martinmodrak
Copy link
Contributor

Just stumbled on this (calling CmdStan in a wrapper, so user never runs summary(), but don't want to swallow diagnostic warnings). May I suggest that we add a new method "$check_diagnostics()" that returns nothing, but can show warnings as a side-effect and not touch the $diagnose() (which IMHO is useful on its own)? That would feel cleanest to me (but I don't insist on it, just a different approach).

@rok-cesnovar rok-cesnovar self-assigned this Aug 30, 2021
@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

@rok-cesnovar and anyone else interested, I have some time to work on this, but I wanted to get a bit of feedback first.

I have an idea for a method diagnose_sampler(), that both prints the warning messages about the diagnostics (if argument quiet=FALSE) and returns their values as a list. Right now it would include divergences, max_treedepths, and ebfmi (I'm planning to branch off of the branch from @jsocolar's ebfmi PR so that we have that diagnostic available).

For example, what do you think about these examples?

# here diagnose_sampler() prints the messages and returns a list 
# containing the values of the diagnostics 
# it could also have a `quiet` argument to turn off printing the messages. 
> fit$diagnose_sampler()

Warning: 283 of 4000 (7.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

Warning: 1 of 4 chains had energy-based Bayesian fraction of missing information (E-BFMI) less than 0.2.
This may indicate poor exploration of the posterior.

$divergences
[1] 283

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229
# using quiet=TRUE
> fit$diagnose_sampler(quiet = TRUE)
$divergences
[1] 283

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229
# selecting diagnostics
> fit$diagnose_sampler(diagnostics = c("treedepth", "ebfmi"))
Warning: 1 of 4 chains had energy-based Bayesian fraction of missing information (E-BFMI) less than 0.2.
This may indicate poor exploration of the posterior.

$max_treedepths
[1] 0

$min_ebfmi
[1] 0.1841229

We could also do what I think Martin was suggesting and have two separate methods for this, but I'm not sure we want the user to have to call two separate methods to get the warnings and the values. But I'm happy to be convinced otherwise.

@rok-cesnovar
Copy link
Member Author

I like it. Any reason to return max_treedepths and not just a vector of all tree depths? Same with ebfmi. I know there are users that compute both by reading sampler diagnostics as they need them to diagnose different runs. For example, when working towards getting those treedepths not to get maxed out, that info is crucial.

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

Any reason to return max_treedepths and not just a vector of all tree depths?

Since tree depths are different for each chain and each iteration I was thinking they would just use fit$sampler_diagnostics() if they want all the individual values and they would use fit$diagnose_sampler() (or whatever we call it) to get the values that triggered the warnings (i.e., how many times max was hit). Opinions?

Same with ebfmi.

Yeah I can change it to include the ebfmi for each chain instead of just the min.

@rok-cesnovar
Copy link
Member Author

Yeah, maybe max_treedepth for all chains is fine. Maybe for ebmfi split by chains is better? I don't really have a strong opinion, just bouncing ideas :)

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

So maybe the returned list looks like this? It has the total number of divergences, the total number of times hitting max treedepth, and the ebfmi for each chain:

$divergences
[1] 335

$max_treedepths
[1] 0

$ebfmi
[1] 0.2566534 0.4253768 0.3219593 0.1972186

@jsocolar
Copy link
Contributor

jsocolar commented Nov 3, 2021

Just double checking: am I correct that cmdstan does not have any way to return whether the max treedepth prematurely halted the trajectory? That is, on iterations where max treedepth is reached, my understanding is that the current way of doing things will give a treedepth warning even if the NUTS criterion was reached during the final doubling of the tree. And it has to be this way because a cmdstan itself does not track whether or not we exceed treedepth (as opposed to merely reach it), and such a feature is unlikely to happen anytime soon because it would change the CSV output and potentially break a bunch of stuff.

Just want to double check that this is all accurate; otherwise I think it would be wise to distinguish reaching max treedepth and terminating via the NUTS criterion versus premature termination due to max treedepth. As things are, a user with poor mixing might decide to increase treedepth even if doing so has absolutely no effect on the exploration.

@rok-cesnovar
Copy link
Member Author

Yeah, I think you are correct. @jsocolar does rstan have this function?

@jsocolar
Copy link
Contributor

jsocolar commented Nov 3, 2021

Not that I know of.

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

Yeah I also think you're right that currently if it says it hit max treedepth it's not clear whether the trajectory was halted prematurely. Unfortunately I think we're stuck with that for now. I guess we can document this caveat in the doc for this new method. @jsocolar Suggestions for how to word it?

@rok-cesnovar
Copy link
Member Author

So maybe the returned list looks like this?

Looks great to me.

@jsocolar
Copy link
Contributor

jsocolar commented Nov 3, 2021

@jgabry I'm a bit behind the times perhaps on what the actual recommendation is for these treedepth issues. If I'm on the default max_treedepth = 10, and I see warnings (and poor convergence diagnostics) is our suggestion to increase max_treedepth or is it to try to tame the difficult geometry?

And in the latter case, would we want to just warn about hitting max treedepth or do we also want to warn about all treedepths in excess of N, where N is some appropriate number (possibly greater than 10).

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

Actually I just remembered that these pieces of advice or caveats about interpretations should go in the website discussed in #505. The idea is to put all that stuff in one place so we don't have write it out separately for each interface and so we can make updates in one place instead of having to change warning messages. So I don't think we need to include this info with this new method other than to point to that website.

But since you asked,

is our suggestion to increase max_treedepth or is it to try to tame the difficult geometry?

this is tricky because on the one hand if increasing max_treedepth by 1 or 2 is sufficient then that seems preferable to rewriting the model, but often taming the geometry is really what's necessary. So it's hard to say because we don't know how far beyond max_treedepth it would have gone. For models that run pretty fast I guess it can't hurt to try bumping up max_treedepth a little, but for models that take a lot longer maybe it makes sense to recommend working on taming the geometry since it would be a burden to keep checking different values of max_treedepth (and ultimately taming the geometry could help in other ways too beyond just treedepth issues).

And in the latter case, would we want to just warn about hitting max treedepth or do we also want to warn about all treedepths in excess of N, where N is some appropriate number (possibly greater than 10).

That's a great question. I have not thought about this enough to have a good answer.

Anyway, these are all good things for us to think more about. If you're up for it I would raise these questions when we ask for comments on the website mentioned in #505. It sounds like we're close to doing that.

@jsocolar
Copy link
Contributor

jsocolar commented Nov 3, 2021

Last thing:
I think counts of max treedepths split by chain is quite useful for diagnosing bad warmup. Like if one chain has all the max treedepths, then the first thing to do should be to lengthen warmup and/or the term buffer, rather than to increase max treedepth or reparameterize. My personal preference would be to output a vector of the number of times max_treedepth is reached per chain, especially if we're doing that for efmi anyway.

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

That seems like a good idea. I think the warning message should probably report the total (summing over chains) but we can return a separate number for each chain. @rok-cesnovar Does that sound ok to you?

@rok-cesnovar
Copy link
Member Author

Yes, that sounds great!

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

I guess if we're going to do it by chain for treedepth and ebfmi we should also do it for divergences too for consistency, right? The warning messages can stay the same though.

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

Ok I've updated so that the output now looks like this (in this case only divergences warranted a warning):

> fit$diagnose_sampler()

Warning: 80 of 4000 (2.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

$divergences
[1]  2 21 22 35

$max_treedepths
[1] 0 0 0 0

$ebfmi
[1] 0.2023454 0.3034505 0.2355953 0.3790196

@jgabry
Copy link
Member

jgabry commented Nov 3, 2021

@rok-cesnovar @jsocolar and anyone else who's interested, a few questions:

  • What should the names of the returned list be? Currently the names are divergences, max_treedepths, and ebfmi. For consistency it should be ebfmis (plural), but that seems weird for some reason. Alternatively all could be singular but then divergence instead of divergences kind of feels weird. Preferences?

  • What should this method be called? Right now it's tentatively called diagnose_sampler(), but we also already have a sampler_diagnostics() method that returns the diagnostics for each iteration as a draws object from posterior. It feels weird to have both of those names. So maybe this new method should be called diagnostic_summary() to indicate that it's summarizing the info from sampler_diagnostics()?

@jsocolar
Copy link
Contributor

jsocolar commented Nov 4, 2021

I agree that ebfmis is awkward. It's also a tad awkward to use max_treedepth(s), which could just as well mean the maximum treedepth reached by each chain (suppose one sees $max_treedepths of [1] 8 7 8 9; one might well think that these are the maximum depths reached by each chain rather than the number of times each chain reached treedepth 10).

The following suggestions are awkward in their own way, but here they are for consideration:
num_divergent, num_max_treedepth, ebfmi.

@martinmodrak
Copy link
Contributor

In my code doing similar things I use n_divergent or num_divergent, so I'd be quite happy about that. I'll note that posterior uses a different convention where there is no underscore after n (ndraws, nvariables), so it would be ndivergent, nmax_treedepth if you want to be consistent with posterior

@jgabry
Copy link
Member

jgabry commented Nov 4, 2021

Thanks for the suggestions. I think using the n_ or num_ prefix is a good idea.

@jgabry
Copy link
Member

jgabry commented Nov 4, 2021

Ok, I've updated it to use num_divergent, num_max_treedepth, ebfmi. I'm going to open a draft PR and ask for some more feedback there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants