Control Variates #273
Comments
I would suggest the following procedure: You post example code for one control variates function here, say for |
This may be true, yes, at least partially. I will keep this in mind.
Matthew Kay ***@***.***> wrote on Wed, 22 Feb 2023, 17:23:
… Is this another example of the general problem of how to incorporate
information from special variables (like weights, see #184) into summary
calculations? Maybe the solution to this should also solve #184 and #105.
|
I've done up a couple of draft The function for general input and an example is below:
The function for Stan input and an example is below:
Lots of extensions to this are possible, e.g.
I'm happy to add any of this functionality as we go. |
I have now added the basic functionality for the control variates feature in the
x <- as_draws_df(example_draws())
x$.gradient_mu <- rnorm(ndraws(x))
x$.gradient_sigma <- rnorm(ndraws(x))
print(x)
reserved_variables(x)
grads <- gradients(x)
str(grads)
summarise_draws(x, mean_control_variates, .args = list(gradients = grads))
The main internal change is that The The branch lacks doc and testing so far. It is purely to provide the basis for further developing this feature. What I have not done yet is an automatic detection of the need of gradients and extraction thereof in |
We (@simonrmaskell, @LeahPrice, @YifanZhou, @jswright-dstl) were talking about what we need to do in response to your kind work on integrating control variates into posterior. We have identified some functionality that we think we should add and are relatively clear on how to go about doing that. Where we are less certain is around the documentation and testing that is needed. Is there an example of something similar with documentation we can use as a calibration point? Similarly, we can come up with some tests that don’t use any specific ppl, but those tests are likely to be quite benign. We’d like to have some tests/examples of how we can use the code with more complex models and can’t see how to do that without using Stan or similar. Do you see such examples as distinct from the tests? If so, my guess is that you see that as a separate repo for the examples that uses posterior. Is that right? In that case, are you happy that the documentation for posterior points to that other repo (since we anticipate people will struggle to use the “right” gradients and really want to make it easy for them to do so)? |
Perhaps the documentation and tests that we use for convergence diagnostics (see https://github.com/stan-dev/posterior/blob/master/R/convergence.R for the doc and https://github.com/stan-dev/posterior/blob/master/tests/testthat/test-convergence.R) could serve as a helpful starting point. It is okay if the tests that live in posterior are relatively simple. The more complicated tests should probably live in another repo, as you say. I am happy with the posterior doc pointing to this other repo (or case studies, blog posts or whatever). We could also think about a vignette in posterior itself for this purpose, but this decision is likely better made at a later point, when we have both the implementations in posterior and your outside repo in a more concrete shape. |
Thanks @paul-buerkner and sorry I've been slow with this. I've done a very basic implementation of sd_control_variates in my forked version of this branch. I also started adding some documentation to the control variate functions but it's very poor at this stage and I haven't started documenting the gradients function. I'd like to improve the control variates but the improvements I have in mind would require access to the following things in the control variate functions:
I suspect these changes need to be done through additions/edits/special cases in |
No worries. There is no rush on our side. I don't fully understand the points (1) to (3) though. Can you please explain them a bit more? |
Sorry I think I got ahead of myself and I should have implemented some examples first. Is there a way to get gradients automatically? In the example they're calculated separately but there's quite a bit of room for error in this process because people could get gradients for the wrong parameterisation which invalidates the control variates. We're after gradients on the unconstrained space (e.g. The questions I asked earlier assumed we had the gradients mentioned above and that the samples
Sorry for all of the questions. |
This aspect should not be part of posterior, as it needs to be independent of Stan or any other implementation. It can merely take and use what the user says are gradients. It cannot validate whether those are the right ones.
Not sure how realistic this is to implement reliably. Why would we need that?
Agreed. There should be multiple methods with the same function name (generic). One that takes a vector to work with summarize_draws on a parameter by parameter basis. And one that takes full draws objects to vectorize computation. We should offer both.
Not sure I understand. In what place do we need to have access to that (in terms of control variates)? I assume you don't just mean taking the logarithm of the samples within posterior, which would of course be trivial to do. |
Would it be possible to have two options: providing gradients or provide a Stan object? I'd be much more comfortable if we had a Stan-specific option available so it's more foolproof.
The number of control variates depends on the dimension Therefore knowing which dimension you're looking at lets you use more advanced control variates without requiring you to estimate way more coefficients. This is important because we often need higher order control variates, or at least the second order control variates, to get substantial variance reductions. E.g. if you have a Gaussian target then first order control variates are exact for the mean, second order control variates are exact for the variance, and so on for kth order control variates and the kth moment.
Thanks, that's good to hear you agree about offering a vectorised option.
We need this information for higher order control variates. For example, second order control variates have terms that are of the form 1+ |
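The Gaussian exactness claim above can be checked numerically. The following is a minimal sketch in base R, assuming first-order (linear) control variates fitted by ordinary least squares. The helpers `cv_mean_first_order` and `n_cv_coefficients` are hypothetical names used only for illustration; they are not part of posterior or ZVCV. The coefficient count used here is the number of non-constant monomials of total degree at most `order` in `d` variables, which shows how quickly a polynomial control variate grows with the dimension.

```r
# Hypothetical helper: first-order control-variate estimate of E[f(X)].
# Regress f on the gradients of the log target. The gradients have
# expectation zero under the target, so the fitted intercept estimates E[f(X)].
cv_mean_first_order <- function(f, gradients) {
  gradients <- as.matrix(gradients)
  fit <- lm.fit(x = cbind(1, gradients), y = f)
  unname(fit$coefficients[1])
}

# Number of non-constant monomials of total degree <= order in d variables
n_cv_coefficients <- function(d, order) {
  choose(d + order, d) - 1
}

set.seed(1)
mu <- 2
sigma <- 3
x <- rnorm(50, mean = mu, sd = sigma)
grad <- -(x - mu) / sigma^2   # d/dx log N(x | mu, sigma^2)

plain_mean <- mean(x)
cv_mean <- cv_mean_first_order(x, grad)

# Here x = mu - sigma^2 * grad holds exactly, so the intercept recovers mu
# up to floating-point rounding, while the plain mean carries Monte Carlo error.
abs(cv_mean - mu)
abs(plain_mean - mu)

n_cv_coefficients(d = 10, order = 1)  # 10
n_cv_coefficients(d = 10, order = 2)  # 65
```

With 10 parameters, moving from first to second order raises the number of coefficients from 10 to 65, which is why restricting the gradients to the dimensions of interest matters when the number of draws is small.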
I of course understand your concern. As discussed in our joint meeting, that (Stan-dependent) method would need to be implemented in a separate package, not in posterior. This separate package would of course interface with all the stuff we now implement in posterior and additionally could provide more validation etc. depending on the used backend (e.g., Stan). About the specific selection of control variates, is that something to leave up to the user? Or implement in your higher-level package that also does the validation? Your questions seem quite advanced right now and don't really touch the basic implementation of control variates in posterior at the moment. Do you think it could make sense to come back to these questions at a later point, once the basic implementation (of stuff we have to do anyway) is done? |
Thinking about the generic implementation in posterior (and more elaborate features can then be elsewhere), I have a question: If
then if we subset x, say
does that subset include gradients for mu and sigma? If it does, it seems that would (at least partially) solve the dimensionality issue at this point? I think the control variates would be mostly used for a small number of quantities of interest and not automatically used for every possible parameter in the model. |
I am not sure I understand your question. Can you elaborate? |
What happens if you run the following code?
Does |
So far, all gradients are retained upon subsetting. But we could perhaps add an option to automatically subset gradients with the same name as the selected variables. What do you think? |
I think "subset gradients with the same name as the selected variables" would provide sufficient functionality related to what @LeahPrice was asking above |
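The name-based subsetting idea discussed here can be sketched in a few lines. This is an illustration only: it assumes the draws are held in a plain data frame and that gradient columns follow the `.gradient_<variable>` naming from the example earlier in the thread. The helper `subset_with_gradients` is a hypothetical name, not an existing posterior function.

```r
# Hypothetical helper: keep the selected variables and only those gradient
# columns whose names match them (".gradient_<variable>").
subset_with_gradients <- function(draws, variables, prefix = ".gradient_") {
  grad_cols <- paste0(prefix, variables)
  keep <- c(variables, intersect(grad_cols, names(draws)))
  draws[, keep, drop = FALSE]
}

set.seed(1)
draws <- data.frame(
  mu = rnorm(5),
  sigma = abs(rnorm(5)),
  tau = abs(rnorm(5)),
  .gradient_mu = rnorm(5),
  .gradient_sigma = rnorm(5),
  .gradient_tau = rnorm(5)
)

# Selecting mu and sigma drops tau together with its gradient column
sub <- subset_with_gradients(draws, c("mu", "sigma"))
names(sub)
```

Variables without a matching gradient column are kept as they are, so the helper also works on draws that carry no gradients at all.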
Thanks @avehtari , yes that would cover one of the main things I was asking about and it'd definitely be helpful. It'd be needed to get reasonable performance when the number of MCMC iterations is less than the number of dimensions. The other two main things that would help a lot are access to the samples on the unconstrained space and access to the gradients of the log target on the unconstrained space. It seems like there's a lot of room for error if users need to manually get the latter themselves. |
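To illustrate why the parameterisation matters, here is a small worked example in base R for a single positive parameter mapped to the unconstrained scale by u = log(sigma). The Gamma target and the function names are assumptions chosen for illustration. The point is that the log density on the unconstrained scale includes the log-Jacobian of the transform, so its gradient differs from a naive chain-rule gradient of the constrained density.

```r
# Illustrative target (an assumption): sigma ~ Gamma(shape, rate), sigma > 0
shape <- 3
rate <- 2

# Log density on the unconstrained scale u = log(sigma), including the
# log-Jacobian term log|d sigma / d u| = u
log_p_unconstrained <- function(u) {
  dgamma(exp(u), shape = shape, rate = rate, log = TRUE) + u
}

# Analytic gradient via the chain rule:
# d/du = (d log p / d sigma) * sigma + 1
grad_unconstrained <- function(u) {
  sigma <- exp(u)
  ((shape - 1) / sigma - rate) * sigma + 1
}

# Check the analytic gradient against a central finite difference
u <- log(c(0.5, 1, 2.5))
h <- 1e-6
numeric_grad <- (log_p_unconstrained(u + h) - log_p_unconstrained(u - h)) / (2 * h)
max_abs_error <- max(abs(numeric_grad - grad_unconstrained(u)))
max_abs_error

# Omitting the Jacobian term would shift every gradient by 1
grad_no_jacobian <- grad_unconstrained(u) - 1
```

A gradient computed without the Jacobian term, or on the constrained scale, is a different quantity, which is the kind of silent mistake the comment above is concerned about.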
These things will be added to CmdStanR, RStan, etc. interfaces after the posterior package provides the framework-agnostic functions. |
I think the "problem" is that if neither Posterior nor the Stan software components take ownership of the specifics of the interface, users will make mistakes and won't use the functionality. I think our proposed solution is therefore to progress with integrating control variates into Stan, rather than into Posterior. That seems suboptimal. |
I think we have discussed this, haven't we? Posterior shall not be dependent on Stan or any other specific PPL and we currently don't intend to change this principle for control variates. Therefore, I think you thought about a separate package which builds on the posterior functionality but ensures the correct extraction from Stan or other PPLs(?) At least I remember something along those lines, I think. |
Yes. We did talk about the idea of using a separate package. However, a separate package that uses control variates already exists (thanks to efforts by @LeahPrice). Given that, it's not clear to us what the merits would be of having a second separate package and interfacing that to Posterior. That's what motivates our revised plan to migrate to integrating control variates into Stan itself. |
I see, sorry I forgot about your existing package in my line of argument above. So this existing package could theoretically be refactored to use the posterior implementation (once implemented) and validate the Stan input, right? Kind of as it does right now but with native posterior support. Back when we discussed this, I understood this was your plan, I think. Or are you suggesting that given your existing package's functionality, a separate feature of this in posterior (this issue here) would not make sense unless we implemented input Stan validation directly in posterior? |
I think our view now (which I recognise may have evolved and may sensibly continue to do so) is that given the separate package can already calculate the control variates, assuming we ensure it can interface that package correctly to Stan etc, it's not clear to us what benefit we get (or a user of either package gets) from us moving some of the functionality that is already in that package into Posterior. Is it clear to you what those benefits would be? |
Your package would be more integrated with the Stan universe of packages via posterior, which will probably make it easier to use by a larger group of people and thus ultimately applied more. Whether this is enough of a motivation for you, I don't know of course. If you like, I am also happy to have another call with you to make sure we are all on the same page. |
That's why I said that specifics will be implemented in the interface the users are using. I feel like there is still some confusion on the modular structure of the Stan software.
It's not clear what you mean by "Stan itself". Currently, all posterior summaries are computed by various interfaces like CmdStan (in C++), RStan (its own R code but switching to the posterior package), PyStan (own Python code or ArviZ), CmdStanR (using the posterior package). CmdStanR uses the posterior package to handle draws and summaries. If you want CmdStanR users to use control variates, it would be natural to support the existing way to get summaries, that is, using the posterior package. Thus, I recommend adding 1) some code to CmdStanR to handle gradients and the constrain/unconstrain part of the control variates, 2) some code to the posterior package to allow storing the control variate information along with the draws, 3) some code to the posterior package or to a separate package to do the control variate computation given the draws object that includes the control variate information. Having no control variate related support in the posterior package would mean more work in the separate package to support different posterior draw formats (data frame, array, matrix, rvars). I agree with Paul that
|
Sorry to have been imprecise about "Stan itself", I meant that we currently plan to expose the control variates functionality via CmdStan with a view to CmdStanR and CmdStanPy then being able to use the functionality. If we did that, we'd need to modify some of the Stan backend to make that work. There is a design document being prepared and comments will be welcome in due course. However, I think I now see what you mean; I hadn't fully appreciated that CmdStanR uses posterior and could therefore ensure that the interfacing was correct. That would remove the issue of needing this additional package, which has been motivating us to consider the alternative approach. I'll follow up with the CmdStanR folks to understand how that could work from that end. Thank you. |
Looks like cmdstanr now has the functionality we need.... |
Yes, CmdStanR now has functionality for getting gradients and handling the unconstrain/constrain mapping. When I wrote above that more code would be added to CmdStanR, I meant additional code adding the desired control variate information to the posterior object, but that first requires some support in the posterior package, which is the topic of this issue. Would you like to have a call to clarify the role of different packages and available functionality? |
Almost certainly... Will follow up via email. |
@alecksphillips: as discussed, be good if you could pick this up |
It would be really great if posterior had the capacity to reduce the standard errors associated with the expectations it can produce (e.g. mean, var and sd). Recent work [1] has shown that, if we use the score function as a control variate, we can reduce such standard errors and do so in a way that is effective in the context of constrained variables (as often considered in applications of interest in the context of, for example, estimating volatility or a frequency).
Based on initial discussion, we think the approach to adopt is to augment posterior with the capacity to accept, alongside the samples, the gradient of the log posterior of mapped parameters with respect to those mapped parameters (these mapped parameters are unconstrained but are nonlinear functions of the constrained variables). The implementation then decomposes into three components:
* Storage: we will need to use protected variables (called something like .grad_log_p_unconstrained) within the posterior object itself to pass the samples and gradients around the code such that anything we contribute is compatible with existing functionality.
* Interface: We would sensibly augment the functionality of summarise_draws with additional measures that the user can request (e.g. summarise_draws(x, "mean_control_variates", "sd_control_variates"), where "mean_control_variates" calculates the mean using control variates and "sd_control_variates" does the same for the standard deviation).
* Estimation: writing the functions to calculate the estimates themselves in such a way that, for example, mean_control_variates fails gracefully (e.g. by calling mean, without control variates) when .grad_log_p_unconstrained is not populated. These functions are likely to be based heavily on a combination of the code written (in Python here: https://github.com/zhouyifan233/ControlVariates) for [1] and a pre-existing R repo (here: https://github.com/LeahPrice/ZVCV).
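The graceful fallback described in the Estimation bullet could look roughly like the following. This is a minimal base R sketch under stated assumptions: the function name `mean_cv_or_plain` is hypothetical, the gradients are passed explicitly rather than read from a reserved variable, and the estimator is a simple first-order control variate fitted by least squares rather than the method of [1] or of ZVCV.

```r
# Hypothetical sketch: control-variate mean that falls back to mean()
# when no gradients are supplied.
mean_cv_or_plain <- function(x, gradients = NULL) {
  if (is.null(gradients) || NCOL(gradients) == 0) {
    return(mean(x))  # no gradients stored: plain Monte Carlo mean
  }
  gradients <- as.matrix(gradients)
  if (nrow(gradients) != length(x)) {
    stop("`gradients` must have one row per draw.")
  }
  # First-order control variates: the intercept of the regression of x on
  # the gradients is the variance-reduced estimate of the mean.
  fit <- lm.fit(x = cbind(1, gradients), y = x)
  unname(fit$coefficients[1])
}

set.seed(2)
x <- rnorm(20, mean = 1, sd = 2)
grad <- -(x - 1) / 4   # gradient of log N(x | 1, 2^2)

fallback <- mean_cv_or_plain(x)        # identical to mean(x)
with_cv <- mean_cv_or_plain(x, grad)   # uses the gradients
```

Returning the plain mean silently keeps existing summaries working for draws without gradients. Whether a warning or message should accompany the fallback is a design choice this sketch leaves open.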
We would welcome advice on how to progress towards a pull-request associated with this issue.
[1] S Maskell, Y Zhou and A Mira. Control Variates for Constrained Parameters. IEEE Signal Processing Letters. Vol 29, pp. 2333-2337. 2022