Feature request: summarise specific variables with specific functions in summarise_draws #105
Comments
I agree, this is something we need. From how I see it now, we have two options to make that possible:
I get the need for some way to calculate multivariate measures, but it occurs to me that the selection part of this question may be overloading summarise_draws to do more than it needs to. I'm always suspicious of functions that start accumulating a lot of functionality and options; usually it's a sign the API needs to be decomposed in some way. Perhaps variable selection is better handled by doing subset_draws prior to summarise_draws? Unless I'm mistaken, in most cases where different variables are summarised differently, the current format of output tables in summarise_draws won't conform, so you wouldn't easily be able to automatically output them in a single table anyway.
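A minimal sketch of the subset-then-summarise pattern described above, assuming the eight-schools data returned by example_draws() (variables mu, tau and theta[1] to theta[8]). The choice of variables and summary functions is illustrative only and does not come from the thread.

```r
library(posterior)

# Example draws shipped with posterior (assumption: the default
# eight_schools example with variables mu, tau, theta[1..8]).
x <- example_draws()

# Summarise mu and tau with one set of functions ...
s1 <- summarise_draws(subset_draws(x, variable = c("mu", "tau")), "mean", "sd")

# ... and the theta parameters with another set.
s2 <- summarise_draws(
  subset_draws(x, variable = "^theta", regex = TRUE),
  "median", "mad"
)

print(s1)
print(s2)
```

The two results have different columns (mean/sd versus median/mad), which illustrates why they would not combine into a single table without extra work.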
I agree with @mjskay that we should add another function (option 2) to not mess up the cleanness of the API. I am a little unsure about the rest of your comments. The point is not necessarily to perform selection on the fly and I agree this is better handled by adding a What I envision the new interface to handle is multivariate input (draw vectors for multiple variables) and scalar output, resembling the interface of |
The point I was trying to make is that there's some spectrum between a maximalist API that tries to incorporate every feature into big functions and one that is structured in such a way that users can come up with combinations of simple operations that accomplish the tasks they are trying to accomplish, as well as tasks you haven't thought of in designing the API.
Makes sense. My gut reaction was to think that this kind of summarization is pretty easy to do with existing APIs, hence my reticence at re-inventing it. E.g. an as_draws_df() followed by a summarise() would do it, and if you did a subset_draws() first to the relevant variables it would likely not be slow (and for folks who don't like dplyr you could use aggregate() from base instead). On the other hand I can see an argument for a consistent API across draws data types for doing this kind of thing (perhaps if the operations are very common?). I suppose I don't have a strong opinion either way, but wanted to raise a point of discussion about how "big" the posterior API is intended to be.
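The subset_draws(), as_draws_df(), summarise() route mentioned above might look roughly like this. It is a sketch assuming the example_draws() data; the output column names (mean_mu, sd_tau, cor_mu_tau) are made up for illustration.

```r
library(posterior)
library(dplyr)

x <- example_draws()

# Keep only the variables of interest, then convert to a data frame
# with one column per variable (plus .chain, .iteration, .draw).
df <- as_draws_df(subset_draws(x, variable = c("mu", "tau")))

# Each summary can use a different function, and a summary may take
# several variables at once (here: the correlation of mu and tau).
res <- summarise(
  df,
  mean_mu    = mean(mu),
  sd_tau     = sd(tau),
  cor_mu_tau = cor(mu, tau)
)

print(res)
```

The result is a single-row data frame rather than the one-row-per-variable table that summarise_draws produces.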
Yes, you are right, one could move to dplyr for this purpose. Hmm, I will also have to think more about whether adding such functionality for all formats would be worthwhile and how it should look.
This is a throwaway comment since it seems like you're all on the same page anyway, but just to mention that in addition to the other concerns, attempting to shoehorn this inside |
Currently summarise_draws applies the summary functions to all variables. In some cases it could be useful to summarise specific variables with specific functions, akin to dplyr::summarise. This would also allow for summary functions to take multiple variables as arguments. A simple example use case might be calculating the KL-divergence between the distributions of two parameters in the draws object, e.g.:
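The original example is not preserved here. As a rough stand-in, the sketch below shows how such a two-variable summary could be computed today outside summarise_draws. The helper kl_normal is hypothetical and not part of posterior. It assumes both marginal distributions are approximately normal and uses the closed-form KL divergence between two normal distributions, which is a simplification, not a general estimator.

```r
library(posterior)

# Hypothetical helper (not part of posterior): KL(p || q) under the
# assumption that both sets of draws are approximately normal.
# KL = log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
kl_normal <- function(p, q) {
  m1 <- mean(p); s1 <- sd(p)
  m2 <- mean(q); s2 <- sd(q)
  log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 0.5
}

# Assumption: the eight_schools example draws with theta[1], theta[2].
draws <- as_draws_df(example_draws())

# A summary function taking two variables and returning one scalar.
kl <- kl_normal(draws$`theta[1]`, draws$`theta[2]`)
print(kl)
```

A summary of this kind takes two draw vectors and returns one scalar, so it does not fit the one-row-per-variable table that summarise_draws currently returns.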