Feature request: summarise specific variables with specific functions in summarise_draws #105

n-kall · 2020-12-03T14:23:11Z

Currently summarise_draws applies the summary functions to all variables. In some cases it could be useful to summarise specific variables with specific functions, akin to dplyr:summarise. This would also allow for summary functions to take multiple variables as functions.

A simple example usecase might be calculating the KL-divergence between the distributions of two parameters in the draws object, e.g.:

# draws has variables 'x' and 'y'

# currently doesn't work because it tries to apply the summary function to all variables
summarise_draws(draws, kl_xy = kl_div(x, y))

The text was updated successfully, but these errors were encountered:

paul-buerkner · 2020-12-03T16:27:39Z

I agree, this is something we need. From how I see it now, we have two options to make that possible:

Add an option to summarise_draws to change modes between the "all variables and univariate measures" (current= and "selected variables applied selectively to multivariate measures" mode (new).
add a new function with a different name for the new mode. Of course, we should also discuss which of the two modes should eventually have the name summarise_draws.

mjskay · 2020-12-04T04:10:58Z

I get the need for some way to calculate multivariate measures, but it occurs to me the selection part of this question is maybe overloading summarise_draws to do more than it needs to --- I'm always suspicious of functions that start accumulating a lot of functionality and options. Usually it's a sign the API needs to be decomposed in some way. Perhaps variable selection is better handled by doing subset_draws prior to summarise_draws? Unless I'm mistaken, in most cases where different variables are summarised differently the current format of output tables in summarise_draws won't conform so you wouldn't easily be able to automatically output them in a single table anyway.

paul-buerkner · 2020-12-04T12:15:40Z

I agree with @mjskay that we should add another function (option 2) to not mess up the cleanness of the API.

I am a little unsure about the rest of your comments. The point is not necessarily to perform selection on the fly and I agree this is better handled by adding a subset_draws to the pipe.

What I envision the new interface to handle are multivariate input (draw vectors for multiple variables) and scalar output, resempling the interface of dplyr::summarise. Common examples could be correlating variables, but also more special applications such as the one @n-kall mentioned above. Of course people could also use it for simple things such as mean(x) if they are used to the dplyr::summarise interface. Not sure if these applications justify being added to posterior in the first place, but my current understanding is that there are some relevant applications.

mjskay · 2020-12-05T06:24:44Z

I am a little unsure about the rest of your comments. The point is not necessarily to perform selection on the fly and I agree this is better handled by adding a subset_draws to the pipe.

The point I was trying to make is that there's some spectrum between a maximalist API that tries to incorporate every feature into big functions and one that is structured in such a way that users can come up with combinations of simple operations that accomplish the tasks they are trying to accomplish, as well as tasks you haven't thought of in designing the API.

What I envision the new interface to handle are multivariate input (draw vectors for multiple variables) and scalar output, resempling the interface of dplyr::summarise. Common examples could be correlating variables, but also more special applications such as the one @n-kall mentioned above. Of course people could also use it for simple things such as mean(x) if they are used to the dplyr::summarise interface. Not sure if these applications justify being added to posterior in the first place, but my current understanding is that there are some relevant applications.

Makes sense. My gut reaction was to think that this kind of summarization is pretty easy to do with existing APIs, hence my reticence at re-inventing it. E.g. an as_draws_df() followed by a summarise() would do it, and if you did a subset_draws() first to the relevant variables it would likely not be slow (and for folks who don't like dplyr you could use aggregate() from base instead).

On the other hand I can see an argument for a consistent API across draws data types for doing this kind of thing (perhaps if the operations are very common?). I suppose I don't have a strong opinion either way, but wanted to raise a point of discussion about how "big" the posterior API is intended to be.

paul-buerkner · 2020-12-07T07:52:40Z

Yes, you are right, one could move to dplyr for this purpose. Hmm, I will also have to think of this more whether adding such funcationality for all formats would be worthwile and how it should look.

jsocolar · 2021-05-19T20:27:43Z

This is a throwaway comment since it seems like you're all on the same page anyway, but just to mention that in addition to the other concerns, attempting to shoehorn this inside summarise_draws would break the parallelized version if relevant variables ended up on different chunks.

paul-buerkner added this to the CRAN release milestone Dec 3, 2020

paul-buerkner added the feature New feature or request label Mar 26, 2021

paul-buerkner mentioned this issue Jun 24, 2021

CRAN release #165

Closed

jgabry removed this from the CRAN release milestone Jun 24, 2021

paul-buerkner mentioned this issue Jul 26, 2021

Improve support (or documentation) for weighted draws #184

Open

mjskay mentioned this issue Feb 22, 2023

Control Variates #273

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: summarise specific variables with specific functions in summarise_draws #105

Feature request: summarise specific variables with specific functions in summarise_draws #105

n-kall commented Dec 3, 2020

paul-buerkner commented Dec 3, 2020

mjskay commented Dec 4, 2020

paul-buerkner commented Dec 4, 2020

mjskay commented Dec 5, 2020

paul-buerkner commented Dec 7, 2020

jsocolar commented May 19, 2021

Feature request: summarise specific variables with specific functions in summarise_draws #105

Feature request: summarise specific variables with specific functions in summarise_draws #105

Comments

n-kall commented Dec 3, 2020

paul-buerkner commented Dec 3, 2020

mjskay commented Dec 4, 2020

paul-buerkner commented Dec 4, 2020

mjskay commented Dec 5, 2020

paul-buerkner commented Dec 7, 2020

jsocolar commented May 19, 2021