
Extend exact posteriors to condition on multiple observations #125

Open

larryshamalama opened this issue Jul 2, 2023 · 7 comments

Labels: bug, enhancement, exact posterior, important, question

Comments

@larryshamalama

Description of your problem or feature request

It is my understanding that the current exact posteriors, e.g. `gamma_poisson_conjugateo`, can only condition on a single observation $Y \sim \text{Poisson}$ rather than a set of i.i.d. observations $Y_1, \dots, Y_n$. How can the current conjugate relations be extended to multiple observations?
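
For context, the standard Gamma-Poisson update for $n$ i.i.d. observations, with prior $\lambda \sim \text{Gamma}(\alpha, \beta)$, is

$$\lambda \mid y_1, \dots, y_n \sim \text{Gamma}\left(\alpha + \sum_{i=1}^{n} y_i,\ \beta + n\right),$$

so the relation would need to accumulate the sufficient statistics $\left(\sum_i y_i,\ n\right)$ instead of a single $y$.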

Can we allow arguments such as `realized` (akin to `joint_logprob` in AePPL), or `realized_rvs_to_values` (as in AeMCMC's NUTS `construct_sampler`), in AeMCMC's general `construct_sampler`?

```python
import aemcmc
import aesara
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

lam_rv = srng.gamma(1., 1., name="lam")
Y_rv = srng.poisson(lam=lam_rv, size=3, name="Y")  # something like this?

y_vv = Y_rv.clone()

sampler, initial_values = aemcmc.construct_sampler({Y_rv: y_vv}, srng)

p_posterior_step = sampler.sample_steps[lam_rv]

sample_fn = aesara.function([y_vv], p_posterior_step)
```

Currently, `sample_fn(np.array([2, 3, 4]))` yields a 3-dimensional array.

cc @brandonwillard @rlouf

@brandonwillard (Member) commented Jul 4, 2023

Does the following illustrate the problem you're talking about—or part of it?

```python
import aemcmc
import aesara
import aesara.tensor as at


srng = at.random.RandomStream(0)

lam_rv = srng.gamma(1., 1., name="lam")
Y1_rv = srng.poisson(lam=lam_rv, name="Y1")
Y2_rv = srng.poisson(lam=lam_rv, name="Y2")

y1_vv = Y1_rv.clone()
y1_vv.name = "y1"
y2_vv = Y2_rv.clone()
y2_vv.name = "y2"

sampler, initial_values = aemcmc.construct_sampler({Y1_rv: y1_vv, Y2_rv: y2_vv}, srng)

p_posterior_step = sampler.sample_steps[lam_rv]

# Only one posterior sampling step for `lam_rv`
sampler.sample_steps.keys()
# dict_keys([lam])

# and it only uses the value of `Y2_rv` (i.e. `y2`)
aesara.dprint(p_posterior_step)
# gamma_rv{0, (0, 0), floatX, False}.1 [id A]
#  |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FDEAC9F9660>) [id B]
#  |TensorConstant{[]} [id C]
#  |TensorConstant{11} [id D]
#  |Elemwise{add,no_inplace} [id E]
#  | |TensorConstant{1.0} [id F]
#  | |y2 [id G]
#  |Elemwise{true_divide,no_inplace} [id H]
#    |TensorConstant{1.0} [id I]
#    |Elemwise{add,no_inplace} [id J]
#      |TensorConstant{1.0} [id F]
#      |TensorConstant{1} [id K]
```
This example seems like it might require a new relation (e.g. the sum of independent Poissons being a Poisson) in order to give the expected result. At the very least, it should be aware of the other observed Poisson.
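
For illustration, here is a minimal sketch of what such a relation could look like as a plain graph rewrite, assuming Aesara >= 2.8's rewriting API; the name `local_sum_of_poissons` is hypothetical, and this is not AeMCMC's actual machinery:

```python
import aesara.tensor as at
from aesara.graph.rewriting.basic import node_rewriter
from aesara.scalar.basic import Add
from aesara.tensor.elemwise import Elemwise
from aesara.tensor.random.basic import PoissonRV


@node_rewriter([Elemwise])
def local_sum_of_poissons(fgraph, node):
    """Rewrite ``Z = X + Y``, with ``X ~ Pois(lam_x)`` and ``Y ~ Pois(lam_y)``
    independent, to ``Z ~ Pois(lam_x + lam_y)``."""
    if not isinstance(node.op.scalar_op, Add) or len(node.inputs) != 2:
        return None

    x, y = node.inputs

    if not all(
        v.owner is not None and isinstance(v.owner.op, PoissonRV)
        for v in (x, y)
    ):
        return None

    # `RandomVariable` node inputs are ``(rng, size, dtype, *dist_params)``;
    # ``lam`` is the Poisson's single distribution parameter.
    lam_x = x.owner.inputs[3]
    lam_y = y.owner.inputs[3]

    # Reuse one RNG input for brevity; a real rewrite would also need to
    # verify that `x` and `y` are actually independent and handle the RNG
    # bookkeeping properly.
    rng = x.owner.inputs[0]

    return [at.random.poisson(lam_x + lam_y, rng=rng)]
```

Registered ahead of the existing conjugacy relations, something like this would let the sum of observed Poissons reduce to the single-observation case the current rewrites already handle.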

@larryshamalama (Author)

Yes, definitely part of it. I was wondering if AeMCMC has the structure to emulate/automate common conjugate distributions as in this Wiki table. These likelihoods assume some $n$ observed data points. What do you think of this?

I imagine that, eventually, instead of having `Y1_rv` and `Y2_rv`, we could have a single batched $n$-vector `Y_rv`, as in the pseudo-code above?

@larryshamalama (Author)

> This example seems like it might require a new relation (e.g. the sum of independent Poissons being a Poisson) in order to give the expected result. At the very least, it should be aware of the other observed Poisson.

Why would we need to recognize that a sum of independent Poissons is a Poisson? The likelihood portion of the posterior kernel is the same as that of a sum of Poissons, but maybe that's just a coincidence here? Here are some math scribbles:

$$\pi_n(\lambda) \stackrel{\lambda}{\propto} \pi_0(\lambda) P(Y_1 = y_1 \mid \lambda)P(Y_2 = y_2 \mid \lambda) = \pi_0(\lambda) \frac{\lambda^{y_1}\text{exp}(-\lambda) }{y_1!} \frac{\lambda^{y_2}\text{exp}(-\lambda) }{y_2!} \stackrel{\lambda}{\propto} \pi_0(\lambda) \frac{\lambda^{y_1 + y_2}\text{exp}(-2\lambda) }{y_1! y_2!} \stackrel{\lambda}{\propto} \pi_0(\lambda) \lambda^{y_1 + y_2} \text{exp}(-2\lambda)$$

where the kernel of $Y_1 + Y_2$ would have been $\frac{\lambda^{y_1 + y_2}\text{exp}(-2\lambda)}{(y_1 + y_2)!} \stackrel{\lambda}{\propto} \lambda^{y_1 + y_2}\text{exp}(-2\lambda)$ as well (the denominators differ). Here $\pi_n(\cdot)$ and $\pi_0(\cdot)$ denote the posterior and prior distributions, respectively.

@brandonwillard (Member) commented Jul 11, 2023

> Yes, definitely part of it. I was wondering if AeMCMC has the structure to emulate/automate common conjugate distributions as in this Wiki table.

Absolutely; that's a big part of this work (e.g. #48, #46). The current conjugation implementations are mostly framed as kanren relations, but there are some challenges with that approach that we need to address (e.g. #108); however, we really don't need to use that approach. We can implement these conjugates (in one "direction") using plain Aesara rewrites.
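
For a rough feel of the kanren-relation style, here is a toy sketch over hand-rolled symbolic tuples (illustrative only: real AeMCMC relations such as `gamma_poisson_conjugateo` unify over Aesara graph etuples, and `toy_gamma_poisson_conjugateo` is a made-up name):

```python
from etuples import etuple
from kanren import eq, lall, run, var


def toy_gamma_poisson_conjugateo(observed, posterior):
    """Relate ``Y ~ Pois(lam), lam ~ Gamma(alpha, beta)`` observed at ``y``
    to its exact posterior ``Gamma(alpha + y, beta + 1)``."""
    alpha, beta, y = var(), var(), var()
    return lall(
        # Match the observed model term...
        eq(observed, etuple("poisson", etuple("gamma", alpha, beta), y)),
        # ...and relate it to the conjugate posterior term.
        eq(
            posterior,
            etuple("gamma", etuple("add", alpha, y), etuple("add", beta, 1)),
        ),
    )


q = var()
print(run(1, q, toy_gamma_poisson_conjugateo(etuple("poisson", etuple("gamma", 1, 1), 3), q)))
# Prints something like: (e('gamma', e('add', 1, 3), e('add', 1, 1)),)
```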

@brandonwillard (Member) commented Jul 11, 2023

> Why would we need to recognize that a sum of independent Poissons is a Poisson?

That's just one fairly straightforward approach to handling this exact situation, but there are likely many others. The reason that approach is nice: we need only add one simple additional rewrite and it should immediately work with the current rewrites to produce the desired result. In other words, we wouldn't need to change how things work.

> The likelihood portion of the posterior kernel is the same as that of a sum of Poissons, but maybe that's just a coincidence here? Here are some math scribbles:
>
> $$\pi_n(\lambda) \stackrel{\lambda}{\propto} \pi_0(\lambda) P(Y_1 = y_1 \mid \lambda)P(Y_2 = y_2 \mid \lambda) = \pi_0(\lambda) \frac{\lambda^{y_1}\text{exp}(-\lambda)}{y_1!} \frac{\lambda^{y_2}\text{exp}(-\lambda)}{y_2!} \stackrel{\lambda}{\propto} \pi_0(\lambda) \frac{\lambda^{y_1 + y_2}\text{exp}(-2\lambda)}{y_1! y_2!} \stackrel{\lambda}{\propto} \pi_0(\lambda) \lambda^{y_1 + y_2} \text{exp}(-2\lambda)$$
>
> where the kernel of $Y_1 + Y_2$ would have been $\frac{\lambda^{y_1 + y_2}\text{exp}(-2\lambda)}{(y_1 + y_2)!} \stackrel{\lambda}{\propto} \lambda^{y_1 + y_2}\text{exp}(-2\lambda)$ as well (the denominators differ). Here $\pi_n(\cdot)$ and $\pi_0(\cdot)$ denote the posterior and prior distributions, respectively.

Yeah, it looks like you're deriving the rewrite I proposed adding, but in "log-density space".

Our rewrite process operates primarily in a "measurable function/random variable space": i.e. on random variables like $Z = X + Y$, where $X \sim \text{Pois}(\lambda_x)$ and $Y \sim \text{Pois}(\lambda_y)$, by adding/using/replacing $Z \sim \text{Pois}(\lambda_x + \lambda_y)$ in our IR graphs. After that, it becomes possible/easy to formulate $(\beta \mid Z = z)$ for any $\beta$ upon which $X$ and/or $Y$ is dependent.

Using this approach, we avoid all the extra machinery needed to perform the log-density derivation/proof of $Z = X + Y \sim \text{Pois}(...)$. That kind of work can be done "off-line" and added to the library at will.

@larryshamalama (Author) commented Jul 12, 2023

Thanks for the clarifications above

> Our rewrite process operates primarily in a "measurable function/random variable space": i.e. on random variables like $Z = X + Y$, where $X \sim \text{Pois}(\lambda_x)$ and $Y \sim \text{Pois}(\lambda_y)$, by adding/using/replacing $Z \sim \text{Pois}(\lambda_x + \lambda_y)$ in our IR graphs. After that, it becomes possible/easy to formulate $(\beta \mid Z = z)$ for any $\beta$ upon which $X$ and/or $Y$ is dependent.

I think that I'm missing something here. I was referring to $(\beta \mid X = x, Y = y)$, which happens to be similar (proportional) to $(\beta \mid Z = z)$, but I'm not sure that's generally the case. Mathematically, is there an implicit use of sufficient statistics that links these posterior expressions together? Perhaps that's the bridge I am not seeing.

@brandonwillard (Member) commented Aug 3, 2023

You're right, that one identity won't work in general. What I'm talking about is the general approach of operating in a non-(log-)density space, for which that convolution identity is just a single example.
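
Concretely, in this thread's Gamma-Poisson example the identity does hold, because the sum is sufficient:

$$Z = Y_1 + Y_2 \mid \lambda \sim \text{Pois}(2\lambda) \quad\Longrightarrow\quad \pi(\lambda \mid Z = z) \stackrel{\lambda}{\propto} \pi_0(\lambda)\,(2\lambda)^{z}\,\text{exp}(-2\lambda) \stackrel{\lambda}{\propto} \pi_0(\lambda)\,\lambda^{y_1 + y_2}\,\text{exp}(-2\lambda),$$

which is exactly the kernel of $(\lambda \mid Y_1 = y_1, Y_2 = y_2)$ derived above; for other likelihoods, $(\beta \mid X = x, Y = y)$ and $(\beta \mid Z = z)$ need not coincide.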

(Sorry, been on the last part of a big project for a little while now; will provide a real response as soon as I can!)
