
Add the relation between a Half-Cauchy and an Inverse-Gamma scale mix… #66

Draft
rlouf wants to merge 1 commit into main from halfcauchy-scale-mixture

Conversation

@rlouf (Member) commented Sep 25, 2022

Related to #49 and #64.

Here are a few important guidelines and requirements to check before your PR can be merged:

  • There is an informative high-level description of the changes.
  • The description and/or commit message(s) references the relevant GitHub issue(s).
  • pre-commit is installed and set up.
  • The commit messages follow these guidelines.
  • The commits correspond to relevant logical changes, and there are no commits that fix changes introduced by other commits in the same branch/PR. If your commit description starts with "Fix...", then you're probably making this mistake.
  • There are tests covering the changes introduced in the PR.

Don't worry, your PR doesn't need to be in perfect order to submit it. As development progresses and/or reviewers request changes, you can always rewrite the history of your feature/PR branches.

If your PR is an ongoing effort and you would like to involve us in the process, simply make it a draft PR.

@rlouf force-pushed the halfcauchy-scale-mixture branch 4 times, most recently from c711b2b to aeeda45, on September 26, 2022 at 13:49

codecov bot commented Sep 26, 2022

Codecov Report

Base: 97.35% // Head: 97.42% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (7431376) compared to base (5b5ae2f).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
+ Coverage   97.35%   97.42%   +0.06%     
==========================================
  Files           9       10       +1     
  Lines         606      621      +15     
  Branches       63       63              
==========================================
+ Hits          590      605      +15     
  Misses          5        5              
  Partials       11       11              
Impacted Files Coverage Δ
aemcmc/transforms.py 100.00% <ø> (ø)
aemcmc/scale_mixtures.py 100.00% <100.00%> (ø)


@rlouf (Member, Author) commented Sep 26, 2022

While the forward test passes, this shouldn't be merged yet.

For the backward transformation, which is many-to-one, I need to select an rng, dtype, and size for the Half-Cauchy. I choose the inner Inverse-Gamma's (arbitrarily for now), so I don't need to consider the parameters of the outer Inverse-Gamma. I am not sure whether the following way of writing this in miniKanren is correct to support both directions:

# These goals are conjoined inside the relation's larger lall goal; each
# lany either unifies the outer parameter with the Half-Cauchy's or leaves
# it unconstrained.
lany(
    eq(rng_outer_lv, rng_lv),
    succeed,
),
lany(
    eq(size_outer_lv, size_lv),
    succeed,
),
lany(
    eq(type_idx_outer_lv, type_idx_lv),
    succeed,
)
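As a quick sanity check that this lany(..., succeed) pattern behaves as intended, here is a minimal self-contained sketch; succeed is stood in for by eq(0, 0), a goal that always succeeds, in case it is not re-exported from the top-level kanren namespace:

from kanren import eq, lany, run, var

succeed = eq(0, 0)  # a goal that always succeeds

a = var()
# Two answers: one where the first branch binds a to 1, and one where the
# second branch leaves a unconstrained (printed as a fresh ~_N token).
print(run(0, a, lany(eq(a, 1), succeed)))  # e.g. (1, ~_10)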

@rlouf (Member, Author) commented Sep 26, 2022

More concerning to me is how to handle the rngs in these one-to-many relationships. I currently set the rng of both InverseGammaRVs to the Half-Cauchy's, but that can't be correct; I don't see how this could play nicely with the updates system. One solution is setting it to aesara.shared(np.random.default_rng()), but then sampling becomes non-deterministic even if the caller sets a seed.

Let's walk through a simpler example and consider another kind of expansion: the convolution of two normal distributions. In particular, let's consider the relation that represents the sum of two normally-distributed random variables:

$$ \begin{equation*} \frac{ X \sim \operatorname{N}\left(\mu_x, \sigma_x^2\right), \quad Y \sim \operatorname{N}\left(\mu_y, \sigma_y^2\right), \quad X + Y = Z }{ Z \sim \operatorname{N}\left(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2\right) } \end{equation*} $$
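As a quick numeric sanity check of this relation (pure NumPy; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, np.sqrt(2.0), size=1_000_000)   # mu_x = 1.0,  sigma2_x = 2.0
y = rng.normal(-0.5, np.sqrt(3.0), size=1_000_000)  # mu_y = -0.5, sigma2_y = 3.0
z = x + y
print(z.mean(), z.var())  # approximately 0.5 and 5.0, i.e. mu_x + mu_y and sigma2_x + sigma2_y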

Let's print the current etuplized version of the contracted expression:

from pprint import pprint

import aesara.tensor as at
from etuples import etuplize


srng = at.random.RandomStream(0)
mu_x = at.scalar()
sigma2_x = at.scalar()
mu_y = at.scalar()
sigma2_y = at.scalar()

X_rv = srng.normal(mu_x + mu_y, sigma2_x + sigma2_y)
pprint(etuplize(X_rv))
# e(
#   e(aesara.tensor.random.basic.NormalRV, 'normal', 0, (0, 0), 'floatX', False),
#   RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FD6BBEA6960>),
#   TensorConstant{[]},
#   TensorConstant{11},
#   e(
#     e(
#       aesara.tensor.elemwise.Elemwise,
#       <aesara.scalar.basic.Add at 0x7fd6c0e014e0>,
#       <frozendict {}>),
#     <TensorType(float64, ())>,
#     <TensorType(float64, ())>),
#   e(
#     e(
#       aesara.tensor.elemwise.Elemwise,
#       <aesara.scalar.basic.Add at 0x7fd6c0e014e0>,
#       <frozendict {}>),
#     <TensorType(float64, ())>,
#     <TensorType(float64, ())>))

And the expanded version of this expression:

Y_rv = srng.normal(mu_x, sigma2_x) + srng.normal(mu_y, sigma2_y)
pprint(etuplize(Y_rv))
# e(
#   e(
#     aesara.tensor.elemwise.Elemwise,
#     <aesara.scalar.basic.Add at 0x7fd6c0e014e0>,
#     <frozendict {}>),
#   e(
#     e(
#       aesara.tensor.random.basic.NormalRV,
#       'normal',
#       0,
#       (0, 0),
#       'floatX',
#       False),
#     RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FD6BD33A880>),
#     TensorConstant{[]},
#     TensorConstant{11},
#     <TensorType(float64, ())>,
#     <TensorType(float64, ())>),
#   e(
#     e(
#       aesara.tensor.random.basic.NormalRV,
#       'normal',
#       0,
#       (0, 0),
#       'floatX',
#       False),
#     RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FD6BD33BE60>),
#     TensorConstant{[]},
#     TensorConstant{11},
#     <TensorType(float64, ())>,
#     <TensorType(float64, ())>))

To express the relation between these two expressions with miniKanren, we start by writing the corresponding expressions with logic variables:

from etuples import etuple
from kanren import var

mu_x, mu_y, sigma2_x, sigma2_y = var(), var(), var(), var()

rng, size, dtype = var(), var(), var()
X_et = etuple(
    etuplize(at.random.normal),
    rng,
    size,
    dtype,
    etuple(
        etuplize(at.add),
        mu_x,
        mu_y
    ),
    etuple(
        etuplize(at.add),
        sigma2_x,
        sigma2_y,
    )
)

rng_x, size_x, dtype_x = var(), var(), var()
rng_y, size_y, dtype_y = var(), var(), var()
Y_et = etuple(
    etuplize(at.add),
    etuple(
        etuplize(at.random.normal),
        rng_x,
        size_x,
        dtype_x,
        mu_x,
        sigma2_x
    ),
    etuple(
        etuplize(at.random.normal),
        rng_y,
        size_y,
        dtype_y,
        mu_y,
        sigma2_y
    )
)
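To see how these patterns are used, here is a hedged sketch that unifies X_et against the etuplized contracted graph to extract the parameters. It assumes the snippets above are run in a single session, and that Aesara's unify/etuplize dispatch for its graph objects is registered (as it is when aemcmc's relations are loaded):

from kanren import eq, run

res = run(1, (mu_x, mu_y, sigma2_x, sigma2_y), eq(X_et, etuplize(X_rv)))
print(res)  # the four Aesara scalar variables, if unification succeeds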

The problem here stems from the fact that the contracted version needs one rng state, while the expanded one needs two distinct rng states. The expanded -> contracted transformation is possible, although it requires choosing which rng state to keep. We will keep the first one, since it was created first:

from kanren import lall, eq

# Our goal is a function of (contract_expr, expand_expr).
# We simplify the problem by assuming all of the distribution parameters
# have the same dtype and size.
lall(
    eq(contract_expr, X_et),
    eq(expand_expr, Y_et),
    eq(rng, rng_x),  # We pick the first rng state
    eq(size, size_x),
    eq(dtype, dtype_x),
)

However, this will not work for the contracted -> expanded transformation, as rng_y, dtype_y, and size_y will not be assigned a value! The dtype and size add unrelated complications, so we will set them aside for the rest of the discussion and only consider the following question: what value should we choose for rng_y?

We know that we need to advance the state of the random number generator, or the computations downstream will be wrong. We also know that we cannot reset it to np.random.default_rng(), or we lose determinism when the user seeds the graph. It is not clear to me what options we have left.
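To make the determinism point concrete, a minimal NumPy illustration:

import numpy as np

# A fresh default_rng() is seeded from OS entropy, so its draws differ on
# every run, regardless of any seed the caller set elsewhere.
print(np.random.default_rng().standard_normal())   # differs across runs
print(np.random.default_rng(0).standard_normal())  # identical across runs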

Now imagine that we have a symbolic operator RNGUpdate, with at.random.update = RNGUpdate(), so that we can update the rng state by doing:

rng_new = at.random.update(rng)

which corresponds to the following etuple:

etuple(etuplize(at.random.update), rng)
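For concreteness, here is a hypothetical sketch of what such an Op could look like; RNGUpdate is not an existing Aesara Op, and the jumped-stream semantics in perform below are purely an assumption for illustration:

import numpy as np
from aesara.graph.basic import Apply
from aesara.graph.op import Op


class RNGUpdate(Op):
    """Hypothetical Op that symbolically advances an rng state."""

    def make_node(self, rng):
        # The output is a new variable of the same RNG type as the input.
        return Apply(self, [rng], [rng.type()])

    def perform(self, node, inputs, output_storage):
        (rng,) = inputs
        # One possible semantics: jump to an independent stream derived
        # deterministically from the input generator's state.
        output_storage[0][0] = np.random.Generator(rng.bit_generator.jumped())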

We can now re-write Y_et as follows:

rng_x, size_x, dtype_x = var(), var(), var()
Y_et = etuple(
    etuplize(at.add),
    etuple(
        etuplize(at.random.normal),
        rng_x,
        size_x,
        dtype_x,
        mu_x,
        sigma2_x
    ),
    etuple(
        etuplize(at.random.normal),
        etuple(
            etuplize(at.random.update),
            rng_x
        ),
        size_y,
        dtype_y,
        mu_y,
        sigma2_y
    )
)

The following goal (leaving concerns of dtype and size aside for this discussion) now works:

from kanren import lall, eq

# Our goal is a function of (contract_expr, expand_expr).
# We simplify the problem by assuming all of the distribution parameters
# have the same dtype and size.
lall(
    eq(contract_expr, X_et),
    eq(expand_expr, Y_et),
    eq(rng, rng_x),
)
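Wrapped as a goal constructor, the relation can then be run in either direction. A hedged sketch (the name normal_convolutiono and this exact invocation are illustrative, not existing aemcmc API):

from kanren import eq, lall, run, var


def normal_convolutiono(contract_expr, expand_expr):
    # In a real relation the X_et/Y_et patterns would be built here with
    # fresh logic variables; this sketch closes over the ones defined above.
    return lall(
        eq(contract_expr, X_et),
        eq(expand_expr, Y_et),
        eq(rng, rng_x),
    )


# Contracted -> expanded: ask miniKanren for the expansion of the graph.
q = var()
print(run(1, q, normal_convolutiono(etuplize(X_rv), q)))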

The solution may be to make the behind-the-scenes update logic in Aesara explicit when etuplizing (how?), like we are planning to do with multiple outputs in aesara-devs/aesara#1036. But like the latter case, this feels like the symptom of something running deeper in Aesara's internals.

@brandonwillard (Member) commented Sep 27, 2022

> More concerning to me is how to handle the rngs in these one-to-many relationships. I currently set the rng of both InverseGammaRVs to the Half-Cauchy's, but that can't be correct; I don't see how this could play nicely with the updates system. One solution is setting it to aesara.shared(np.random.default_rng()), but then sampling becomes non-deterministic even if the caller sets a seed.

First, let's recap the situation at a high level.

If the context requires a "strong" equivalence between a sub-graph and its rewritten form (e.g. sampling one must produce the exact same values as the other), then a lot of these rewrites are simply not appropriate (see aesara-devs/aesara#209); however, when it comes to generating samplers—as we're doing via these rewrites—then we're operating under considerably weaker equivalence assumptions, and these rewrites generally are appropriate. In this case, we only need consistency/reproducibility of the sampler's results between runs.

> Let's print the current etuplized version of the contracted expression:

I think the kanren/etuple aspects may only end up complicating things, so let's consider the Aesara-only approach(es) to this first.

In some cases (e.g. ffbs_step), we've required that a user pass a RandomStream or seed value to the function(s) that generate a sampler. We may need to work this kind of thing into our general rewrites (e.g. some kind of global RandomStream that's accessible by rewrites), or start using the symbolic RNG constructors—i.e. use the original graphs' RNGs to seed new ones and effectively connect the user-specified state/seed to rewrite-generated ones.
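A hedged sketch of that latter idea, deriving a new generator deterministically from an existing graph's shared RNG (the choice of jumped() here is an assumption; any deterministic derivation of a new state would do):

import aesara
import numpy as np

old_rng = aesara.shared(np.random.default_rng(42))

# Derive a fresh generator from the existing one's state, so that seeding
# old_rng still determines the new stream.
new_rng = aesara.shared(np.random.Generator(old_rng.get_value().bit_generator.jumped()))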

The latter approach sounds like the best option right now, so let's see what we can do along those lines first.

@rlouf (Member, Author) commented Sep 27, 2022

> In this case, we only need consistency/reproducibility of the sampler's results between runs.

Agreed.

> I think the kanren/etuple aspects may only end up complicating things,

It does, but it also helps a lot to think about these things.

> or start using the symbolic RNG constructors—i.e. use the original graphs' RNGs to seed new ones and effectively connect the user-specified state/seed to rewrite-generated ones.

Yes, that was the idea. We may be able to make all of this happen at the etuplization/evaluation level, in the same way we discussed adding an nth operator in the etuplized version of Ops.

> The latter approach sounds like the best option right now, so let's see what we can do along those lines first.

👍
