Generalization of abstract model functions #10
Replies: 6 comments 74 replies
-
Thanks @phipsgabler for the nice overview :) The upcoming release of Soss will depend on MeasureTheory instead of Distributions (though you'll still be able to use that). The posterior density we (and all PPLs I know of) evaluate isn't normalized. Going forward, I think a Soss model can become "just another parameterized measure". So at least some of this functionality can go into MeasureTheory, and Soss can just be a convenient toolbox for combining measures in more flexible ways.
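To illustrate what "just another parameterized measure" could mean, here is a purely hypothetical sketch in plain Julia + Distributions (this is not MeasureTheory's actual API; all names are invented for illustration):

```julia
using Distributions

# A hypothetical "parameterized measure": parameters plus an (unnormalized)
# log-density over named points. A compiled Soss model could take roughly this shape.
struct ParameterizedMeasure{P,F}
    params::P
    logdensity::F   # (params, point::NamedTuple) -> Real, not necessarily normalized
end

(μ::ParameterizedMeasure)(point) = μ.logdensity(μ.params, point)

# The X/Y toy model discussed later in this thread, conditioned on Y = 0.5:
post = ParameterizedMeasure(
    (theta = 2.0, Y = 0.5),
    (p, nt) -> logpdf(Normal(0, p.theta), nt.X) + logpdf(Normal(nt.X, 1), p.Y),
)

post((; X = 1.3))   # unnormalized log posterior density at X = 1.3
```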
-
@phipsgabler On Gen.jl -- I think at a high level you've got it right. There are more than just two GFI implementors, however. Each combinator implements the GFI (as each combinator can be thought of as accepting a measure over a choice map space of one type and transforming it into a measure over a choice map space of a different type, e.g. a product type or a sum type). In addition, there are quite a few deterministic modeling languages which all implement the GFI, e.g. GenPyTorch.jl. The static one also programmatically defines a trace type which is specialized with respect to the graph. I think there are analogies between the static compiler and Turing components which generate a specialized …
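For concreteness, here is a minimal sketch of the combinator idea using Gen's `Map` (the address layout is written from memory and may differ slightly between versions):

```julia
using Gen

# A kernel generative function with a single random choice at address :y.
@gen function kernel(x::Float64)
    y ~ normal(x, 1.0)
    return y
end

# `Map` wraps the kernel into another GenerativeFunction; its choice-map space
# is (roughly) a product: one copy of the kernel's space per input element.
mapped = Map(kernel)

trace = simulate(mapped, ([0.0, 1.0, 2.0],))
get_choices(trace)            # choices live at hierarchical addresses 1 => :y, 2 => :y, ...
get_choices(trace)[2 => :y]   # the second kernel's draw
```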
-
Here are interface comparisons between DynamicPPL, Soss, and (to some extent) Gen. I focus on how models are represented, how they are sampled from, and how densities are calculated. @cscherrer & @femtomc, please check if I got things right. This was with the goal of finding commonalities; but the more I delve into the matter, the more I feel we are expressing things differently in each and every case :D

### DynamicPPL

```julia
DynamicPPL.@model function foo_dppl(Y, theta)
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo_dppl(y, theta)
```

### Soss

```julia
foo_soss = Soss.@model theta begin
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo_soss(theta = theta)
```

### Gen

I have much less of an understanding there.

```julia
@gen function foo_gen(theta)
    X ~ normal(0, theta)
    Y ~ normal(X, 1.0)
end
```

The rest is much more trace-oriented: you sample choice maps using the GFI functions. IIUC, there are two kinds of parameters, separate from random variables: function arguments with no special semantics, and variables declared with `@param`.

(McCoy, could you maybe provide some more details, analogous to the DPPL/Soss comparison?)
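To make the trace-oriented part concrete, here is how I understand the basic Gen workflow for the model above (a sketch from memory; `simulate`, `generate`, and `choicemap` are real Gen functions, but the details may be off):

```julia
using Gen

@gen function foo_gen(theta)
    X ~ normal(0, theta)
    Y ~ normal(X, 1.0)
end

# Unconditional sampling: a trace holds a full choice map {X, Y} plus its score.
trace = simulate(foo_gen, (2.0,))
get_choices(trace)

# "Conditioning" is done per call by constraining addresses of the choice map;
# `generate` fills in the unconstrained choices and returns an importance weight.
constraints = choicemap((:Y, 0.5))
(trace, weight) = generate(foo_gen, (2.0,), constraints)
get_choices(trace)[:X]     # the sampled latent
get_score(trace)           # log joint density of all choices in the trace
```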
-
I was thinking whether it would maybe be a good enough abstraction to treat DPPL models as "by default conditioned", and Soss models as "by default generative/joint", with operations to switch between those. Then all density evaluation and sampling can vary with a context argument, somewhat like this:

```julia
DynamicPPL.@model function foo_dppl(Y, theta)
X ~ Normal(0, theta)
Y ~ Normal(X)
end
m = foo_dppl(y, theta)
decondition(m) == foo_dppl(missing, theta)
# `m` is intrinsically a _conditioned model_, so prior/likelihood/posterior are well-defined:
sample(m, PriorContext()) = rand(Normal(0, theta))
sample(m, LikelihoodContext((; X = ...))) = rand(Normal(X)) # re-sampling Y
sample(m, PosteriorContext(), alg) = <approximately sample X | y using `alg`>
logdensity(m, PriorContext(), (; X = ...)) = logpdf(Normal(0, theta), X)
logdensity(m, LikelihoodContext(), (; X = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), y)
# but we can forget the conditioning to deal with the joint:
sample(decondition(m), JointContext()) = <sample from X, Y>
logdensity(decondition(m), JointContext(), (; X = ..., Y = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), Y)
# and we can treat model as a regular function:
rand(m) = <sampled value of last expression>
```

I have just invented that "deconditioning" term; it's a transformation of model semantics, not of structure! Maybe a better term exists. (And now I'm deliberately ignoring the …)

```julia
foo_soss = Soss.@model theta begin
X ~ Normal(0, theta)
Y ~ Normal(X)
end
m = foo_soss(theta = theta)
m == decondition(m | (; Y = y))
# now, `m` is intrinsically a _joint_ model:
sample(m, JointContext()) = <sample from X, Y>
logdensity(m, JointContext(), (; X = ..., Y = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), Y)
# but after conditioning, prior/likelihood/posterior become well-defined:
sample(m | (; Y = y), PriorContext()) = rand(Normal(0, theta))
sample(m | (; Y = y), LikelihoodContext((; X = ...))) = rand(Normal(X)) # re-sampling Y
sample(m | (; Y = y), PosteriorContext(), alg) = <approximately sample X | y using `alg`>
logdensity(m | (; Y = y), PriorContext(), (; X = ...)) = logpdf(Normal(0, theta), X)
logdensity(m | (; Y = y), LikelihoodContext(), (; X = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), y)
# and we can treat model as a regular function:
rand(m) = <return value, if there is one, otherwise joint NamedTuple>
```

Soss's conditioning and transformation capabilities are still more flexible, of course. E.g.,

```julia
logdensity(m | (; X = x, Y = y), PriorContext(), ()) == logdensity(m, JointContext(), (; X = ..., Y = ...))
```

Finally, there's this weirdly named …
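As a sanity check on the semantics I'm proposing, here is a minimal, self-contained sketch of the "conditioning as a wrapper" idea; none of these names are actual DynamicPPL or Soss APIs, they are purely illustrative:

```julia
using Distributions

# The joint model as a plain log-density over a NamedTuple (; X, Y):
logjoint_foo(theta) = nt -> logpdf(Normal(0, theta), nt.X) + logpdf(Normal(nt.X, 1), nt.Y)

struct Conditioned{F,O}
    joint::F         # NamedTuple -> log joint density
    observed::O      # NamedTuple of observed variables, e.g. (; Y = y)
end

condition(joint, obs) = Conditioned(joint, obs)
decondition(m::Conditioned) = m.joint   # forget the conditioning

# The "posterior" density of a conditioned model is just the joint, evaluated
# with the observations filled in (unnormalized in the free variables):
logdensity(m::Conditioned, free::NamedTuple) = m.joint(merge(free, m.observed))

m = condition(logjoint_foo(2.0), (; Y = 0.5))
logdensity(m, (; X = 1.3))               # log p(X = 1.3, Y = 0.5) for θ = 2
decondition(m)((; X = 1.3, Y = 0.5))     # the same value, via the joint
```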
-
Example implementation of …
-
In some sense, I'm coming at this from an even more abstract perspective (specifically, less concrete -- as it says nothing about implementation on hardware): what are the representational concepts which are required to express models and inference?

For example, model scope. When we discuss things like traces, we implicitly assume that this concept exists in our code -- so that we can label specific random choices and say that this random choice belongs to this model. How is this represented abstractly in an IR? On the one hand, we can just overload equality and agree that

```julia
x = rand(Normal(...))
y = rand(Normal(...))
```

implies the existence of two random variables.

Now, onto @phipsgabler's discussion of "write once" data structures. Naturally, we can express this by disallowing usage of an address more than once, and we get to use equality in flexible ways:

```julia
x = trace(:x, Normal(...))
x = trace(:y, Normal(...))
```

and the assignment semantics don't interact with the probabilistic ones. This is rather easy to represent in an IR -- but when we move to nested models it becomes more difficult. Now, if you enter into a sub-model call, you might address that operation just like you address the random choices.

Indeed, this is basically how Gen works right now. One problem is inlining the sub-model call into the top-level call before transformations, like what might be required for delayed sampling or marginalization. In the context of Bayesian networks, this transformation is easy -- there we don't have scope, and we don't have to worry about random variable flow. Here, let's say we want to marginalize out …

(Also here, you may take the separation between assignment and random choice even further than Gen currently does, and require that the symbol addresses differ from any of the existing slot names (e.g. LHS vars).)

Here, I'm going to assume that the random variable flow piggybacks onto assignment. Now, to express a transform like … I'm being long-winded on purpose -- my point being that there are really fundamental representation questions, before we even get to implementation, about how to represent models and what things we will support. The imperative model of Julia IR and the associated compilation pipeline is good for a certain type of thing. But I think there's the potential to identify a simple "core IR" which supports what @phipsgabler is trying to get at.
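For reference, this is how the nested addressing currently looks in Gen (a small sketch; the open question is what an IR-level representation of this address hierarchy should be):

```julia
using Gen

@gen function inner()
    z ~ normal(0.0, 1.0)
    return z
end

@gen function outer()
    # The sub-model call itself gets an address, just like a random choice,
    # so its choices live under a hierarchical namespace:
    a ~ inner()           # inner's choices appear under :a => ...
    x ~ normal(a, 1.0)
    return x
end

tr = simulate(outer, ())
get_choices(tr)[:a => :z]   # the inner choice, addressed hierarchically
```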
-
Here I'm trying to think about the following questions:

- What should an `AbstractModel` be?

The complication is that all of our PPLs, and others, implicitly have a chain of the form Data -> Distribution -> (Log-)Probability, in which we use different terms; due to the different meanings of function arguments, some differentiate between partially applied forms, while some don't. And "Model" never denotes the same thing.

Let me try to summarize some existing approaches first. Correct me if I'm wrong.
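In its plainest form, that chain is just what Distributions.jl already gives us, before any PPL gets involved:

```julia
using Distributions

theta = 2.0              # parameters
d = Normal(0, theta)     # Distribution
x = rand(d)              # Data
lp = logpdf(d, x)        # (Log-)Probability
```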
### DynamicPPL.jl/Turing

In Turing language, an `@model foo(...)` block defines a "model function" that can be applied to parameters and returns a `Model` object containing the "evaluation function", which can be called on a `VarInfo` and a `Context`.

The evaluation function can be called with specific `Context`s, allowing the same `Model` to be reused to calculate several quantities, whose return value is written into the `VarInfo` argument. We have `loglikelihood`, a method taken from Distributions.jl, and `logprior` and `logjoint`, newly defined in DynamicPPL.jl.

Every LHS of a tilde statement that is an argument of the model function is treated as an observation, except when it is (or contains) `missing`. Other than that, arguments can be arbitrary (except for the special `::Type{V}` syntax) and are just closed over in the evaluator function.

DynamicPPL.jl's `Model` also just reuses the `AbstractModel` from AbstractMCMC.jl as a parent type. For a start, I think it's OK to reexport that type from AbstractPPL.jl and depend on AbstractMCMC.jl, as DynamicPPL.jl does. In the long term, IMO, the abstract type should go here, and the dependency reversed -- a model can, after all, be used for more general things than MCMC.

Models are inherently dynamic; thus, it does not make sense to query the variables or structure of a model object -- only the variables are available, and only through evaluation on a `VarInfo`.
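For example (a sketch from memory; the exact signatures of `logprior`/`loglikelihood`/`logjoint`, and in particular whether they accept a plain `NamedTuple`, may differ between DynamicPPL versions):

```julia
using DynamicPPL, Distributions

@model function foo(Y, theta)
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo(0.5, 2.0)        # Y = 0.5 is an observation, theta = 2.0 is closed over

θ = (; X = 1.3)          # assuming NamedTuple values are accepted here
logprior(m, θ)           # log p(X)
loglikelihood(m, θ)      # log p(Y = 0.5 | X = 1.3)
logjoint(m, θ)           # sum of the two
```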
### Soss.jl

In Soss.jl, a `foo = @model ...` block defines a `Soss.Model`, which does not inherit from any abstract type, and represents the model in symbolic form. Applying a `Model` to data results in a `JointDistribution`, which is a Distributions.jl distribution over `NamedTuple`s.

Because of this design, there is only one `logpdf` function, since you don't treat the same "model" in different ways depending on the arguments -- instead, you can take any model, transform it, and generate a density function from it, which generalizes the notions of prior and likelihood. If I understand correctly, arguments of models are not special at all, but just define a model as a parametrized family of `JointDistribution`s.

All models are static, thus structure and variables can be queried from a model.
### Gen.jl

Gen.jl models define instances of subtypes of `GenerativeFunction`, which has a very strictly defined interface (the GFI), against which the rest of the system is programmed. A generative function is thereby to be interpreted as a parametrized distribution over traces ("choice maps").

Arguments do have special meaning as parameters, e.g. when optimizing them or taking gradients. But they are never "curried out" -- the GFI functions all take an `args` argument.

"Model" is, I think, only used in a loose sense, and not as a programmatic term. There are two GFI implementations; the dynamic one operates only on run-time choice maps, and therefore behaves more like DynamicPPL.jl with `VarInfo`, while the static one defines a graph that can be inspected.
### StatsBase.jl

StatsBase.jl has a `StatisticalModel` base class, with some rather interesting functions around it, but it seems to be way too specific to regression models, and concerned mostly with coefficients and model fitting.

It defines a `loglikelihood` function, but I think nobody uses that in PPLs (or is Distributions.jl extending from StatsBase.jl?)