Generalization of abstract model functions #10
Replies: 6 comments 74 replies
-
Thanks @phipsgabler for the nice overview :) The upcoming release of Soss will depend on MeasureTheory instead of Distributions (though you'll still be able to use that). The posterior density we (and all PPLs I know of) evaluate isn't normalized. Going forward, I think a Soss model can become "just another parameterized measure". So at least some of this functionality can go into MeasureTheory, and Soss can just be a convenient toolbox for combining measures in more flexible ways.
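To illustrate what "just another parameterized measure" could mean, here is a purely hypothetical sketch in plain Julia + Distributions (this is not MeasureTheory's actual API; all names are invented for illustration):

```julia
using Distributions

# A hypothetical "parameterized measure": parameters plus an (unnormalized)
# log-density over named points. A compiled Soss model could take roughly this shape.
struct ParameterizedMeasure{P,F}
    params::P
    logdensity::F   # (params, point::NamedTuple) -> Real, not necessarily normalized
end

(μ::ParameterizedMeasure)(point) = μ.logdensity(μ.params, point)

# The X/Y toy model discussed later in this thread, conditioned on Y = 0.5:
post = ParameterizedMeasure(
    (theta = 2.0, Y = 0.5),
    (p, nt) -> logpdf(Normal(0, p.theta), nt.X) + logpdf(Normal(nt.X, 1), p.Y),
)

post((; X = 1.3))   # unnormalized log posterior density at X = 1.3
```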
-
@phipsgabler On Gen.jl -- I think at a high level you've got it right. There are more than just two GFI implementors, however. Each combinator implements the GFI (as each combinator can be thought of as accepting a measure over a choice map space of one type and transforming it into a measure over a choice map space of a different type, e.g. a product type or a sum type). In addition, there are quite a few deterministic modeling languages which all implement the GFI, e.g. GenPyTorch.jl. The static one also programmatically defines a trace type which is specialized with respect to the graph. I think there are analogies between the static compiler and Turing components which generate a specialized …
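For concreteness, here is a minimal sketch of the combinator idea using Gen's `Map` (the address layout is written from memory and may differ slightly between versions):

```julia
using Gen

# A kernel generative function with a single random choice at address :y.
@gen function kernel(x::Float64)
    y ~ normal(x, 1.0)
    return y
end

# `Map` wraps the kernel into another GenerativeFunction; its choice-map space
# is (roughly) a product: one copy of the kernel's space per input element.
mapped = Map(kernel)

trace = simulate(mapped, ([0.0, 1.0, 2.0],))
get_choices(trace)            # choices live at hierarchical addresses 1 => :y, 2 => :y, ...
get_choices(trace)[2 => :y]   # the second kernel's draw
```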
-
Here are interface comparisons between DynamicPPL, Soss, and (to some extent) Gen. I focus on how models are represented, how they are sampled from, and how densities are calculated. @cscherrer & @femtomc, please check if I got things right. This was with the goal of finding commonalities; but the more I delve into the matter, the more I feel we are expressing things differently in each and every case :D

### DynamicPPL

```julia
DynamicPPL.@model function foo_dppl(Y, theta)
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo_dppl(y, theta)
```

### Soss

```julia
foo_soss = Soss.@model theta begin
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo_soss(theta = theta)
```

### Gen

I have much less of an understanding there.

```julia
@gen function foo_gen(theta)
    X ~ normal(0, theta)
    Y ~ normal(X, 1.0)
end
```

The rest is much more trace-oriented: you sample choice maps using the GFI functions. IIUC, there are two kinds of parameters, separate from random variables: function arguments with no special semantics, and variables declared with `@param`.

(McCoy, could you maybe provide some more details, analogous to the DPPL/Soss comparison?)
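To make the trace-oriented part concrete, here is how I understand the basic Gen workflow for the model above (a sketch from memory; `simulate`, `generate`, and `choicemap` are real Gen functions, but the details may be off):

```julia
using Gen

@gen function foo_gen(theta)
    X ~ normal(0, theta)
    Y ~ normal(X, 1.0)
end

# Unconditional sampling: a trace holds a full choice map {X, Y} plus its score.
trace = simulate(foo_gen, (2.0,))
get_choices(trace)

# "Conditioning" is done per call by constraining addresses of the choice map;
# `generate` fills in the unconstrained choices and returns an importance weight.
constraints = choicemap((:Y, 0.5))
(trace, weight) = generate(foo_gen, (2.0,), constraints)
get_choices(trace)[:X]     # the sampled latent
get_score(trace)           # log joint density of all choices in the trace
```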
-
I was thinking whether it would maybe be a good enough abstraction to treat DPPL models as "by default conditioned", and Soss models as "by default generative/joint", with operations to switch between those. Then all density evaluation and sampling can vary with a context argument, somewhat like this:

```julia
DynamicPPL.@model function foo_dppl(Y, theta)
X ~ Normal(0, theta)
Y ~ Normal(X)
end
m = foo_dppl(y, theta)
decondition(m) == foo_dppl(missing, theta)
# `m` is intrinsically a _conditioned model_, so prior/likelihood/posterior are well-defined:
sample(m, PriorContext()) = rand(Normal(0, theta))
sample(m, LikelihoodContext((; X = ...))) = rand(Normal(X)) # re-sampling Y
sample(m, PosteriorContext(), alg) = <approximately sample X | y using `alg`>
logdensity(m, PriorContext(), (; X = ...)) = logpdf(Normal(0, theta), X)
logdensity(m, LikelihoodContext(), (; X = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), y)
# but we can forget the conditioning to deal with the joint:
sample(decondition(m), JointContext()) = <sample from X, Y>
logdensity(decondition(m), JointContext(), (; X = ..., Y = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), Y)
# and we can treat model as a regular function:
rand(m) = <sampled value of last expression>
```

I have just invented that "deconditioning" term; it's a transformation of model semantics, not of structure! Maybe a better term exists. (And now I'm deliberately ignoring the …)

```julia
foo_soss = Soss.@model theta begin
X ~ Normal(0, theta)
Y ~ Normal(X)
end
m = foo_soss(theta = theta)
m == decondition(m | (; Y = y))
# now, `m` is intrinsically a _joint_ model:
sample(m, JointContext()) = <sample from X, Y>
logdensity(m, JointContext(), (; X = ..., Y = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), Y)
# but after conditioning, prior/likelihood/posterior become well-defined:
sample(m | (; Y = y), PriorContext()) = rand(Normal(0, theta))
sample(m | (; Y = y), LikelihoodContext((; X = ...))) = rand(Normal(X)) # re-sampling Y
sample(m | (; Y = y), PosteriorContext(), alg) = <approximately sample X | y using `alg`>
logdensity(m | (; Y = y), PriorContext(), (; X = ...)) = logpdf(Normal(0, theta), X)
logdensity(m | (; Y = y), LikelihoodContext(), (; X = ...)) = logpdf(Normal(0, theta), X) + logpdf(Normal(X), y)
# and we can treat model as a regular function:
rand(m) = <return value, if there is one, otherwise joint NamedTuple>
```

Soss's conditioning and transformation capabilities are still more flexible, of course. E.g.,

```julia
logdensity(m | (; X = x, Y = y), PriorContext(), ()) == logdensity(m, JointContext(), (; X = ..., Y = ...))
```

Finally, there's this weirdly named …
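As a sanity check on the semantics I'm proposing, here is a minimal, self-contained sketch of the "conditioning as a wrapper" idea; none of these names are actual DynamicPPL or Soss APIs, they are purely illustrative:

```julia
using Distributions

# The joint model as a plain log-density over a NamedTuple (; X, Y):
logjoint_foo(theta) = nt -> logpdf(Normal(0, theta), nt.X) + logpdf(Normal(nt.X, 1), nt.Y)

struct Conditioned{F,O}
    joint::F         # NamedTuple -> log joint density
    observed::O      # NamedTuple of observed variables, e.g. (; Y = y)
end

condition(joint, obs) = Conditioned(joint, obs)
decondition(m::Conditioned) = m.joint   # forget the conditioning

# The "posterior" density of a conditioned model is just the joint, evaluated
# with the observations filled in (unnormalized in the free variables):
logdensity(m::Conditioned, free::NamedTuple) = m.joint(merge(free, m.observed))

m = condition(logjoint_foo(2.0), (; Y = 0.5))
logdensity(m, (; X = 1.3))               # log p(X = 1.3, Y = 0.5) for θ = 2
decondition(m)((; X = 1.3, Y = 0.5))     # the same value, via the joint
```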
-
Example implementation of …
-
In some sense, I'm coming at this from an even more abstract perspective (specifically, less concrete -- as it says nothing about implementation on hardware): what are the representational concepts which are required to express models and inference?

For example, model scope. When we discuss things like traces, we implicitly assume that this concept exists in our code -- so that we can label specific random choices and say that this random choice belongs to this model. How is this represented abstractly in an IR? On the one hand, we can just overload equality and agree that

```julia
x = rand(Normal(...))
y = rand(Normal(...))
```

implies the existence of two random variables.

Now, onto @phipsgabler's discussion of "write once" data structures. Naturally, we can express this by disallowing usage of an address more than once, and we get to use equality in flexible ways:

```julia
x = trace(:x, Normal(...))
x = trace(:y, Normal(...))
```

and the assignment semantics don't interact with the probabilistic ones. This is rather easy to represent in an IR -- but when we move to nested models it becomes more difficult. Now, if you enter into a sub-model call, you might address that operation just like you address the random choices.

Indeed, this is basically how Gen works right now. One problem is inlining the sub-model call into the top-level call before transformations, like what might be required for delayed sampling or marginalization. In the context of Bayesian networks, this transformation is easy -- there we don't have scope, and we don't have to worry about random variable flow. Here, let's say we want to marginalize out …

(Also here, you may take the separation between assignment and random choice even further than Gen currently does, and require that the symbol addresses differ from any of the existing slot names (e.g. LHS vars).)

Here, I'm going to assume that the random variable flow piggybacks onto assignment. Now, to express a transform like … I'm being long-winded on purpose -- my point being that there are really fundamental representation questions, before we even get to implementation, about how to represent models and what things we will support. The imperative model of Julia IR and the associated compilation pipeline is good for a certain type of thing. But I think there's the potential to identify a simple "core IR" which supports what @phipsgabler is trying to get at.
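For reference, this is how the nested addressing currently looks in Gen (a small sketch; the open question is what an IR-level representation of this address hierarchy should be):

```julia
using Gen

@gen function inner()
    z ~ normal(0.0, 1.0)
    return z
end

@gen function outer()
    # The sub-model call itself gets an address, just like a random choice,
    # so its choices live under a hierarchical namespace:
    a ~ inner()           # inner's choices appear under :a => ...
    x ~ normal(a, 1.0)
    return x
end

tr = simulate(outer, ())
get_choices(tr)[:a => :z]   # the inner choice, addressed hierarchically
```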
-
Here I'm trying to think about the following questions:

- What should an `AbstractModel` be?

The complication is that all of our PPLs, and others, implicitly have a chain of the form Data -> Distribution -> (Log-)Probability, in which we use different terms; due to the different meanings of function arguments, some differentiate between partially applied forms, while some don't. And "Model" never denotes the same thing.

Let me try to summarize some existing approaches first. Correct me if I'm wrong.
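In its plainest form, that chain is just what Distributions.jl already gives us, before any PPL gets involved:

```julia
using Distributions

theta = 2.0              # parameters
d = Normal(0, theta)     # Distribution
x = rand(d)              # Data
lp = logpdf(d, x)        # (Log-)Probability
```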
### DynamicPPL.jl/Turing

In Turing language, an `@model foo(...)` block defines a "model function" that can be applied to parameters and returns a `Model` object containing the "evaluation function", which can be called on a `VarInfo` and a `Context`.

The evaluation function can be called with specific `Context`s, allowing the same `Model` to be reused to calculate several quantities, whose return value is written into the `VarInfo` argument. We have `loglikelihood`, a method taken from Distributions.jl, and `logprior` and `logjoint`, newly defined in DynamicPPL.jl.

Every LHS of a tilde statement that is an argument of the model function is treated as an observation, except when it is (or contains) `missing`. Other than that, arguments can be arbitrary (except for the special `::Type{V}` syntax) and are just closed over in the evaluator function.

DynamicPPL.jl's `Model` also just reuses the `AbstractModel` from AbstractMCMC.jl as a parent type. For a start, I think it's OK to reexport that type from AbstractPPL.jl and depend on AbstractMCMC.jl, as DynamicPPL.jl does. In the long term, IMO, the abstract type should go here, and the dependency reversed -- a model can, after all, be used for more general things than MCMC.

Models are inherently dynamic; thus, it does not make sense to query the variables or structure of a model object -- only the variables are available, and only through evaluation on a `VarInfo`.
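For example (a sketch from memory; the exact signatures of `logprior`/`loglikelihood`/`logjoint`, and in particular whether they accept a plain `NamedTuple`, may differ between DynamicPPL versions):

```julia
using DynamicPPL, Distributions

@model function foo(Y, theta)
    X ~ Normal(0, theta)
    Y ~ Normal(X)
end

m = foo(0.5, 2.0)        # Y = 0.5 is an observation, theta = 2.0 is closed over

θ = (; X = 1.3)          # assuming NamedTuple values are accepted here
logprior(m, θ)           # log p(X)
loglikelihood(m, θ)      # log p(Y = 0.5 | X = 1.3)
logjoint(m, θ)           # sum of the two
```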
### Soss.jl

In Soss.jl, a `foo = @model ...` block defines a `Soss.Model`, which does not inherit from any abstract type, and represents the model in symbolic form. Applying a `Model` to data results in a `JointDistribution`, which is a Distributions.jl distribution over `NamedTuple`s.

Because of this design, there is only one `logpdf` function, since you don't treat the same "model" in different ways depending on the arguments -- instead, you can take any model, transform it, and generate a density function from it, which generalizes the notions of prior and likelihood. If I understand correctly, arguments of models are not special at all, but just define a model as a parametrized family of `JointDistribution`s.

All models are static, thus structure and variables can be queried from a model.
### Gen.jl

Gen.jl models define instances of subtypes of `GenerativeFunction`, which has a very strictly defined interface (the GFI), against which the rest of the system is programmed. A generative function is thereby to be interpreted as a parametrized distribution over traces ("choice maps").

Arguments do have special meaning as parameters, e.g. when optimizing them or taking gradients. But they are never "curried out" -- the GFI functions all take an `args` argument.

"Model" is, I think, only used in a loose sense, and not as a programmatic term. There are two GFI implementations; the dynamic one operates only on run-time choice maps, and therefore behaves more like DynamicPPL.jl with `VarInfo`, while the static one defines a graph that can be inspected.
### StatsBase.jl

StatsBase.jl has a `StatisticalModel` base class, with some rather interesting functions around it, but it seems to be way too specific to regression models, and concerned mostly with coefficients and model fitting.

It defines a `loglikelihood` function, but I think nobody uses that in PPLs (or is Distributions.jl extending from StatsBase.jl?)