Fix #644 - treat missing and Vector{Missing} inputs as parameters #651

mohamed82008 · 2019-01-20T05:43:34Z

This PR fixes #644. This PR also allows mixing Real and missing in data which is a feature requested by @trappmartin in #544.

yebai · 2019-01-20T20:46:17Z

Thanks, @mohamed82008 an example to illustrate what syntax is changed in this PR will be very helpful for understanding the code.

mohamed82008 · 2019-01-21T11:07:14Z

Here are examples of what's now possible:

julia> using Turing

julia> @model gdemo(x) = begin
         s ~ InverseGamma(2,3)
         m ~ Normal(0, sqrt(s))
         x[1] ~ Normal(m, sqrt(s))
         x[2] ~ Normal(m, sqrt(s))
         return s, m
       end
gdemo (generic function with 2 methods)

julia> model = gdemo(fill(missing, 2));

julia> model.pvars
(:s, :m, :x)

and

julia> @model gdemo(x = Vector{Real}(undef, 2)) = begin
         s ~ InverseGamma(2,3)
         m ~ Normal(0, sqrt(s))
         x[1] ~ Normal(m, sqrt(s))
         x[2] ~ Normal(m, sqrt(s))
         return s, m
       end
gdemo (generic function with 2 methods)

julia> model = gdemo(missing);

julia> model.pvars
(:s, :m, :x)

and

julia> @model testmodel01(x) = begin
           x ~ Bernoulli(0.5)
           return x
       end
testmodel01 (generic function with 2 methods)

julia> model = testmodel01(missing);

julia> model.pvars
(:x,)

Note that if an input Vector{Missing} is given and a default value is defined in the @model definition, the latter will be ignored.

Another thing now possible is sampling missing data automatically. Notice that x is treated as a dvar but the missing values are replaced by sampled values.

julia> @model gdemo(x) = begin
         s ~ InverseGamma(2,3)
         m ~ Normal(0, sqrt(s))
         x[1] ~ Normal(m, sqrt(s))
         x[2] ~ Normal(m, sqrt(s))
         return s, m
       end
gdemo (generic function with 2 methods)

julia> x = Union{Real, Missing}[1.0, missing];

julia> model = gdemo(x);

julia> model.pvars
(:s, :m)

julia> sample(model, HMC(1000, 0.1, 5));
[HMC] Sampling...100% Time: 0:00:05
[HMC] Finished with
  Running time        = 5.188032614999971;
  Accept rate         = 0.997;
  #lf / sample        = 4.995;
  #evals / sample     = 6.995;
  pre-cond. metric    = [1.0].

julia> x
2-element Array{Union{Missing, Real},1}:
 1.0
 1.3189310726427184

mohamed82008 · 2019-01-22T02:05:20Z

Notice that all the cases above used to not work before, so this is technically a non-breaking change.

yebai · 2019-01-23T21:24:05Z

Notice that all the cases above used to not work before, so this is technically a non-breaking change.

Thanks, @mohamed82008. Since it involves some syntax changes, it's better to merge this during the hackathon this week.

mohamed82008 · 2019-01-26T08:07:17Z

Any comments on this one?

yebai · 2019-01-26T13:07:09Z

Any comments on this one?

Here is a summary of the discussion during yesterday's hackathon. For Arrays with pure Missing values or pure Real values, the current syntax boils down to sample from the generative process (aka the prior), or from the posterior distribution (i.e. prior x likelihood / const). This is fine.

However, for scenarios with mixed Missing and Real types, there is a possibility of the user writing a model like

@model gdemo(x, y) = begin
         s ~ InverseGamma(2,3)
         m ~ Normal(0, sqrt(s))
         x[1] ~ Normal(m, sqrt(s))
         x[2] ~ Normal(m, sqrt(s)) # L1
         y ~ Normal(x[2], sqrt(s))  # L2
       end


julia> x = Union{Real, Missing}[1.0, missing];

julia> model = gdemo(x);

There are two potential issues here. Firstly, when x[2] equals to missing, it's is a latent variable. Although we can still sample x[2] from its prior, only when x[2] is a leaf node in the corresponding graphical model, its value won't influence logpdf. In the above example, x[2] is an non-leaf node in the underlying graphical model and its value will have impact on the joint logpdf of the model through y.

Secondly, when x[2] is a Real, it is used as data on L1 and as hyper on L2. This could be a confusing syntax. This is because we don't have a clear sepration between hyperparameters and data. Perhaps an alternative could be

@model demo(data1, data2 ; hypers) = begin
end

@willtebbutt @xukai92 @cpfiffer

xukai92 · 2019-01-26T17:58:11Z

The syntax using keyword arguments for hyper-parameters looks good to me. It seems to me that as long as nothing specified as missing occurs on the RHS, models are defined and the use of missing is justified.

cpfiffer · 2019-01-26T18:05:43Z

Would hyperparameters work something like this, in that proposal?

@model gdemo(x; s=missing, m=missing) = begin
         s ~ InverseGamma(2,3)
         m ~ Normal(0, sqrt(s))
         x ~ Normal(m, sqrt(s))
end

Then, when I called this, I could either sample from the priors on s and m with gdemo(x) or pass in gdemo(x; s=2.5, m=3.25) if I knew the parameters? I think that sounds good. The syntactical divide between data on the left and parameters/hyperparameters on the right could lead to good form when writing models.

xukai92 · 2019-01-26T18:33:50Z

In the model your provide, the hyper-parameters should look like this

@model gdemo(x; a=2, b=3) = begin
         s ~ InverseGamma(a, b)
         m ~ Normal(0, sqrt(s))
         x ~ Normal(m, sqrt(s))
end

In your example, whether or not to providing s and m results in different graphical models.

Maybe it's also nice to have what you proposed. Let's see how other people think.

mohamed82008 · 2019-01-26T23:04:00Z

It seems to me that as long as nothing specified as missing occurs on the RHS, models are defined and the use of missing is justified.

@xukai92 Do you suggest to leave this at the courtesy of the user or to check for it in runtime? I don't think the latter is possible in the general case. The only other alternative is to completely switch off this feature.

@yebai If the last feature is controversial, I can remove it from this PR; it is literally just a branch in an if statement!

yebai · 2019-01-27T19:09:12Z

@xukai92 Do you suggest to leave this at the courtesy of the user or to check for it in runtime? I don't think the latter is possible in the general case. The only other alternative is to completely switch off this feature.

I think what Kai means is to perform this checking at runtime. This is probably a feature not many people need and could cause side effects if not used carefully. So I suggest:

keep this feature, but add a warning message
remove this feature, for now, we could add it back when we think of a better solution

mohamed82008 · 2019-01-27T21:28:16Z

I will remove it for now.

mohamed82008 requested review from willtebbutt, trappmartin and yebai January 20, 2019 08:45

mohamed82008 force-pushed the mt/missing_support branch from 137d79e to cafef71 Compare January 26, 2019 12:08

mohamed82008 added 3 commits February 3, 2019 11:53

fix #644

9c69bca

support mixed missing and Real data

c74e4bf

remove support for mixed missing and real data in a single vector

6f96076

mohamed82008 force-pushed the mt/missing_support branch from cafef71 to 6f96076 Compare February 3, 2019 01:28

mohamed82008 merged commit a7c52dc into master Feb 3, 2019

mohamed82008 mentioned this pull request Feb 9, 2019

Fix internals example in advanced docs #679

Merged

mohamed82008 deleted the mt/missing_support branch February 9, 2019 09:40

cpfiffer mentioned this pull request May 13, 2019

"sample from the prior" vs "sample from generative model' #786

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #644 - treat missing and Vector{Missing} inputs as parameters #651

Fix #644 - treat missing and Vector{Missing} inputs as parameters #651

mohamed82008 commented Jan 20, 2019 •

edited

Loading

yebai commented Jan 20, 2019

mohamed82008 commented Jan 21, 2019

mohamed82008 commented Jan 22, 2019

yebai commented Jan 23, 2019

mohamed82008 commented Jan 26, 2019

yebai commented Jan 26, 2019 •

edited

Loading

xukai92 commented Jan 26, 2019

cpfiffer commented Jan 26, 2019

xukai92 commented Jan 26, 2019 •

edited

Loading

mohamed82008 commented Jan 26, 2019

yebai commented Jan 27, 2019 •

edited

Loading

mohamed82008 commented Jan 27, 2019

Fix #644 - treat missing and Vector{Missing} inputs as parameters #651

Fix #644 - treat missing and Vector{Missing} inputs as parameters #651

Conversation

mohamed82008 commented Jan 20, 2019 • edited Loading

yebai commented Jan 20, 2019

mohamed82008 commented Jan 21, 2019

mohamed82008 commented Jan 22, 2019

yebai commented Jan 23, 2019

mohamed82008 commented Jan 26, 2019

yebai commented Jan 26, 2019 • edited Loading

xukai92 commented Jan 26, 2019

cpfiffer commented Jan 26, 2019

xukai92 commented Jan 26, 2019 • edited Loading

mohamed82008 commented Jan 26, 2019

yebai commented Jan 27, 2019 • edited Loading

mohamed82008 commented Jan 27, 2019

mohamed82008 commented Jan 20, 2019 •

edited

Loading

yebai commented Jan 26, 2019 •

edited

Loading

xukai92 commented Jan 26, 2019 •

edited

Loading

yebai commented Jan 27, 2019 •

edited

Loading