Wrong prediction results on multivariate params #1352

Closed
ogoid opened this issue Jul 9, 2020 · 12 comments · Fixed by #1357 or #1567

Comments

@ogoid

ogoid commented Jul 9, 2020

The predict function outputs wrong results depending on how multivariate parameters are constructed. Below is a simple linear regression problem that always converges fine, but whose predict results depend on how the coef parameter is constructed.

using Turing, Plots, StatsPlots
using Turing.Inference: predict

@model function simple_linear(x, y)
    intercept ~ Normal(0,1)

    ## this corrupts `predict` output
    coef ~ MvNormal(2, 1)

    ## this alternative also
    # coef ~ filldist(Normal(0,1), 2)

    ## but this version works fine
    # coef = Vector(undef, 2)
    # for i in axes(coef, 1)
    #     coef[i] ~ Normal(0,1)
    # end

    ## this works too
    # coef1 ~ Normal(0,1)
    # coef2 ~ Normal(0,1)
    # coef = [coef1, coef2]

    coef = reshape(coef, 1, size(x,1))

    mu = intercept .+ coef * x |> vec

    error ~ truncated(Normal(0,1), 0, Inf)

    y ~ MvNormal(mu, error)
end


# simple linear transformation
x = randn(2, 100)
y = [1 + 2 * a + 3 * b for (a,b) in eachcol(x)]

chain = sample(simple_linear(x, y), NUTS(), 1000)

# model converges fine
plot(chain) |> display
@show chain

p = predict(simple_linear(x, missing), chain)

# prediction correctness depends on how multivariate params were constructed
@show y[1]
@show p["y[1]"].value.data |> mean # should be close to y[1] above
@show p["y[1]"].value.data |> std # sould be close to 0.0

I'm trying this on Julia 1.4.1 with Turing 0.13.0.

@xukai92
Member

xukai92 commented Jul 15, 2020

I can confirm that I managed to reproduce the reported issue. Does anyone know what this might be related to? @TuringLang/turing

@torfjelde
Member

It seems like the following lines are causing the issue:

for vn in md[v].vns
    vn_symbol = Symbol(vn)
    if vn_symbol ∈ c.name_map.parameters

If a variable is a vector, then vn will just be the symbol for the vector rather than the symbols corresponding to the indices of the vector. And so the check vn_symbol ∈ c.name_map.parameters will return false, since vn_symbol will, in this particular case, be :coef while c.name_map.parameters contains "coef[1]" and "coef[2]". It seems like this is something that was introduced by some upstream changes, as this worked just fine when I originally implemented this functionality.

@devmotion @cpfiffer Do any of you have an idea of the "appropriate" functionality to use to ensure that we only set the values that are present?

The following snippet demonstrates the issue:

julia> x = randn(2, 100);

julia> y = [1 + 2 * a + 3 * b for (a,b) in eachcol(x)];

julia> m = simple_linear(x, y);

julia> chain = sample(m, NUTS(), 1000);

julia> var_info = Turing.VarInfo(m);

julia> c = chain[1];

julia> v = :coef;

julia> md = var_info.metadata;

julia> vn = first(md[v].vns)
coef

julia> c.name_map.parameters
4-element Array{String,1}:
 "coef[1]"
 "coef[2]"
 "error"
 "intercept"

@cpfiffer
Member

I think there needs to be a String(vn) function in DynamicPPL that adds indexing to the VarName. Calling Symbol(vn) maps to this:

https://github.com/TuringLang/DynamicPPL.jl/blob/275ccc8791be7d5dc4eb74b6e0c19de247f56d74/src/varname.jl#L72

I can't actually find where we've overloaded string(vn) to append the indexing -- we used to have that functionality, but now I cannot seem to find where it went.

It needs to be added back in so Symbol(vn) = Symbol(String(vn, all_parts = true)) = Symbol("coef[1]").
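A rough sketch of what that could look like (hypothetical: varname_string and the getindexing accessor are assumptions about DynamicPPL's internals at the time, not its actual API):

# Hypothetical sketch only -- not the actual DynamicPPL API.
function varname_string(vn::DynamicPPL.VarName; all_parts::Bool = false)
    s = string(DynamicPPL.getsym(vn))
    all_parts || return s
    for idx in DynamicPPL.getindexing(vn)   # assumed accessor for the index tuples
        s *= "[" * join(map(string, idx), ", ") * "]"
    end
    return s
end

Base.Symbol(vn::DynamicPPL.VarName) = Symbol(varname_string(vn; all_parts = true))
# => Symbol(@varname(coef[1])) == Symbol("coef[1]")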

@torfjelde
Member

Yeah, exactly. I also can't seem to find the previous overload 😕

@torfjelde
Member

But it seems like the vns field is redundant now, given that it now seems to just be VarName(v, ()), where v is the symbol corresponding to the random variable in the model, e.g. coef. Pretty sure vns used to be ["coef[1]", "coef[2]"], no?

@torfjelde
Member

It's now fixed on tor/issue-1352 (I'll make a PR asap):

julia> using Turing
[ Info: Precompiling Turing [fce5fe82-541a-59a6-adf8-730c64b5f9a0]

julia> using Turing.Inference: predict

julia> @model function simple_linear(x, y)
           intercept ~ Normal(0,1)

           ## now works
           coef ~ MvNormal(2, 1)

           ## now works
           # coef ~ filldist(Normal(0,1), 2)

           ## but this version works fine
           # coef = Vector(undef, 2)
           # for i in axes(coef, 1)
           #     coef[i] ~ Normal(0,1)
           # end

           ## this works too
           # coef1 ~ Normal(0,1)
           # coef2 ~ Normal(0,1)
           # coef = [coef1, coef2]

           coef = reshape(coef, 1, size(x,1))

           mu = intercept .+ coef * x |> vec

           error ~ truncated(Normal(0,1), 0, Inf)

           y ~ MvNormal(mu, error)
       end;

julia> # simple linear transformation
       x = randn(2, 100);

julia> y = [1 + 2 * a + 3 * b for (a,b) in eachcol(x)];

julia> m = simple_linear(x, y);

julia> chain = sample(m, NUTS(), 1000);
┌ Info: Found initial step size
└   ϵ = 0.00625

julia> p = predict(simple_linear(x, missing), chain);

julia> # prediction correctness depends on how multivariate params were constructed
       @show y[1]
y[1] = -1.5372850579938522
-1.5372850579938522

julia> @show p["y[1]"].data |> mean # should be close to y[1] above
(p["y[1]"]).data |> mean = -1.537285141344423
-1.537285141344423

julia> @show p["y[1]"].data |> std # should be close to 0.0
(p["y[1]"]).data |> std = 1.2985395670414417e-6
1.2985395670414417e-6

@devmotion
Member

> I can't actually find where we've overloaded string(vn) to append the indexing -- we used to have that functionality, but now I cannot seem to find where it went.
>
> It needs to be added back in so Symbol(vn) = Symbol(String(vn, all_parts = true)) = Symbol("coef[1]").

An overload of string is not needed (actually, it was intentionally left undefined when @phipsgabler refactored it) since it falls back to the output of show, which is defined (as suggested, if I understand you correctly).

@phipsgabler
Member

phipsgabler commented Jul 28, 2020

Right, string falls back to show, which ought to already serialize all parts of the indexing: string(@varname(x[1][2])) == "x[1][2]" (if it doesn't, it's a bug, but I can't see why it shouldn't). getsym is what returns just the name without indexing, as a symbol.

I only left in the Symbol conversion because I knew that it was used somewhere else. It'd be much more elegant, IMHO, to use VarNames directly in all places, and have show only for printing/debugging. Possibly with a more refined discussion of, and a publicly documented interface for, subsumes.
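For reference, a quick illustration of the behaviour being described (outputs as claimed above, for the DynamicPPL of the time):

julia> using DynamicPPL

julia> vn = @varname(x[1][2]);

julia> string(vn)   # falls back to `show`, which serializes all indexing parts
"x[1][2]"

julia> DynamicPPL.getsym(vn)   # just the name, without the indexing
:x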

yebai pushed a commit that referenced this issue Jul 30, 2020
* transitions_from_chain now compatible with upstream updates

* transitions_from_chain compatible with MCMCChains v4

* removed dot from copy

* added test for predict with model containing multivariate variable
@mgmverburg

mgmverburg commented Mar 21, 2021

> It's now fixed on tor/issue-1352 (I'll make a PR asap): […]

Hi, so I was running into some issues with predict exactly as described here, and then found this thread, which is closely related.

It seems like this approach

coef = Vector(undef, 2)
for i in axes(coef, 1)
    coef[i] ~ Normal(0,1)
end

does not actually work (anymore). The first post implied it used to work (presumably prior to this change?), which is odd.

I can confirm the other approaches do actually work. However, this particular approach (the one that doesn't work) is certainly the most convenient in my case, because I actually have multilevel priors, which I pass in through a dict, and I may have different initializations for certain priors; hence the only easy way to initialize them is through a for-loop (see the sketch below).
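For concreteness, here is a rough sketch of the dict-of-priors pattern meant here (hypothetical: the model name, the priors dict, and its keys are illustrative, not code from my actual model):

using Turing

@model function multilevel_linear(x, y, priors)
    intercept ~ Normal(0, 1)
    # two predictors assumed, matching the example in this thread
    coef = Vector(undef, 2)
    for i in axes(coef, 1)
        coef[i] ~ priors[i]   # per-coefficient prior looked up from the dict
    end
    # avoid `reshape` on the `Vector{Any}` (see the side note below)
    mu = intercept .+ coef[1] .* x[1, :] .+ coef[2] .* x[2, :]
    error ~ truncated(Normal(0, 1), 0, Inf)
    y ~ MvNormal(mu, error)
end

priors = Dict(1 => Normal(0, 1), 2 => Normal(0, 2))
chain = sample(multilevel_linear(x, y, priors), NUTS(), 1000)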

As a side note: to try the model above with the for-loop prior initialization method, I had to make some additional changes, because using the model exactly as given above, but with the third method uncommented, gave me errors:
ERROR: LoadError: ArgumentError: type does not have a definite number of fields

So I fixed that error for now by simply not using the coef reshape, and doing:

mu = intercept .+ coef[1] * x[1,:] .+ coef[2]*x[2,:] |> vec

Does anyone have any idea why this method of constructing priors no longer works (i.e. no longer gives correct results) with the predict method?

@torfjelde
Member

I can confirm that the following now has issues:

coef = Vector(undef, 2)
for i in axes(coef, 1)
    coef[i] ~ Normal(0,1)
end

Thank you @mgmverburg for bringing attention to this!

The issue comes down to

for vn in md[v].vns
    vn_sym = Symbol(vn)
    # Cannot use `vn_sym` to index in the chain
    # so we have to extract the corresponding "linear"
    # indices and use those.
    # `ks` is empty if `vn_sym` not in `c`.
    ks = MCMCChains.namesingroup(c, vn_sym)

because:

  1. If you use the above implementation, Symbol.(md[:coef].vns) is [Symbol("coef[1]"), Symbol("coef[2]")], and MCMCChains.namesingroup(c, Symbol("coef[1]")) is going to be empty.
  2. If you instead use coef ~ MvNormal(2, 1), Symbol.(md[:coef].vns) is going to be [:coef], and then you get the correct call MCMCChains.namesingroup(c, :coef).
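To illustrate the two cases with a toy chain (MCMCChains.namesingroup is the real function; the outputs shown follow the description above):

julia> using MCMCChains

julia> c = Chains(randn(100, 4, 1), ["coef[1]", "coef[2]", "error", "intercept"]);

julia> MCMCChains.namesingroup(c, :coef)   # case 2: the base symbol finds the whole group
2-element Array{Symbol,1}:
 Symbol("coef[1]")
 Symbol("coef[2]")

julia> MCMCChains.namesingroup(c, Symbol("coef[1]"))   # case 1: already-indexed name, so empty
Symbol[]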

I'm not sure how this got through, though. Either I completely forgot to check this implementation and the comment above saying ## but this version works fine was maybe referring to how it worked fine in the past, or something changed somewhere upstream that broke it. But if I had to wager, I actually now think I just brainfarted back then. In any case, I definitely messed up by not adding all the above cases to the test-suite. And looking now, I see that I even overwrote the previous model used for testing! (Though that model actually worked, so no harm done, but it was still not intentional.)

I've got a meeting for the next hour, but I'll get this sorted ASAP afterwards 👍

@torfjelde torfjelde reopened this Mar 23, 2021
@torfjelde
Member

torfjelde commented Mar 23, 2021

EDIT: The below "hotfix" should not be used anymore. This has been fixed in DynamicPPL@0.10.9.

Here's a "hotfix" for the issue:

using Turing
import Random

function Turing.Inference.transitions_from_chain(
    rng::Random.AbstractRNG,
    model::Turing.Model,
    chain::MCMCChains.Chains;
    sampler = DynamicPPL.SampleFromPrior()
)
    vi = Turing.VarInfo(model)

    chain_idx = 1
    transitions = map(1:length(chain)) do sample_idx
        # NEW! Uses the "recent" improvement to `setval!` to do the job, plus the change in `_setval_kernel!` below.
        DynamicPPL.setval!(vi, chain, sample_idx, chain_idx)
        model(rng, vi, sampler)

        # Convert `VarInfo` into `NamedTuple` and save
        theta = DynamicPPL.tonamedtuple(vi)
        lp = Turing.getlogp(vi)
        Turing.Inference.Transition(theta, lp)
    end

    return transitions
end

function DynamicPPL._setval_kernel!(vi::DynamicPPL.AbstractVarInfo, vn::DynamicPPL.VarName, values, keys)
    string_vn = string(vn)
    string_vn_indexing = string_vn * "["
    indices = findall(keys) do x
        string_x = string(x)
        return string_x == string_vn || startswith(string_x, string_vn_indexing)
    end
    if !isempty(indices)
        sorted_indices = sort!(indices; by=i -> string(keys[i]), lt=DynamicPPL.NaturalSort.natural)
        val = mapreduce(vcat, sorted_indices) do i
            values[i]
        end
        DynamicPPL.setval!(vi, val, vn)
        DynamicPPL.settrans!(vi, false, vn)
    else
        # NEW! If `vn` is not present in `keys`, i.e. no value was given, we assume it should be resampled.
        # Alternatively, we could make whether to resample (or just warn) a keyword argument.
        DynamicPPL.set_flag!(vi, vn, "del")
    end
end

This requires a PR to each of Turing.jl and DynamicPPL.jl, but I'll try to get those up today.
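With the two method definitions above pasted into a session, the original example should then predict correctly again; a usage sketch:

p = predict(simple_linear(x, missing), chain)
mean(p["y[1]"].data)   # should be close to y[1]
std(p["y[1]"].data)    # should be close to 0.0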

bors bot pushed a commit to TuringLang/DynamicPPL.jl that referenced this issue Apr 3, 2021
Currently if one calls `DynamicPPL._setval!(vi, vi.metadata, values, keys)` , then only those values present in `keys` will be set, as expected, but the variables which are _not_ present in `keys` will simply be left as-is. This means that we get the following behavior:
``` julia
julia> using Turing

julia> @model function demo(x)
           m ~ Normal(0, 1)
           for i in eachindex(x)
               x[i] ~ Normal(m, 1)
           end
       end
demo (generic function with 1 method)

julia> m_missing = demo(fill(missing, 2));

julia> var_info_missing = DynamicPPL.VarInfo(m_missing);

julia> var_info_missing.metadata.m.vals
1-element Array{Float64,1}:
 0.7251417347423874

julia> var_info_missing.metadata.x.vals
2-element Array{Float64,1}:
 1.2576791054418153
 0.764913349211408

julia> DynamicPPL.setval!(var_info_missing, (m = 0.0, ));

julia> var_info_missing.metadata.m.vals # ✓ new value
1-element Array{Float64,1}:
 0.0

julia> var_info_missing.metadata.x.vals # ✓ still the same value
2-element Array{Float64,1}:
 1.2576791054418153
 0.764913349211408

julia> m_missing(var_info_missing) # Re-run the model with new value for `m`

julia> var_info_missing.metadata.x.vals # × still the same and thus not reflecting the change in `m`!
2-element Array{Float64,1}:
 1.2576791054418153
 0.764913349211408
```

_Personally_ I expected `x` to be resampled since parts of the model have now changed and thus the sample `x` is no longer representative of a sample from the model (under the sampler used).

This PR "fixes" the above so that you get the following behavior:
``` julia
julia> var_info_missing.metadata.x.vals
2-element Array{Float64,1}:
 1.2576791054418153
 0.764913349211408

julia> DynamicPPL.setval!(var_info_missing, (m = 0.0, ));

julia> var_info_missing.metadata.x.vals
2-element Array{Float64,1}:
 1.2576791054418153
 0.764913349211408

julia> m_missing(var_info_missing)

julia> var_info_missing.metadata.x.vals
2-element Array{Float64,1}:
 -2.0493130638394947
  0.3881955730968598
```

This was discovered when debugging TuringLang/Turing.jl#1352, as I want to move `Turing.predict` over to using `DynamicPPL.setval!`, and it also has consequences for `DynamicPPL.generated_quantities`, which uses `DynamicPPL.setval!` under the hood and thus suffers from the same issue.


There's an alternative: instead of making this the default behavior, we could add `kwargs...` to `setval!` which include `resample_missing::Bool` or something. I'm also completely fine with a solution like that 👍
torfjelde added a commit that referenced this issue Apr 12, 2021
* predict now uses set_and_resample! introduced in recent DynamicPPL

* only attempt to set parameters in predict

* added some tests to cover the previous failure cases

* removed some redundant namespace specifier

* version bump

* Apply suggestions from code review

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* bumped version for DPPL in test

* changed variable name in predict as per suggestion by @devmotion

* version bump

* disable failing test

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@torfjelde
Member

Aight, so the issue has finally been resolved.
It unfortunately took quite a while because there were issues related to the upgrade to Julia 1.6, etc. But it should be good now :)
