
Fix sampling from distributions with integer-valued parameters (e.g. MvNormal and Dirichlet) #1262

Merged · 2 commits into JuliaStats:master · Jan 22, 2021

Conversation

@devmotion (Member) commented Jan 21, 2021

This PR fixes and tests sampling from Dirichlet with integer-valued parameters, which is broken in 0.24.11 and currently untested:

julia> rand(Dirichlet(3, 4))
ERROR: InexactError: Int64(2.581230906529874)
Stacktrace:
 [1] Int64 at ./float.jl:710 [inlined]
 [2] convert at ./number.jl:7 [inlined]
 [3] setindex! at ./array.jl:847 [inlined]
 [4] rand!(::Random._GLOBAL_RNG, ::Gamma{Float64}, ::Array{Int64,1}) at /home/david/.julia/packages/Distributions/KDTM8/src/univariates.jl:165
 [5] _rand!(::Random._GLOBAL_RNG, ::Dirichlet{Int64,FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}},Float64}, ::Array{Int64,1}) at /home/david/.julia/packages/Distributions/KDTM8/src/multivariate/dirichlet.jl:162
 [6] rand at /home/david/.julia/packages/Distributions/KDTM8/src/multivariates.jl:76 [inlined]
 [7] rand(::Dirichlet{Int64,FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}},Float64}) at /home/david/.julia/packages/Distributions/KDTM8/src/genericrand.jl:22
 [8] top-level scope at REPL[4]:1

IMO the main underlying problem is that rand falls back to the in-place method rand! and uses a heuristic to determine the type of the samples. This problem seems similar to the one described (and fixed) for (log)pdf in #1257. However, I did not want to make any major changes to Distributions in this PR, so it only fixes the problem for Dirichlet.
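
For illustration, the sample container's element type comes from eltype, which for this distribution is the integer type of the parameters (a sketch on 0.24.11; the concrete types follow from the stack trace above):

julia> using Distributions

julia> d = Dirichlet(3, 4);  # alpha is stored as an integer-valued Fill vector

julia> eltype(d)  # used by the rand fallback to allocate the sample container
Int64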

Edit: This PR also fixes #1004 and similar issues for other distributions by using containers of floating-point numbers for sampling from continuous distributions in the default implementation of rand. E.g., the PR fixes the following example as well:

julia> d = MvNormal([0, 0], [1, 1]);

julia> rand(d)
ERROR: MethodError: no method matching randn(::Random._GLOBAL_RNG, ::Type{Int64})
Closest candidates are:
  randn(::Random.AbstractRNG, ::Type{T}, ::Tuple{Vararg{Int64,N}} where N) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Random/src/normal.jl:201
  randn(::Random.AbstractRNG, ::Type{T}, ::Integer, ::Integer...) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Random/src/normal.jl:204
  randn(::Random.AbstractRNG) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Random/src/normal.jl:38
  ...
Stacktrace:
 [1] randn!(::Random._GLOBAL_RNG, ::Array{Int64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Random/src/normal.jl:178
 [2] _rand! at /home/david/.julia/packages/Distributions/7Ifyw/src/multivariate/mvnormal.jl:275 [inlined]
 [3] rand at /home/david/.julia/packages/Distributions/7Ifyw/src/multivariates.jl:76 [inlined]
 [4] rand(::MvNormal{Int64,PDMats.PDiagMat{Int64,Array{Int64,1}},Array{Int64,1}}) at /home/david/.julia/packages/Distributions/7Ifyw/src/genericrand.jl:22
 [5] top-level scope at REPL[5]:1

@andreasnoack (Member)

I think this one is a bit different from the logpdf case. For logpdf, I believe the result type should be possible to compute from the inputs. However, we've struggled with rand since it's not clear what should determine the type of the random variates. I think we have a couple of open issues and PRs related to that problem. I believe almost all the continuous distributions return Float64, except for Normal.

In any case, I don't think this PR is the right fix since it "lies" about the element type. The element type usually refers to the type of the parameters, not the type of the sample space. I think the simplest fix would be to promote to floats in

function Dirichlet{T}(alpha::AbstractVector{T}; check_args=true) where T
    if check_args && !all(x -> x > zero(x), alpha)
        throw(ArgumentError("Dirichlet: alpha must be a positive vector."))
    end
    alpha0 = sum(alpha)
    lmnB = sum(loggamma, alpha) - loggamma(alpha0)
    new{T,typeof(alpha),typeof(lmnB)}(alpha, alpha0, lmnB)
end
Alternatively, we should change
_rand!(rng, s, Vector{eltype(s)}(undef, length(s)))
to versions for continuous and discrete distributions that hardcode Float64 and Int, respectively. I believe the former is the less controversial solution.
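
A minimal sketch of that promotion approach (a hypothetical outer constructor, not code from this PR):

    # Hypothetical: promote integer-valued parameters to floats at construction
    # time, so eltype and the sampling buffer are floating-point from the start.
    function Dirichlet(alpha::AbstractVector{<:Integer}; check_args=true)
        return Dirichlet(float.(alpha); check_args=check_args)
    end

With such a method, Dirichlet([3, 4]) would store Float64 parameters and the existing rand fallback would allocate a Float64 container.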

@devmotion (Member, Author) commented Jan 21, 2021

Thanks for your comment! I think you touch on several different issues; I'll try to respond as briefly as possible.

However, we've struggled with rand since it's not clear what should determine the type of the random variates.

IMO the natural approach is to let users specify the type of the samples of the underlying RNG, which would be Float64 by default but could be changed to e.g. Float32 or BigFloat, analogously to rand(rng, T, size...) in Base. AFAICT every sampling procedure at some point starts by sampling from the definitions of rand, randexp, or randn in Random. The samples from the distributions are obtained by transforming these most basic samples, and therefore the type of the random variates follows automatically from the type of these basic samples and the function that transforms them, which involves the parameters of the distribution. Viewed this way, it seems the result type of rand could also be computed, similar to (log)pdf.
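
As a sketch of this view with a hypothetical helper (not Distributions API): a Normal variate is an affine transformation of one base draw, so its type follows by promotion from the base type T and the parameter types.

    using Random

    # Hypothetical helper: the caller chooses the type T of the base draw;
    # the variate type then follows from T and the parameters via promotion.
    function normal_variate(rng::AbstractRNG, ::Type{T}, μ, σ) where {T<:AbstractFloat}
        z = randn(rng, T)  # base draw of user-specified type T (Float64 by default)
        return μ + σ * z
    end

    normal_variate(Random.GLOBAL_RNG, Float32, 0.0f0, 1.0f0)  # Float32
    normal_variate(Random.GLOBAL_RNG, Float64, big(0.0), 1.0) # BigFloat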

A while ago I also mentioned this view in a discussion about MeasureTheory, in which AFAIK the support of a measure is encoded in its type. I think the support can be difficult to know in advance, and this could be avoided by specifying the type of the most basic sampling calls.

In any case, I don't think this PR is the right fix since it "lies" about the element type. The element type usually refers to the type of the parameters and not the type of the sample space.

I think it really boils down to the question of what exactly eltype specifies in Distributions. In Random, the convention is that eltype specifies the type of the samples (see, e.g., https://docs.julialang.org/en/v1/stdlib/Random/#An-optimized-sampler-with-pre-computed-data). However, if one sticks with the interpretation that in Distributions eltype should specify the type of the parameters, then clearly you are right.

If it should be the type of the parameters (which is, e.g., not enforced in the type of MvNormal but only in its outer constructors), the main problem is that both use cases are mixed in Distributions, since the implementation of rand uses eltype as a heuristic for the type of the samples.

I am aware of the many issues and discussions in Distributions regarding eltype and therefore tried to keep everything specific to Dirichlet to avoid any more general discussions 🙂

I think the simplest fix would be to promote to floats in

This would be a simple fix, but the allocations seem wasteful and are only necessary to work around the incorrect heuristic in rand.

Alternatively, we should change […] to versions for continuous and discrete distributions that hardcode Float64 and Int, respectively.

I think this will break many use cases with non-standard parameter types that are handled by the current heuristic.

Taken together, I think the most natural and most appropriate fix would be to change the heuristic for the type of the random variates in the default implementation of rand from eltype(d) to float(eltype(d)) for continuous distributions.
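
For the univariate array methods, the analogous change would look something like this (a sketch; the actual method signatures in univariates.jl may differ slightly):

    rand(rng::AbstractRNG, s::Sampleable{Univariate,Continuous}, dims::Dims) =
        rand!(rng, sampler(s), Array{float(eltype(s))}(undef, dims))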

@andreasnoack (Member)

I'm not that worried about the extra allocation. If it matters for a user, then it would be easy for them to change their code to construct the parameter vector with floating-point elements.

I think this will break many use cases with non-standard parameter types that are handled by the current heuristic.

I'd be surprised if that is really the case, since you can't really rely on such behavior for Distributions in general. I'm curious what that would look like.

There is a problem with pretending that the variates follow the eltype: you might think the computation has been carried out entirely in that element type. That would be an issue for Dirichlet, where the sampling is based on the Gamma samplers, where you have

julia> typeof(rand(Gamma(0.5f0)))
Float64

julia> typeof(rand(Gamma(0.5)))
Float64

julia> typeof(rand(Gamma(big(0.5))))
BigFloat

but all these numbers are based on the Float64 variates in

x = randn(rng)
I'm not sure we should pretend that random number generation is generic.

@devmotion (Member, Author) commented Jan 21, 2021

If alpha is the integer-valued output of another function that the user can't control, it is not possible to avoid the allocations. In some experiments, I had to construct and evaluate thousands of different Dirichlet distributions on a 1000-dimensional probability simplex in a loop. It seems surprising to convert the parameters at construction if everything but sampling works fine and the only problem is the heuristic in

rand(rng::AbstractRNG, s::Sampleable{Multivariate}) =
    _rand!(rng, s, Vector{eltype(s)}(undef, length(s)))

An example that would be broken by hardcoding Int and Float64 instead of using eltype(s) in this line would be:

julia> using Distributions, ForwardDiff

julia> s = MvNormal(ForwardDiff.Dual.(ones(10)));

julia> rand(s)
10-element Array{ForwardDiff.Dual{Nothing,Float64,0},1}:
  Dual{Nothing}(0.1666028548436855)
 Dual{Nothing}(-0.43914145214951167)
 Dual{Nothing}(-1.0691807830172526)
  Dual{Nothing}(1.8454427275951966)
  Dual{Nothing}(0.39196945140339634)
 Dual{Nothing}(-1.046448411408746)
 Dual{Nothing}(-0.7625542381972902)
  Dual{Nothing}(0.5384999518634571)
 Dual{Nothing}(-1.266646900649526)
 Dual{Nothing}(-0.37844015490701083)

julia> eltype(s)
ForwardDiff.Dual{Nothing,Float64,0}

Basically, anything where eltype(s) is not Float64 would break.

Instead I would suggest adding

 rand(rng::AbstractRNG, s::Sampleable{Multivariate,Continuous}) = 
     _rand!(rng, s, Vector{float(eltype(s))}(undef, length(s))) 

(and similarly for other rand calls). This seems like a minor change that

  • would solve the example in the OP,
  • would allow using integer-valued parameters without any conversions or allocations at construction,
  • would allow defining eltype(::Type{<:Dirichlet{T}}) where T = T, which actually reports the type of the parameters,
  • would avoid any substantial changes to, and general discussions of, the eltype setup and sampling.

There is the problem with pretending that the variates follow the eltype that you might think the computation has been fully based on the element type.

I'm sorry, but isn't this exactly what Distributions currently assumes in its implementation of rand? As I wrote before, this is clearly only a heuristic that fails, e.g., for integer-valued parameters, but it seems it could be fixed for the problem at hand by using float(eltype(s)) for continuous distributions.

but all these numbers are based on the Float64 variates in x = randn(rng). I'm not sure we should pretend that random number generation is generic.

Sorry, I am not sure what these comments refer to.

x = randn(rng)

is exactly one of these "base methods" that I referred to above. My suggestion was to allow specifying the type of exactly (and only) these base variates, i.e., to change such calls to

    x = randn(rng, T)

where T can be specified by the user. The type of the random variates of the distribution of interest would then be determined by T, the parameters of the distribution, and the transformations needed to turn x into a sample from the distribution. More concretely, the example above would correspond to the default case T = Float64, and other examples would be

julia> typeof(rand(Float32, Gamma(0.5f0)))
Float32

julia> typeof(rand(Float32, Gamma(0.5)))
Float64

julia> typeof(rand(Float32, Gamma(big(0.5))))
BigFloat

julia> typeof(rand(BigFloat, Gamma(0.5f0))) # requires https://github.com/JuliaLang/julia/pull/35111
BigFloat

julia> typeof(rand(BigFloat, Gamma(0.5))) # requires https://github.com/JuliaLang/julia/pull/35111
BigFloat

julia> typeof(rand(BigFloat, Gamma(big(0.5)))) # requires https://github.com/JuliaLang/julia/pull/35111
BigFloat

But it seems this discussion is a bit too general here and not needed to fix the problem in the OP.

Coming back to the PR, would it be OK to revert the eltype definition and instead add something like

 rand(rng::AbstractRNG, s::Sampleable{Multivariate,Continuous}) = 
     _rand!(rng, s, Vector{float(eltype(s))}(undef, length(s))) 

?

@devmotion (Member, Author)

BTW, I just checked: the suggested change to rand not only fixes the problem with Dirichlet described in the OP but also problems with other distributions, such as the issue described for MvNormal in #1004. Currently,

julia> d = MvNormal([0, 0], [1, 1]);

julia> rand(d)
ERROR: MethodError: no method matching randn(::Random._GLOBAL_RNG, ::Type{Int64})
[same stack trace as in the OP above]

whereas with my suggestion

julia> d = MvNormal([0, 0], [1, 1]);

julia> rand(d)
2-element Array{Float64,1}:
  1.6521247327111197
 -0.2234559539354925

@andreasnoack (Member)

With these issues, it seems that there isn't a single best solution. There are trade-offs, and most of the options have aspects that are annoying. So I'm trying to make sure that we consider some of the implications before applying a fix. A drawback of one solution could be the introduction of allocations, and the drawback of another could be that it follows a path that pretends that random number generation follows the precision of the parameters. A third solution might make it impossible to create Dual random variates, but I have to admit that I'm not easily convinced that it's of practical relevance to do so.

My suggestion was to allow to specify the type of exactly (and only) these base variates, i.e., to change such calls to

x = randn(rng, T)

Originally, I thought you were suggesting to use T = eltype(d), but I think I now understand your proposal. The default is T = Float64, and we should then start introducing such T arguments in all rand methods, right? I think I like that idea.

It wouldn't help with the issue mentioned in my previous post, where you could end up with BigFloat variates that are based on Float64 draws, which I think is quite unfortunate, but maybe it's the smallest of the potential issues. I'd like the variates to have the precision used for the underlying draws, but it seems tricky to handle. It might require something like what Adapt.jl does for arrays.

Anyway, to avoid letting the perfect be the enemy of the good here, I guess we should go with your latest proposal.

@codecov-io commented Jan 22, 2021

Codecov Report

Merging #1262 (27a9fc9) into master (0da0bcb) will decrease coverage by 0.54%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1262      +/-   ##
==========================================
- Coverage   81.78%   81.23%   -0.55%     
==========================================
  Files         117      115       -2     
  Lines        6565     6533      -32     
==========================================
- Hits         5369     5307      -62     
- Misses       1196     1226      +30     
Impacted Files Coverage Δ
src/matrixvariates.jl 76.81% <100.00%> (-14.74%) ⬇️
src/multivariate/mvnormal.jl 73.05% <100.00%> (ø)
src/multivariates.jl 70.66% <100.00%> (-10.92%) ⬇️
src/univariate/continuous/normal.jl 98.87% <100.00%> (ø)
src/univariates.jl 61.11% <100.00%> (-5.33%) ⬇️
src/mixtures/mixturemodel.jl 72.83% <0.00%> (-5.77%) ⬇️
src/common.jl 67.56% <0.00%> (-0.86%) ⬇️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@devmotion (Member, Author)

I applied the suggestion and added some tests for the uni- and multivariate cases (they revealed that rand(d, ::Dims) is currently broken). It seems that at least MatrixNormal does not define eltype and therefore always falls back to Float64.
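
Presumably this is due to a generic default along these lines (paraphrased; the actual definition may differ), so any continuous sampleable without its own eltype method reports Float64:

    Base.eltype(::Type{<:Sampleable{F,Continuous}}) where {F} = Float64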

@devmotion (Member, Author)

I'd like that the variates have the precision used for the underlying draws but it seems tricky to handle.

Yes, I assume that ideally one would internally promote the user-specified precision based on the involved parameters.
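
A sketch of that promotion idea with a hypothetical helper (not part of this PR): the type used for the base draws would be the promotion of the user-requested type with the parameter types.

    # Hypothetical: pick the precision of the base draws from the requested
    # type T and the types of the distribution's parameters.
    base_eltype(::Type{T}, params...) where {T<:AbstractFloat} =
        promote_type(T, map(typeof, params)...)

    base_eltype(Float32, 0.5f0)  # Float32
    base_eltype(Float32, 0.5)    # Float64: parameter precision dominates
    base_eltype(BigFloat, 0.5)   # BigFloat: requested precision dominates

This would match the result types listed for the Gamma examples above.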

devmotion changed the title from "Fix sampling from Dirichlet" to "Fix sampling from distributions with integer-valued parameters (e.g. MvNormal and Dirichlet)" on Jan 22, 2021
andreasnoack merged commit 863844c into JuliaStats:master on Jan 22, 2021
devmotion deleted the dw/dirichlet_fix branch on January 22, 2021 at 14:04
@devmotion (Member, Author)

Thank you for merging the PR! Is it possible to make a new release with the fix? I already bumped the version in this PR 🙁

Successfully merging this pull request may close these issues:

  • Sampling from MvNormal with integer arrays fails (#1004)