
UPDATED: Multinomial distribution rand does not support Float32 probability vectors #1738

Merged
11 commits merged into JuliaStats:master on Jun 30, 2023

Conversation

nomadbl
Contributor

@nomadbl nomadbl commented Jun 18, 2023

See issue #1032
This is virtually a copy of #1033
I forked the code from @darsnack's repository, rebased it onto master, and ran the tests.

Fixes #1032. Fixes #1733. Closes #1033.

@codecov-commenter

codecov-commenter commented Jun 18, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.12 ⚠️

Comparison is base (6251044) 85.91% compared to head (1529164) 85.80%.

❗ Current head 1529164 differs from pull request most recent head 23d69d4. Consider uploading reports for the commit 23d69d4 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1738      +/-   ##
==========================================
- Coverage   85.91%   85.80%   -0.12%     
==========================================
  Files         142      142              
  Lines        8568     8579      +11     
==========================================
  Hits         7361     7361              
- Misses       1207     1218      +11     
Impacted Files Coverage Δ
src/samplers/multinomial.jl 87.50% <100.00%> (+0.32%) ⬆️

... and 11 files with indirect coverage changes


Member

@devmotion devmotion left a comment


Thank you for the PR! I think it looks quite good but could be improved a bit. I made some comments below 🙂

@@ -1,4 +1,4 @@
function multinom_rand!(rng::AbstractRNG, n::Int, p::AbstractVector{Float64},
function multinom_rand!(rng::AbstractRNG, n::Int, p::AbstractVector{<:Real},
Member


AFAICT one has to update the body of the function as well to avoid type stability issues.

Member


More concretely, it must not contain any Float64 literals.
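To illustrate the point about literals (a hypothetical sketch, not code from this PR): a hard-coded `1.0` is a `Float64`, so any arithmetic with it promotes `Float32` inputs to `Float64`, whereas `one(eltype(p))` keeps the computation in the input's precision.

```julia
# Hypothetical illustration of the Float64-literal issue; not the PR's actual code.
function remaining_prob_literal(p::AbstractVector{<:Real})
    rp = 1.0             # Float64 literal: rp is Float64 regardless of eltype(p)
    for pi in p
        rp -= pi         # Float64 - Float32 promotes to Float64
    end
    return rp
end

function remaining_prob_stable(p::AbstractVector{<:Real})
    rp = one(eltype(p))  # literal matched to the element type of p
    for pi in p
        rp -= pi         # stays Float32 for Float32 input
    end
    return rp
end

typeof(remaining_prob_literal(Float32[0.3, 0.2]))  # Float64
typeof(remaining_prob_stable(Float32[0.3, 0.2]))   # Float32
```

The second version also stays type stable inside the loop, which is what `@inferred` checks for.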

Contributor Author


Isn't it true that the output type of the function can only be x::AbstractVector{eltype(x)}?
I'm adding the type stability tests and it looks OK, but I wanted to make sure since I'm pretty new to Julia.

Member


There are internal type instabilities which can cause slowdowns.

Member


For instance, rp might change its type inside of the loop.

Contributor Author


I took care of this aspect.
While doing so I noticed that Binomial is restricted to using Float64, so it may be slower or more accurate than necessary.
This would also affect Multinomial.
I don't need this myself, but it could be a good addition in the future.

@@ -1,6 +1,6 @@
# Tests for Multinomial

using Distributions, Random, StaticArrays
using Distributions, Random, StaticArrays, ForwardDiff
Member


Not needed for the changes below it seems:

Suggested change
using Distributions, Random, StaticArrays, ForwardDiff
using Distributions, Random, StaticArrays

Contributor Author


I'll happily remove it, as I don't personally think it really belongs here.
But at least for completeness' sake, here is the source of this addition:
#1033 (comment)

Member


It seems it was added because someone wanted dual numbers to be tested as well - but it was later figured out that, for other reasons, they don't work even with the changes in the PR, and the tests were removed again.

So it seems fine to remove this import.

Comment on lines 216 to 217
@test (rand(Multinomial(10, p)); true)
@test (rand(Multinomial(10, convert.(Float32, p))); true)
Member


Can we use better tests that test correctness instead of only "not erroring"? For instance by looping over Float64 and Float32 probabilities in the tests above?

Contributor Author


I'm not sure I follow.
What kind of correctness do you mean?
As for type stability, I'm looking at adding some @inferred tests here for that.

Member


That the samples follow the desired distribution - basically, the same tests as for Float64.

Member


I'd prefer running the existing tests with Float32 as well over a few additional tests.

Contributor Author


took care of this ✔️
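A minimal sketch of what such a type-looped correctness test might look like (a hypothetical test body, not the PR's actual test file): the same assertions run for each probability eltype, with `@inferred` checking type stability along the way.

```julia
# Hypothetical sketch of looping the existing tests over element types.
using Distributions, Test

@testset "Multinomial sampling for $T" for T in (Float64, Float32)
    p = T[0.2, 0.5, 0.3]
    d = Multinomial(100, p)
    x = @inferred(rand(d))   # errors if sampling is not type stable
    @test sum(x) == 100      # counts always sum to the number of trials
    @test all(>=(0), x)      # counts are nonnegative
    @test length(x) == 3
end
```

Running the existing distributional checks inside such a loop covers both precisions without duplicating test code.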

src/samplers/multinomial.jl (two further review comments; outdated, resolved)
i = 0
km1 = k - 1

while i < km1 && n > 0
i += 1
@inbounds pi = p[i]
if pi < rp
xi = rand(rng, Binomial(n, pi / rp))
xi = Tx(rand(rng, Binomial(n, Float64(pi / rp))))
Member


This is not needed, updating x[i] in the line below will convert to the correct type anyway; additionally, the constructor of Binomial (should) take care of promotions/conversions (and we just continue with using the type of p/rp):

Suggested change
xi = Tx(rand(rng, Binomial(n, Float64(pi / rp))))
xi = rand(rng, Binomial(n, pi / rp))
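For context, the sampler in question draws each count from a conditional binomial: with n trials remaining and remaining probability mass rp, the i-th count is Binomial(n, p[i]/rp). A simplified standalone sketch of that decomposition (my own illustration, not the library's optimized implementation; no argument checking):

```julia
# Hypothetical sketch of multinomial sampling via conditional binomials.
using Distributions, Random

function multinomial_via_binomials(rng::AbstractRNG, n::Int, p::AbstractVector{<:Real})
    k = length(p)
    x = zeros(Int, k)
    rp = sum(p)                    # remaining probability mass, in eltype(p)
    for i in 1:(k - 1)
        n > 0 || break
        # Clamp guards against rounding pushing p[i]/rp slightly above 1.
        xi = rand(rng, Binomial(n, min(p[i] / rp, one(rp))))
        x[i] = xi                  # assignment converts xi to eltype(x) if needed
        n -= xi
        rp -= p[i]
    end
    x[k] = n                       # last component receives the leftover trials
    return x
end

x = multinomial_via_binomials(Xoshiro(0), 10, Float32[0.3, 0.3, 0.4])
sum(x)  # 10, by construction
```

This is why only the type of p/rp matters inside the loop: the counts themselves are integers regardless of the probability precision.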

src/samplers/multinomial.jl (three further review comments; outdated, resolved)
@nomadbl
Contributor Author

nomadbl commented Jun 21, 2023

@devmotion
Thanks for the input.
Applying your suggestions and running the tests yields failures due to this:

Type stability: Error During Test at /home/lior/Distributions.jl/test/multivariate/multinomial.jl:216
  Got exception outside of a @test
  MethodError: no method matching Distributions.BinomialGeomSampler(::Int64, ::Float32)
  
  Closest candidates are:
    Distributions.BinomialGeomSampler(::Any, ::Any, ::Any)
     @ Distributions ~/Distributions.jl/src/samplers/binomial.jl:29
    Distributions.BinomialGeomSampler(::Int64, ::Float64)
     @ Distributions ~/Distributions.jl/src/samplers/binomial.jl:36

Or the equivalent message using Float16.

That was the reason I made the type conversion in Binomial(n, Float64(pi / rp)).

As to preferring implicit over explicit type conversions, I was thinking the explicit conversion would improve performance (saving the type check at run time), but I can also see the point of not hard-coding types when possible. The only objection I can think of is that on 32-bit systems the user would not use Float64 to begin with, so that's not a real issue, right?

nomadbl and others added 2 commits June 21, 2023 15:49
Type conversions implicitly at runtime covers more systems better

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@nomadbl
Contributor Author

nomadbl commented Jun 26, 2023

@devmotion
Putting out a reminder here. Everything should be good to go as far as I can tell.

@devmotion
Member

I'll have another look at the PR tomorrow 🙂

@devmotion
Member

I tried to simplify the tests a bit (e.g., replaced hard-coded Float64 and Float32 with loops over types and included @inferred in the existing tests instead of adding new tests). If tests pass (apart from Tracker+ReverseDiff), the PR looks good to me 🙂 Thanks a lot!

Member

@devmotion devmotion left a comment


Looks good to me, thank you!

@devmotion devmotion merged commit 55d4d6e into JuliaStats:master Jun 30, 2023
6 of 8 checks passed
@mrazomej

I don't know if I'm wrong about this, but shouldn't the multinom_rand! return a Vector{Int32} if the probability vector is Vector{Float32}?

For example, if I do

using Distributions

rand(Normal(0f0, 1f0))

I get a Float32. But if I do

rand(Multinomial(1, Float32[0.3, 0.3, 0.2, 0.15, 0.05]))

I get a Vector{Int64}. And if I try

rand(Multinomial(Int32(1), Float32[0.3, 0.3, 0.2, 0.15, 0.05]))

I get an error

ERROR: MethodError: no method matching Multinomial{Float32, Vector{Float32}}(::Int32, ::Vector{Float32})

Closest candidates are:
  Multinomial{T, TV}(::Int64, ::TV) where {T<:Real, TV<:AbstractVector{T}}
   @ Distributions ~/.julia/packages/Distributions/GrN7f/src/multivariate/multinomial.jl:25

Stacktrace:
 [1] Multinomial(n::Int32, p::Vector{Float32}; check_args::Bool)
   @ Distributions ~/.julia/packages/Distributions/GrN7f/src/multivariate/multinomial.jl:34
 [2] Multinomial(n::Int32, p::Vector{Float32})
   @ Distributions ~/.julia/packages/Distributions/GrN7f/src/multivariate/multinomial.jl:28
 [3] top-level scope
   @ REPL[49]:1

@devmotion
Member

No, that would seem very surprising and unintended to me. Typically, Int32 mainly shows up on 32-bit architectures, where Int = Int32, but it does not correspond to and is not related to Float32 (e.g., even on 32-bit machines the most common floating-point number type is Float64).

This also explains the error you see: the constructor is only defined for Int, i.e., Int64 on 64-bit architectures and Int32 on 32-bit ones.
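A quick REPL check restating this behavior (types shown are for a 64-bit machine):

```julia
using Distributions

# Counts from a Multinomial are integers regardless of the probability eltype:
x = rand(Multinomial(10, Float32[0.5, 0.5]))
eltype(x)                        # Int64 (i.e., Int on 64-bit architectures)

# Continuous distributions, in contrast, follow the parameter precision:
typeof(rand(Normal(0f0, 1f0)))   # Float32
```

So Float32 probabilities control the precision of intermediate floating-point computations, while the sampled counts stay Int.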

@mrazomej

I see; thank you very much for the clarification. I need to understand the differences better. Naively, I thought that working with 32-bit numbers would speed up my computations when doing something such as with Turing.jl or Flux.jl.

@devmotion
Member

Yes, you'll probably get a speedup from using Float32 (as long as there are no hardcoded Float64 literals in the called functions or other problems that cause automatic promotion to Float64), and you also want (have?) to use it on GPUs. But usually you don't want to use Int32 on 64-bit architectures (if it is supported at all, since many functions are defined only for Int, which is Int64 on 64-bit).
