CHANGE: Poisson distribution return type, change from float to integer #1093

ndebuhr · 2021-01-27T00:08:53Z

Summary

As a discrete probability distribution, the intuitive return type for Poisson random variates would seemingly be an integer type covering the distribution's support. This output strategy is already employed by other discrete probability distributions in rand_distr, like the "Bernoulli-related" Binomial distribution, Geometric distribution, and Hypergeometric distribution.

Details

The Poisson distribution implementation would need to be revised, to return a u64 (if following the Binomial, Geometric, and Hypergeometric precedent), instead of the current Float return type - a breaking API change. In addition to the API change, the internal implementation/calculations would need some revisions.

Motivation

What is the motivation for this change?

Making this change could make the crate more intuitive and dependent programs cleaner. Fundamentally, Poisson random variates are unsigned integers, so representing them as signed floats seems unintuitive. To me, this seems like an API made to match an internal implementation, rather than an API to match real-world intuition.

Alternatives

The Poisson random variates can be either f64 or f32, but some basic type casting logic is nevertheless fairly straightforward. Any of the possible type casts require some thought around truncation, etc., but I don't see any major issues with the cast strategy.

Notes

Through some digging, I see that a u64 return type was the initial implementation #96, yet this was later modified c03e2c8. Are we calling this a "done deal" and settling on the float return type? If there's shared interest in revisiting this, I could take a stab at the PR.

The text was updated successfully, but these errors were encountered:

dhardy · 2021-01-27T08:42:55Z

See also #958.

vks · 2021-01-27T14:33:40Z

@dhardy I think most of the discussion is in #987.

As you noted, the reason we return an f64 is that this is the data type used internally. Forcing the output to be u64 would therefore be mostly cosmetic with the following disadvantages:

The returned number might not be representable as u64, introducing a panic or truncation we are currently avoiding.
The caller might want to have an f64 anyway, and would be forced to introduce a pointless conversion.

While I agree that we are leaking an implementation detail here, I'm not convinced the more intuitive API outweighs those disadvantages. (Note that in a way it is unavoidable to leak implementation details via the return type: There are many different ways to represent a number, and we either have to pick or make the code generic.)

ndebuhr · 2021-01-27T22:09:46Z

Thanks for the replies! We are already making the code somewhat generic through the Poisson's Float use from num_traits. I wonder if we can refine/expand the generics implementation to get the best of both worlds. The Uniform distribution supports an extraordinarily wide variety of return types (and associated constructor argument types) - spanning both floats and integers. Is a similar strategy for Poisson an interesting possibility?

dhardy · 2021-01-28T09:06:49Z

In #987 the argument against a u64 impl is that we shouldn't have potentially panicking code in the lib — but it's a weak argument since it is difficult to do better.

We could instead have an impl returning Result<u64, f64> — but I believe type inference will not work well (assuming we also keep the plain f64 impl).

So maybe the best thing is to add a u64 impl back (alongside the f64 impl) which panics when out of range. Considering we already have multiple impls (for F: Float) it shouldn't impact type inference. @vks?

newpavlov · 2021-01-28T14:52:20Z

I also strongly believe that we should not have a spontaneously panicking impl. At the very least we should use something like let uint_res: u64 = float_res.try_into().unwrap_or(u64::MAX);. But IIRC TryInto<u64> is not implemented for f64, so we will have to write the conversion ourselves. Alternatively we could retry if a generated number falls out of u64 range.

ndebuhr · 2021-01-29T01:01:38Z

That's a really interesting idea - rerunning generation, in the case of a number falling outside the u64 range. I would bet some of the other distributions are using acceptance-rejection algorithms for random variate generation - that wouldn't be exactly precedent, but pretty close. The Poisson generation efficiency for astronomically high values of lambda would be very low in this scenario - a situation that seems very unlikely to occur deliberately, but it could by accident. Do we also have to worry about super high lambda values leading to "silent failures" or "hangs" - where a massive number of variates are required for each u64 compliant variate? Some folks might think this sort of silent failure is worse than a spontaneously panicking impl.

newpavlov · 2021-01-29T01:16:56Z

We could limit lambda to a certain "reasonable" range, e.g. to values smaller than usize::MAX or we can round it up to 10^20, thus achieving guarantee that in the worst case at least half of samples will fall into the desired range (for 10^20 this probability will be smaller, but probably reasonable enough). The distribution constructor can panic on values outside this range, so by documenting this condition the panic will not be spontaneous anymore.

dhardy · 2021-01-29T09:52:35Z

@newpavlov this is now the behaviour of as since 1.45.0

Out of range float to int conversions using as has been defined as a saturating conversion. This was previously undefined behaviour, but you can use the {f64, f32}::to_int_unchecked methods to continue using the current behaviour, which may be desirable in rare performance sensitive situations.

Alternatively we could retry if a generated number falls out of u64 range.

Why? That affects the distribution more. And if you're really generating u64 values that large frequently enough to observe too many u64::MAX values, then you should already know that you're pushing the limits of your numeric type (and this should be pretty rare given how large 2^64 - 1 is). So IMO this is a bad idea.

vks · 2021-02-04T16:10:52Z

We still need to make a decision. Personally, I would go with suggesting to use rng.sample(Poisson) as u64 (assuming Rust >= 1.45, so this is not undefined behavior), instead of implementing the distribution for u64 and possibly other integer types. This makes the caller responsible for how the lossy conversion should be done.

ndebuhr · 2021-02-09T12:59:04Z

Is there enough discussion and consensus that I should close the issue? Is there anything valuable that I could contribute or help with here? It seems like there is no obviously best solution, and leaving the Poisson as float seems reasonable to me.

vks · 2021-02-09T13:27:23Z

I think we should document our rationale for the return types somewhere. Maybe in the Rand book?

ndebuhr · 2021-02-18T13:21:48Z

Thanks team. I've submitted a PR for rationale documentation.
rust-random/book#34

ndebuhr mentioned this issue Feb 18, 2021

Document rationale for returning float types for a discrete distribution (Poisson) rust-random/book#34

Merged

ndebuhr closed this as completed Feb 21, 2021

vks mentioned this issue Jul 28, 2021

rand_distr: Add Zipf distribution #1136

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHANGE: Poisson distribution return type, change from float to integer #1093

CHANGE: Poisson distribution return type, change from float to integer #1093

ndebuhr commented Jan 27, 2021

dhardy commented Jan 27, 2021

vks commented Jan 27, 2021 •

edited

Loading

ndebuhr commented Jan 27, 2021

dhardy commented Jan 28, 2021 •

edited

Loading

newpavlov commented Jan 28, 2021

ndebuhr commented Jan 29, 2021

newpavlov commented Jan 29, 2021 •

edited

Loading

dhardy commented Jan 29, 2021

vks commented Feb 4, 2021 •

edited

Loading

ndebuhr commented Feb 9, 2021

vks commented Feb 9, 2021 via email •

edited

Loading

ndebuhr commented Feb 18, 2021

CHANGE: Poisson distribution return type, change from float to integer #1093

CHANGE: Poisson distribution return type, change from float to integer #1093

Comments

ndebuhr commented Jan 27, 2021

Summary

Details

Motivation

Alternatives

Notes

dhardy commented Jan 27, 2021

vks commented Jan 27, 2021 • edited Loading

ndebuhr commented Jan 27, 2021

dhardy commented Jan 28, 2021 • edited Loading

newpavlov commented Jan 28, 2021

ndebuhr commented Jan 29, 2021

newpavlov commented Jan 29, 2021 • edited Loading

dhardy commented Jan 29, 2021

vks commented Feb 4, 2021 • edited Loading

ndebuhr commented Feb 9, 2021

vks commented Feb 9, 2021 via email • edited Loading

ndebuhr commented Feb 18, 2021

vks commented Jan 27, 2021 •

edited

Loading

dhardy commented Jan 28, 2021 •

edited

Loading

newpavlov commented Jan 29, 2021 •

edited

Loading

vks commented Feb 4, 2021 •

edited

Loading

vks commented Feb 9, 2021 via email •

edited

Loading