Document why we use an open interval #351

pitdicker · 2018-03-28T11:36:58Z

The documentation on the Uniform implementation for floats was very hard to find in rustdoc, so I moved it to the documentation of Uniform itself.

Feel free to shoot my wording down 😄. But I hope this brings the intention across.

dhardy

The rest looks good to me.

dhardy · 2018-03-28T14:50:00Z

src/distributions/mod.rs

+/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
+///   (or almost no) sensible code relies on an exact floating-point value to be
+///   generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
+///   `f64`). What is relied on, is on the distribution to be uniform, and have


What is relied on is having a uniform distribution and a mean of 0.5.

Why median??

I was shuffling the names... But you are right, median is not the one.

Changed b.t.w., but the line is just below your comment.

Semantically correct but poor grammar, hence my suggestion, but never mind, this'll do.

Ah, now I get your comment!
This grammar is perfectly fine in Dutch though...

vks · 2018-03-30T13:01:38Z

src/distributions/mod.rs

+///
+/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
+///   (or almost no) sensible code relies on an exact floating-point value to be
+///   generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for


1/2^23 is not really tiny; it corresponds to just 32 MiB of random data. It could be relevant to generate zero for fuzzing purposes.

Hmm, maybe better to reword this a bit. Something like "1 in ~8 million (2^23) for f32". Not sure if that counts as tiny; I had mostly f64 in mind when writing that 😄.

It could be relevant to generate zero for fuzzing purposes.

Good point. Of course there are always some valid uses for [0, 1). In this case, fuzzing, we concluded in dhardy#83 that you are better off using some specific library and trait like Arbitrary instead of the old Rand trait or our distributions.

That point was more about converting to user types.

The chance of never getting 0 from 1 sample of n=2^23 choices is (n-1)/n or approx. 0.9999998807907104.

The chance of never getting 0 from k samples of n choices is ((n-1)/n)^k; if we take k=2^23 samples that translates to a probability of 37% of never sampling 0, so you might want to up your sample size a bit:

>>> ((2**23 - 1.0) / (2**23))
0.9999998807907104
>>> ((2**23 - 1.0) / (2**23))**(2**23)
0.3678794192441178
>>> ((2**23 - 1.0) / (2**23))**(2**24)
0.13533526710338942
>>> ((2**23 - 1.0) / (2**23))**(2**25)
0.018315634521945755

This is assuming the f64 ops in Python are able to represent these computations with sufficient accuracy; I think they should be able to.
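On the accuracy question, a cross-check is possible without relying on `**`. The sketch below recomputes the same probabilities as `exp(k * log1p(-1/n))`; the function name `prob_no_zero` is made up for illustration. With n = 2^23, the base `(n - 1) / n` equals 1 - 2^-23, which an f64 represents exactly, so only the exponentiation can introduce rounding.

```python
import math

def prob_no_zero(n: int, k: int) -> float:
    """Probability that k independent uniform draws from n equally likely
    values never hit one fixed value (here: 0.0)."""
    # (1 - 1/n)**k rewritten as exp(k * log(1 - 1/n)); log1p stays
    # accurate when 1/n is very small.
    return math.exp(k * math.log1p(-1.0 / n))

n = 2**23
for k in (2**23, 2**24, 2**25):
    # Print the log1p-based value next to the direct power for comparison.
    print(k, prob_no_zero(n, k), ((n - 1.0) / n) ** k)
```

For k = n the result approaches 1/e as n grows, which is where the 37% figure comes from.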

@dhardy I am not really sure what you mean with the comment above.

The chance to get zero with one try is 2^-23. But 2^23 tries do not guarantee you get zero; there is a 37% chance that they never contain 0.0 (so only about a 63% chance that they do). A nice thing to realize 😄. Not something I should write somehow, right?

No, it was aimed at @vks (i.e. you shouldn't just take 32MiB and expect to get 0 — not unless you use a non-random counting generator or something, but then it's not fuzz testing any more, it's just complete testing, assuming you have a single f32 input parameter).

vks · 2018-03-30T13:03:26Z

A counter argument to open intervals is that it is trivial to get the open interval from a half-open one via rejection sampling.
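A minimal sketch of that rejection step, in Python rather than Rust so it runs without the rand crate. The name `sample_open01` and the `draw` parameter are hypothetical; `draw` stands for any source of floats in the half-open interval [0, 1), such as Python's `random.random`.

```python
import random
from typing import Callable

def sample_open01(draw: Callable[[], float]) -> float:
    """Turn a [0, 1) source into a (0, 1) source by rejecting exact zeros."""
    while True:
        x = draw()
        if x != 0.0:      # 0.0 is the only value of [0, 1) outside (0, 1)
            return x

rng = random.Random(1234)             # seeded only to make the demo repeatable
print(sample_open01(rng.random))      # random.random() is documented as [0.0, 1.0)
```

The loop almost never iterates more than once, since a retry happens only when the source returns exactly 0.0.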

pitdicker · 2018-03-30T13:40:45Z

A counter argument to open intervals is that it is trivial to get the open interval from a half-open one via rejection sampling.

Even faster: it is possible to convert between (0, 1) and [0, 1) by subtracting ε/2. Or to (0, 1] by adding ε/2. Maybe that is something worth mentioning in the documentation...
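The shift can be illustrated on an assumed model of the `f32` sampler: an open-interval sample is taken to be the midpoint (k + 0.5)·ε of one of 2^23 equal cells, with ε = 2^-23. That construction is an assumption made for this sketch and is not confirmed by the thread as what Uniform does; all function names are hypothetical.

```python
BITS = 23              # fraction bits of an f32 (assumed grid resolution)
EPS = 2.0 ** -BITS     # cell width; equals the f32 machine epsilon

def open01(k: int) -> float:
    """Assumed open-interval sample: midpoint of cell k, for 0 <= k < 2**BITS."""
    return (k + 0.5) * EPS

def to_closed_open(x: float) -> float:
    """Shift a (0, 1) midpoint sample down by eps/2, giving a value in [0, 1)."""
    return x - EPS / 2

def to_open_closed(x: float) -> float:
    """Shift a (0, 1) midpoint sample up by eps/2, giving a value in (0, 1]."""
    return x + EPS / 2

lo, hi = open01(0), open01(2**BITS - 1)          # smallest and largest samples
print(to_closed_open(lo), to_closed_open(hi))    # endpoints of the [0, 1) grid
print(to_open_closed(lo), to_open_closed(hi))    # endpoints of the (0, 1] grid
```

Every value here needs at most 24 significant bits, so the arithmetic is exact in both `f32` and the f64 that Python uses.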

dhardy · 2018-03-31T09:08:15Z

Maybe that is something worth mentioning in the documentation...

Yes and no. These may be useful, but if users start doing rng.gen() - ε/2 (or + ε/2) in many places, it means we can never change what the Uniform distr. does here. So there is some value in alternate distributions (OpenClosed01 and ClosedOpen01).

pitdicker · 2018-03-31T11:58:49Z

So there is some value in alternate distributions (OpenClosed01 and ClosedOpen01).

Yes, I planned to write somewhere that I am really not against adding something like ClosedOpen01. I do think an open distribution as we have now is the best choice for the default. But we were having the problem of an exploding number of choices, which is why I didn't yet like to see additions just for completeness.

pitdicker · 2018-03-31T16:53:33Z

I have replaced "tiny chance" with "very small" and added "~8 million", but am a bit done with tweaking this text to be honest...

vks · 2018-03-31T16:56:36Z

@dhardy I just calculated the number of samples needed to get an expectation value of 1 for the number of zeros. The point was that you get a high probability of generating a zero for plausible sample sizes.

…

On Sat, Mar 31, 2018, 18:23 Diggory Hardy wrote:

> In src/distributions/mod.rs:
>
> +/// # Open interval for floats
> +/// In theory it is possible to choose between an open interval `(0, 1)`, and
> +/// the half-open intervals `[0, 1)` and `(0, 1]`. All can give a distribution
> +/// with perfectly uniform intervals. Many libraries in other programming
> +/// languages default to the closed-open interval `[0, 1)`. We choose here to go
> +/// with *open*, with the arguments:
> +///
> +/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
> +///   (or almost no) sensible code relies on an exact floating-point value to be
> +///   generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
>
> No, it was aimed at @vks (i.e. you shouldn't just take 32MiB and expect to get 0, not unless you use a non-random counting generator or something, but then it's not fuzz testing any more, it's just complete testing, assuming you have a single f32 input parameter).

dhardy · 2018-03-31T17:18:48Z

@vks an expected value of 1 does not imply high probability and this is a crazy argument for making gen() output 0 IMO.
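The gap between "expected count of 1" and "high probability" can be put in numbers. The sketch below assumes independent draws over n = 2^23 equally likely `f32` values; the helper names are made up for illustration.

```python
import math

def prob_at_least_one(n: int, k: int) -> float:
    """Probability that k independent draws from n values hit one fixed value."""
    # 1 - (1 - 1/n)**k, written with expm1/log1p to limit rounding error.
    return -math.expm1(k * math.log1p(-1.0 / n))

def samples_needed(n: int, p: float) -> int:
    """Approximate smallest k with prob_at_least_one(n, k) >= p
    (up to floating-point rounding)."""
    return math.ceil(math.log1p(-p) / math.log1p(-1.0 / n))

n = 2**23
# k = n gives an expected count of exactly one zero.
print("expected count 1:", prob_at_least_one(n, n))
for p in (0.5, 0.95, 0.99):
    k = samples_needed(n, p)
    # Each f32 sample consumes 4 bytes of random data.
    print(p, k, f"{k * 4 / 2**20:.0f} MiB")
```

Under these assumptions, the sample count with an expected count of 1 is neither a guarantee nor a 99% bound, which needs several times as many draws.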

Document why we use an open interval

dhardy reviewed Mar 28, 2018

pitdicker force-pushed the float_doc branch 2 times, most recently from b635128 to d53a7d9 Compare March 29, 2018 10:14

vks reviewed Mar 30, 2018

Document open interval

22b1f80

pitdicker force-pushed the float_doc branch from d53a7d9 to 22b1f80 Compare March 31, 2018 16:52

dhardy merged commit 3ec525a into rust-random:master Apr 1, 2018

pitdicker deleted the float_doc branch April 1, 2018 17:31

pitdicker pushed a commit that referenced this pull request Apr 4, 2018

Merge pull request #351 from pitdicker/float_doc

f6d1259

Document why we use an open interval
