Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document why we use an open interval #351

Merged
merged 1 commit into from
Apr 1, 2018

Conversation

pitdicker
Copy link
Contributor

The documentation on the Uniform implementation on floats was very hard to find in rustdoc, so i moved it to the documentation of Uniform itself.

Feel free to shoot my wording down 😄. But I hope this brings the intention across.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks good to me.

/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
/// (or almost no) sensible code relies on an exact floating-point value to be
/// generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
/// `f64`). What is relied on, is on the distribution to be uniform, and have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is relied on is having a uniform distribution and a mean of 0.5.

Why median??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was shuffling the names... But you are right, median is not the one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed b.t.w., but the line is just below your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically correct but poor grammar, hence my suggestion, but never mind, this'll do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A, now I get your comment!
This grammar is perfectly fine in Dutch though...

@pitdicker pitdicker force-pushed the float_doc branch 2 times, most recently from b635128 to d53a7d9 Compare March 29, 2018 10:14
///
/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
/// (or almost no) sensible code relies on an exact floating-point value to be
/// generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/2^23 is not really tiny, it corresponds to just 32 MiB of random data. I could be relevant to generate zero for fuzzing purposes.

Copy link
Contributor Author

@pitdicker pitdicker Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe better to rewords this a bit. Something like "1 in ~8 million (2^23) for f32". Not sure if that counts as tiny, I had mostly f64 in mind when writing that 😄.

It could be relevant to generate zero for fuzzing purposes.

Good point. Of course there are always some valid uses for [0, 1). In this case, fuzzing, we concluded in dhardy#83 that you are better of using some specific library and trait like Arbitrary instead of the old Rand trait or our distributions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That point was more about converting to user types.

The chance of never getting 0 from 1 sample of n=2^23 choices is (n-1)/n or approx. 0.9999998807907104.

The chance of never getting 0 from k samples of n choices is ((n-1)/n)^k; if we take k=2^23 samples that translates to a probability of 37% of never sampling 0, so you might want to up your sample size a bit:

>>> ((2**23 - 1.0) / (2**23))
0.9999998807907104
>>> ((2**23 - 1.0) / (2**23))**(2**23)
0.3678794192441178
>>> ((2**23 - 1.0) / (2**23))**(2**24)
0.13533526710338942
>>> ((2**23 - 1.0) / (2**23))**(2**25)
0.018315634521945755

This is assuming the f64 ops in Python are able to represent these computations with sufficient accuracy; I think they should be able to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhardy I am not really sure what you mean with the comment above.

The chance to get zero with one try is 2^-23. But 2^23 tries does not guarantee you get zero, there is only a 37% chance it contains 0.0. A nice thing to realize 😄. Not something I should write somehow, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was aimed at @vks (i.e. you shouldn't just take 32MiB and expect to get 0 — not unless you use a non-random counting generator or something, but then it's not fuzz testing any more, it's just complete testing, assuming you have a single f32 input parameter).

@vks
Copy link
Collaborator

vks commented Mar 30, 2018

A counter argument to open intervals is that it is trivial to get the open interval from a half-open one via rejection sampling.

@pitdicker
Copy link
Contributor Author

A counter argument to open intervals is that it is trivial to get the open interval from a half-open one via rejection sampling.

Even faster: it is possible to convert between (0, 1) and [0, 1) by subtracting ε/2. Or to (0, 1] by adding ε/2. Maybe that is something worth mentioning in the documentation...

@dhardy
Copy link
Member

dhardy commented Mar 31, 2018

Maybe that is something worth mentioning in the documentation...

Yes and no. These may be useful, but if users start doing rng.gen() - ε/2 (or + ε/2) in many places, it means we can never change what the Uniform distr. does here. So there is some value in alternate distributions (OpenClosed01 and ClosedOpen01).

@pitdicker
Copy link
Contributor Author

So there is some value in alternate distributions (OpenClosed01 and ClosedOpen01).

Yes, I planned to write somewhere I am really not against adding something like ClosedOpen01. I do think an open distribution as we have now is the best choice for the default. But we were having the problem of an exploding number of choices, which is why I didn't yet like to see additions just for completeness.

@pitdicker
Copy link
Contributor Author

I have replaced "tiny chance" with "very small" and added "~8 million", but am a bit done with tweaking this text to be honest...

@vks
Copy link
Collaborator

vks commented Mar 31, 2018 via email

@dhardy
Copy link
Member

dhardy commented Mar 31, 2018

@vks an expected value of 1 does not imply high probability and this is a crazy argument for making gen() output 0 IMO.

@dhardy dhardy merged commit 3ec525a into rust-random:master Apr 1, 2018
@pitdicker pitdicker deleted the float_doc branch April 1, 2018 17:31
pitdicker pushed a commit that referenced this pull request Apr 4, 2018
Document why we use an open interval
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants