Document why we use an open interval #351
The rest looks good to me.
src/distributions/mod.rs
/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
/// (or almost no) sensible code relies on an exact floating-point value to be
/// generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
/// `f64`). What is relied on, is on the distribution to be uniform, and have
What is relied on is having a uniform distribution and a mean of 0.5.
Why median??
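The claim that only uniformity and a mean of 0.5 matter can be checked on a scaled-down grid. A minimal sketch, assuming the open interval is built from bucket midpoints (k + 0.5) / 2^b and the half-open one from bucket starts k / 2^b (an illustrative construction, not necessarily what rand does; `grid_means` is a hypothetical helper):

```rust
/// Exact means of two toy b-bit float grids, small enough to enumerate:
/// - open (0, 1): bucket midpoints (k + 0.5) / 2^b
/// - half-open [0, 1): bucket starts k / 2^b
/// for k in 0..2^b. Both are uniform over 2^b equally likely points.
fn grid_means(b: u32) -> (f64, f64) {
    let n = 1u64 << b;
    let step = 1.0 / n as f64;
    let (mut open_sum, mut half_open_sum) = (0.0, 0.0);
    for k in 0..n {
        open_sum += (k as f64 + 0.5) * step;
        half_open_sum += k as f64 * step;
    }
    (open_sum / n as f64, half_open_sum / n as f64)
}

fn main() {
    for b in [3, 8, 16] {
        let (open, half_open) = grid_means(b);
        // The half-open mean falls short of 0.5 by half a step, 2^-(b+1).
        println!("b = {b:2}: open mean = {open}, half-open mean = {half_open}");
    }
}
```

The midpoint grid is symmetric around 0.5, so its mean is exactly 0.5; the half-open grid is biased low by half a step, which is negligible at 52 bits but makes the symmetry argument easy to see.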
I was shuffling the names... But you are right, median is not the one.
Changed b.t.w., but the line is just below your comment.
Semantically correct but poor grammar, hence my suggestion, but never mind, this'll do.
Ah, now I get your comment!
This grammar is perfectly fine in Dutch though...
b635128 to d53a7d9
src/distributions/mod.rs
///
/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
/// (or almost no) sensible code relies on an exact floating-point value to be
/// generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
1/2^23 is not really tiny; it corresponds to just 32 MiB of random data. It could be relevant to generate zero for fuzzing purposes.
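To put the quoted odds in concrete terms, a quick sketch, assuming each `f32` sample consumes 4 bytes of generator output and each `f64` sample 8 bytes (`mib_for` is a hypothetical helper):

```rust
/// Raw generator output, in MiB (2^20 bytes), needed for `samples` draws
/// of `bytes_per_sample` bytes each.
fn mib_for(samples: u64, bytes_per_sample: u64) -> u64 {
    samples * bytes_per_sample / (1 << 20)
}

fn main() {
    let f32_choices: u64 = 1 << 23; // 1 in 2^23 chance per f32 draw
    let f64_choices: u64 = 1 << 52; // 1 in 2^52 chance per f64 draw
    println!("f32: 1 in {f32_choices}, f64: 1 in {f64_choices}");
    // 2^23 draws of 4-byte f32 values = 2^25 bytes.
    println!("f32: {} MiB", mib_for(f32_choices, 4));
    // 2^52 draws of 8-byte f64 values = 2^55 bytes.
    println!("f64: {} MiB", mib_for(f64_choices, 8));
}
```

So 2^23 is roughly 8.4 million draws (32 MiB), while the `f64` equivalent is 2^35 MiB, which is why the two cases feel so different.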
Hmm, maybe better to reword this a bit. Something like "1 in ~8 million (2^23) for `f32`". Not sure if that counts as tiny; I had mostly `f64` in mind when writing that 😄.
> It could be relevant to generate zero for fuzzing purposes.
Good point. Of course there are always some valid uses for `[0, 1)`. In this case (fuzzing), we concluded in dhardy#83 that you are better off using a specific library and trait like `Arbitrary` instead of the old `Rand` trait or our distributions.
That point was more about converting to user types.
The chance of never getting 0 from 1 sample of n=2^23 choices is (n-1)/n or approx. 0.9999998807907104.
The chance of never getting 0 from k samples of n choices is ((n-1)/n)^k; if we take k=2^23 samples that translates to a probability of 37% of never sampling 0, so you might want to up your sample size a bit:
```python
>>> ((2**23 - 1.0) / (2**23))
0.9999998807907104
>>> ((2**23 - 1.0) / (2**23))**(2**23)
0.3678794192441178
>>> ((2**23 - 1.0) / (2**23))**(2**24)
0.13533526710338942
>>> ((2**23 - 1.0) / (2**23))**(2**25)
0.018315634521945755
```
This is assuming the f64 ops in Python are able to represent these computations with sufficient accuracy; I think they should be able to.
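The same figures can be reproduced in Rust while sidestepping the accuracy worry. A sketch: `p_never` is a hypothetical helper that computes ((n - 1) / n)^k as exp(k · ln(1 - 1/n)) using `f64::ln_1p`, rather than raising a number extremely close to 1 to a huge power:

```rust
/// Probability that none of `k` independent uniform draws over `n` equally
/// likely values hits one particular value: ((n - 1) / n)^k.
/// `ln_1p` keeps ln(1 - 1/n) accurate even when 1/n is tiny.
fn p_never(n: f64, k: f64) -> f64 {
    (k * (-1.0 / n).ln_1p()).exp()
}

fn main() {
    let n = 2f64.powi(23);
    for k in [2f64.powi(23), 2f64.powi(24), 2f64.powi(25)] {
        let p = p_never(n, k);
        println!(
            "k = {k:>10}: P(no zero) = {p:.6}, P(at least one zero) = {:.6}",
            1.0 - p
        );
    }
}
```

The complement, 1 - P(no zero), is the chance of seeing at least one zero: about 63% at k = 2^23.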
@dhardy I am not really sure what you mean by the comment above.
The chance to get zero with one try is 2^-23. But 2^23 tries do not guarantee you get zero: there is a 37% chance the samples contain no 0.0 at all. A nice thing to realize 😄. Not something I should write somehow, right?
No, it was aimed at @vks (i.e. you shouldn't just take 32 MiB and expect to get 0, not unless you use a non-random counting generator or something, but then it's not fuzz testing any more, it's just complete testing, assuming you have a single `f32` input parameter).
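Turning the question around: how many samples are needed before a zero becomes likely? A sketch under the same assumptions as the thread (1-in-2^23 odds per `f32` draw; 4 bytes of generator output per draw is an extra assumption; `samples_needed` is a hypothetical helper) solves 1 - (1 - 1/n)^k ≥ c for k:

```rust
/// Smallest number of draws k such that the chance of seeing one particular
/// value (out of n equally likely values) at least once is >= `confidence`.
/// Solves 1 - (1 - 1/n)^k >= confidence, i.e. k >= ln(1 - c) / ln(1 - 1/n).
fn samples_needed(n: f64, confidence: f64) -> u64 {
    let k = (1.0 - confidence).ln() / (-1.0 / n).ln_1p();
    k.ceil() as u64
}

fn main() {
    // Odds of one particular f32 output per draw, as quoted above: 1 in 2^23.
    let n = 2f64.powi(23);
    for pct in [50u32, 90, 99] {
        let k = samples_needed(n, pct as f64 / 100.0);
        // Assumes each f32 draw consumes 4 bytes of generator output.
        let mib = k as f64 * 4.0 / (1u64 << 20) as f64;
        println!("{pct}% confidence: {k} samples (~{mib:.0} MiB)");
    }
}
```

For 99% confidence this comes out at roughly 4.6 × 2^23 draws, several times the 32 MiB figure.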
A counterargument to open intervals is that it is trivial to get the open interval from a half-open one via rejection sampling.
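A minimal sketch of the rejection approach, assuming a half-open sampler that maps the top 52 bits of a `u64` onto multiples of 2^-52. The `XorShift64` type is a hypothetical stand-in included only so the example runs without external crates; it is not rand's API:

```rust
/// Hypothetical stand-in RNG (Marsaglia xorshift64, shifts 13/7/17) so the
/// sketch runs without external crates. Not suitable for real use; the seed
/// must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Half-open [0, 1): the top 52 bits as an integer k, scaled by 2^-52.
    fn gen_closed_open(&mut self) -> f64 {
        (self.next_u64() >> 12) as f64 * f64::EPSILON
    }

    /// Open (0, 1) by rejection: redraw whenever the half-open sampler returns
    /// exactly 0.0. All other grid points keep equal probability, so the result
    /// is uniform over k * 2^-52 for k in 1..2^52.
    fn gen_open_by_rejection(&mut self) -> f64 {
        loop {
            let x = self.gen_closed_open();
            if x != 0.0 {
                return x;
            }
        }
    }
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    for _ in 0..5 {
        println!("{}", rng.gen_open_by_rejection());
    }
}
```

Since a zero is drawn with probability 2^-52 per attempt, the loop almost never repeats.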
Even faster: it is possible to convert between (0, 1) and [0, 1) by subtracting ε/2, or to (0, 1] by adding ε/2. Maybe that is something worth mentioning in the documentation...
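The ε/2 shift can be sketched as follows, assuming the open-interval values sit on the midpoint grid (k + 0.5) · 2^-52 and taking ε = `f64::EPSILON` = 2^-52 (the helper names are hypothetical). Both shifts are exact in `f64`, because every result is itself representable:

```rust
/// Half the grid spacing: f64::EPSILON is 2^-52, so this is 2^-53.
const HALF_EPS: f64 = f64::EPSILON / 2.0;

/// Open (0, 1): midpoints (k + 0.5) * 2^-52 for k in 0..2^52, where k is the
/// top 52 bits of `bits`. Hypothetical construction assumed for this sketch.
fn open01(bits: u64) -> f64 {
    ((bits >> 12) as f64 + 0.5) * f64::EPSILON
}

/// Shift the open grid down by half a step: k * 2^-52, i.e. [0, 1).
fn to_closed_open(x: f64) -> f64 {
    x - HALF_EPS
}

/// Shift the open grid up by half a step: (k + 1) * 2^-52, i.e. (0, 1].
fn to_open_closed(x: f64) -> f64 {
    x + HALF_EPS
}

fn main() {
    // Smallest, second-smallest and largest open-interval outputs.
    for bits in [0u64, 1 << 12, u64::MAX] {
        let x = open01(bits);
        println!(
            "{x:e} -> [0, 1): {:e}, (0, 1]: {:e}",
            to_closed_open(x),
            to_open_closed(x)
        );
    }
}
```

The endpoints show the mapping: the smallest open value 2^-53 becomes exactly 0.0 or 2^-52, and the largest, 1 - 2^-53, becomes 1 - 2^-52 or exactly 1.0.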
Yes and no. These may be useful, but if users start doing …
Yes, I planned to write somewhere I am really not against adding something like …
I have replaced "tiny chance" with "very small" and added "~8 million", but I'm a bit done with tweaking this text, to be honest...
@dhardy
I just calculated the number of samples needed to get an expected value of 1 for the number of zeros. The point was that you get a high probability of generating a zero for plausible sample sizes.
On Sat, Mar 31, 2018, 18:23, Diggory Hardy commented on src/distributions/mod.rs:
> /// ```
///
+/// # Open interval for floats
+/// In theory it is possible to choose between an open interval `(0, 1)`, and
+/// the half-open intervals `[0, 1)` and `(0, 1]`. All can give a distribution
+/// with perfectly uniform intervals. Many libraries in other programming
+/// languages default to the closed-open interval `[0, 1)`. We choose here to go
+/// with *open*, with the arguments:
+///
+/// - The chance to generate a specific value, like exactly 0.0, is *tiny*. No
+/// (or almost no) sensible code relies on an exact floating-point value to be
+/// generated with a tiny chance (1 in 2^23 for `f32`, and 1 in 2^52 for
@vks an expected value of 1 does not imply high probability, and this is a crazy argument for making …
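To spell out the distinction: with Z the number of zeros among k independent draws, each hitting 0 with probability 1/n, the expected count and the chance of at least one zero are different quantities:

```latex
\mathbb{E}[Z] = \frac{k}{n}, \qquad
\Pr[Z \ge 1] = 1 - \left(1 - \frac{1}{n}\right)^{k} \approx 1 - e^{-k/n}

k = n = 2^{23}: \quad \mathbb{E}[Z] = 1, \qquad
\Pr[Z \ge 1] \approx 1 - e^{-1} \approx 0.632
```

So at an expected count of exactly 1 there is still roughly a 37% chance of seeing no zero at all.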
Document why we use an open interval
The documentation on the `Uniform` implementation on floats was very hard to find in rustdoc, so I moved it to the documentation of `Uniform` itself. Feel free to shoot my wording down 😄, but I hope this gets the intention across.