Improve stdlib's random float generation #10428
Conversation
If you describe "Over the previous method, this has the advantage of being able to represent much more of the available range.", then you might want to explain this more properly in the comments.
Is there a way to test this stuff?
Is this really still uniform over the range?
I've made some changes:
The goodness of fit test that I added should now verify that the generated values come from a uniformly distributed RNG. The test currently aggregates 100,000 numbers into 1,000 buckets and checks that the p-value is > 0.05; feel free to adjust the numbers to your liking. I personally experimented with 100,000,000 random values and 10,000 buckets and got p-values between 0.2 and 0.7, depending on the RNG seed, but that took quite a few seconds for me to run.
Yes, it should be. For a given exponent value, the distribution of mantissa bits should be uniform, so the mantissa bits can be directly extracted from the RNG. The exponent value is exponentially distributed, since each successive exponent spans double the range of the previous one. In the [0, 1) interval for the
CI failure is due to timeout, but the relevant tests pass.
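The bucketed goodness-of-fit check described above can be sketched in Python (a hypothetical translation; the actual test lives in the Zig standard library). Lacking a p-value routine, a crude normal approximation to the chi-square distribution stands in for the p-value > 0.05 check:

```python
import random

def chi_square_stat(samples, n_buckets):
    """Chi-square goodness-of-fit statistic against a uniform [0, 1) law."""
    counts = [0] * n_buckets
    for x in samples:
        counts[int(x * n_buckets)] += 1
    expected = len(samples) / n_buckets
    return sum((c - expected) ** 2 / expected for c in counts)

# Mirror the numbers from the test: 100,000 samples into 1,000 buckets.
n, k = 100_000, 1_000
random.seed(42)  # arbitrary seed, for reproducibility
stat = chi_square_stat([random.random() for _ in range(n)], k)

# Under the null hypothesis the statistic is ~ chi2(k - 1), which has
# mean k - 1 and variance 2(k - 1); a wide 5-sigma bound on the statistic
# stands in here for the real p-value computation.
mean, sd = k - 1, (2 * (k - 1)) ** 0.5
assert abs(stat - mean) < 5 * sd, stat
```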
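The claim above that the exponent is exponentially distributed can be checked empirically: each binade [2^-(e+1), 2^-e) should hold roughly half as many samples as the one above it. A hypothetical Python demo using the stdlib RNG:

```python
import random

random.seed(1)
n = 1_000_000
binades = [0] * 6  # slot e counts samples in [2**-(e+1), 2**-e)
for _ in range(n):
    x = random.random()
    e = 0
    while x < 0.5 and e < 5:  # find the binade by repeated doubling
        x *= 2.0
        e += 1
    binades[e] += 1

# Each binade should hold about half the samples of the previous one.
# (Slot 5 is a catch-all for everything below 2**-5, so it is excluded.)
for e in range(4):
    ratio = binades[e + 1] / binades[e]
    assert 0.45 < ratio < 0.55, (e, ratio)
```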
lib/std/rand.zig (Outdated)
@@ -15,6 +15,8 @@ const math = std.math;
 const ziggurat = @import("rand/ziggurat.zig");
 const maxInt = std.math.maxInt;
+
+const Dilbert = @import("rand/Dilbert.zig");
If you include _test.zig in the file name, it won't get installed as part of the standard library.
if (rand_lz == 41) {
    rand_lz += @clz(u64, r.int(u64));
    if (rand_lz == 41 + 64) {
I imagine this branching may have a significant speed penalty; has this been benchmarked at all?
Tested std.rand.Random.float(f32): old version vs. new version without the branch vs. new version with the branch. Units are nanoseconds per call in the results below:
Non-inlined Random.float(f32):
old: mean = 11.0096, std_dev = 0.2327
new, no branching: mean = 13.0287, std_dev = 0.1954 (18.3% slower)
new, with branching: mean = 13.8511, std_dev = 0.2545 (25.8% slower)
Inlined Random.float(f32):
old: mean = 1.7341, std_dev = 0.0437
new, no branching: mean = 1.8877, std_dev = 0.0081 (8.8% slower)
new, with branching: mean = 2.3880, std_dev = 0.0118 (37.7% slower)
I experimented with changing the comparison so the CPU could compute it in parallel with @clz and shorten the dependency chain:
const rand = r.int(u64);
var rand_lz = @clz(u64, rand | 0x7FFFFF);
- if (rand_lz == 41) {
+ if ((rand | 0x7FFFFF) == 0x7FFFFF) {
This made the inline case faster, but the non-inline case slower:
Non-inlined Random.float(f32):
new, with updated branching: mean = 14.7841, std_dev = 0.0527 (34.2% slower)
Inlined Random.float(f32):
new, with updated branching: mean = 2.3205, std_dev = 0.0093 (33.8% slower)
I might dig into this a little further with perf.
if (rand_lz == 41) {
    rand_lz += @clz(u64, r.int(u64));
    if (rand_lz == 41 + 64) {
        // It is astronomically unlikely to reach this point.
Then wouldn't it be good to use @setCold or something here? It probably doesn't work here, though, because this is not in a function. Maybe if you extract this part into an inline function and use @setCold(true) there?
Ah, I didn't know about those. Yes, those seem perfect for this situation.
// It is astronomically unlikely to reach this point.
// TODO: when #5177 or #489 is implemented,
// tell the compiler it is astronomically unlikely to reach this point.
So something I thought of today is that this opens up a timing attack that wasn't there previously: certain numbers will take different amounts of time to generate.
I’m not sure what use case requires cryptographically secure generation of floating point values, but fair point.
Don't worry about it. Floating point representation is never used in cryptography. The only exception I can think of abuses floating point registers as additional registers to store 53-bit integers; it doesn't use an actual floating point representation.
Branch updated from 7dba763 to 7bedeb9.
Thank you for the enhancement!
Status quo is that to generate a random value in the half open interval [0, 1), a random value in [1, 2) is generated via bit fiddling, and then 1 is subtracted.
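The status-quo trick can be sketched in Python (a hypothetical translation; the Zig code does the equivalent bit fiddling on the IEEE-754 representation):

```python
import random
import struct

def old_float32():
    # Set the exponent field so the value lands in [1, 2), fill the 23
    # mantissa bits with random data, reinterpret as f32, then subtract 1.
    bits = 0x3F800000 | random.getrandbits(23)  # 0x3F800000 encodes 1.0
    return struct.unpack('<f', struct.pack('<I', bits))[0] - 1.0

random.seed(0)
values = [old_float32() for _ in range(1000)]
assert all(0.0 <= v < 1.0 for v in values)
```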
The problem is that, in IEEE-754, the number of distinct values in [1, 2) is significantly smaller than that of [0, 1). In the single-precision case, only 1/127 of the available range is covered. In the double-precision case, the ratio is even less, only 1/1023.
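The 1/127 figure can be verified by counting bit patterns: for positive finite f32 values, consecutive bit patterns encode consecutive floats, so subtracting patterns counts the representable values in an interval:

```python
import struct

def f32_bits(x):
    # Reinterpret an f32 as its 32-bit pattern.
    return struct.unpack('<I', struct.pack('<f', x))[0]

in_1_2 = f32_bits(2.0) - f32_bits(1.0)  # representable f32s in [1, 2)
in_0_1 = f32_bits(1.0) - f32_bits(0.0)  # representable f32s in [0, 1)
print(in_1_2, in_0_1, in_0_1 // in_1_2)  # → 8388608 1065353216 127
```

The interval [1, 2) is a single binade of 2^23 values, while [0, 1) spans 126 normal binades plus the subnormals and zero, giving exactly 127 times as many values.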
This commit fixes these shortcomings by directly generating the mantissa and exponent. The mantissa bits are extracted directly from the RNG, and @clz is used to apply an exponential bias to the exponent bits. With the new implementation, the available range increases significantly.
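Putting the pieces together, the approach can be sketched in Python for f32 (a hypothetical translation of the Zig code; clz64 and the 0x7FFFFF mask mirror the snippets quoted in the review above):

```python
import random
import struct

def clz64(x):
    # Leading zeros of a 64-bit value; int.bit_length(0) == 0, so clz64(0) == 64.
    return 64 - x.bit_length()

def random_f32():
    r = random.getrandbits(64)
    mantissa = r & 0x7FFFFF      # low 23 bits become the mantissa, used directly
    lz = clz64(r | 0x7FFFFF)     # leading zeros among the top 41 bits (capped at 41)
    if lz == 41:                 # all 41 high bits were zero: extend with more bits
        lz += clz64(random.getrandbits(64))
        if lz == 41 + 64:
            return 0.0           # astronomically unlikely to reach this point
    # Each extra leading zero halves the interval, so the biased exponent
    # 126 - lz places the value in the binade [2**-(lz+1), 2**-lz).
    exponent = 126 - lz
    bits = (exponent << 23) | mantissa
    return struct.unpack('<f', struct.pack('<I', bits))[0]

random.seed(0)
samples = [random_f32() for _ in range(10_000)]
assert all(0.0 <= x < 1.0 for x in samples)
```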