Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stdlib's random float generation #10428

Merged
merged 2 commits into from
May 11, 2022

Conversation

mrakh
Copy link
Contributor

@mrakh mrakh commented Dec 28, 2021

Status quo is that to generate a random value in the half open interval [0, 1), a random value in [1, 2) is generated via bit fiddling, and then 1 is subtracted.

The problem is that, in IEEE-754, the number of distinct values in [1, 2) is significantly smaller than that of [0, 1). In the single-precision case, only 1/127 of the available range is covered. In the double-precision case, the ratio is even less, only 1/1023.

This commit fixes these shortcomings by directly generating the mantissa and exponent. The mantissa bits are extracted directly from the RNG, and @clz is used to apply an exponential bias for the exponent bits. With the new implementation, the available range significantly increases.

Copy link
Contributor

@matu3ba matu3ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you describe "Over the previous method, this has the advantage of being able to represent much more of the available range.", then you might want to explain this more properly in the comments.

Is there a way to test this stuff?

lib/std/rand.zig Outdated Show resolved Hide resolved
lib/std/rand.zig Outdated Show resolved Hide resolved
Copy link
Contributor

@daurnimator daurnimator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really still uniform over the range?

lib/std/rand.zig Outdated Show resolved Hide resolved
@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Dec 28, 2021
@mrakh
Copy link
Contributor Author

mrakh commented Dec 28, 2021

I've made some changes:

  • Added some logic so that the randgen should now be able to generate every possible IEEE-754 value in the [0, 1) range, even the subnormals
  • Added cold path test for random float function
  • Added chi-square goodness of fit test for random float function

If you describe "Over the previous method, this has the advantage of being able to represent much more of the available range.", then you might want to explain this more properly in the comments.

Is there a way to test this stuff?

The goodness of fit test that I added should now verify that the generated values come from a uniformly distributed RNG. The test currently aggregates 100,000 numbers into 1,000 buckets and checks that the p-value is > 0.05; feel free to adjust the numbers to your liking. I personally experimented with 100,000,000 random values and 10,000 buckets and got p-values between 0.2 and 0.7, depending on the RNG seed, but that took quite a few seconds for me to run.

Is this really still uniform over the range?

Yes, it should be. For a given exponent value, the distribution of mantissa bits should be uniform, so the mantissa bits can be directly extracted from the RNG. The exponent value is exponentially distributed, since each successive exponent spans double the range of the previous one. In the [0, 1) interval for the f32 case, the exponent must be an integer in the interval [0, 127), with successive values being twice as likely. And if you compute the clz on an N-bit uniform RNG, you get an exponentially distributed RNG in the interval [0, N] where each successive number is half as likely. So if you compute the clz of a 126-bit number, then subtract it from 126, you should get exponent bits for a uniformly randomly chosen single-precision IEEE-754 number within [0, 1).

@mrakh
Copy link
Contributor Author

mrakh commented Dec 29, 2021

CI failure is due to timeout, but the relevant tests pass.

lib/std/rand.zig Outdated
@@ -15,6 +15,8 @@ const math = std.math;
const ziggurat = @import("rand/ziggurat.zig");
const maxInt = std.math.maxInt;

const Dilbert = @import("rand/Dilbert.zig");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you name the file with _test.zig in the name then it won't get installed as part of the standard library.

if (rand_lz == 41) {
rand_lz += @clz(u64, r.int(u64));
if (rand_lz == 41 + 64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this branching may have a significant speed penalty; has this been benchmarked at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested std.rand.Random.float(f32), old version vs new version without the branch vs. new version with the branch. Units are in nanoseconds per call in the below results:

Non-inlined Random.float(f32):
old: mean = 11.0096, std_dev = 0.2327
new, no branching: mean = 13.0287, std_dev = 0.1954 (18.3% slower)
new, with branching: mean = 13.8511, std_dev = 0.2545 (25.8% slower)

Inlined Random.float(f32):
old: mean = 1.7341, std_dev = 0.0437
new, no branching: mean = 1.8877, std_dev = 0.0081 (8.8% slower)
new, with branching: mean = 2.3880, std_dev = 0.0118 (37.7% slower)

I experimented with changing the comparison so the CPU could compute it in parallel with clz and shorten the dependency chain:

  const rand = r.int(u64);
  var rand_lz = @clz(u64, rand | 0x7FFFFF);
- if (rand_lz == 41) {
+ if ((rand | 0x7FFFFF) == 0x7FFFFF) {

This made the inline case faster, but the non-inline case slower:

Non-inlined Random.float(f32): 
new, with updated branching: mean = 14.7841, std_dev = 0.0527 (34.2% slower)

Inlined Random.float(f32): 
new, with updated branching: mean = 2.3205, std_dev = 0.0093 (33.8% slower)

I might dig into this a little further with perf.

@daurnimator daurnimator dismissed their stale review December 30, 2021 02:49

have re-reviewed.

if (rand_lz == 41) {
rand_lz += @clz(u64, r.int(u64));
if (rand_lz == 41 + 64) {
// It is astronomically unlikely to reach this point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then wouldn't it be good to use @setCold or something here? It probably doesn't work here though because this is not in a function. Maybe if you extract this part into an inline function and use @setCold(true) there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setCold is function specific and I don't know how well cold attributes work on inline functions, better to just add a comment waiting for #5177 or #489 to be implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know about those. Yes, those seem perfect for this situation.

Suggested change
// It is astronomically unlikely to reach this point.
// TODO: when #5177 or #489 is implemented
// tell the compiler it is astronomically unlikely to reach this point.

@daurnimator
Copy link
Contributor

So something I thought of today is that this opens up a timing attack that wasn't there previously: certain numbers will take different amounts of time to generate.
@jedisct1 got thoughts on this?

@mrakh
Copy link
Contributor Author

mrakh commented Jan 1, 2022

So something I thought of today is that this opens up a timing attack that wasn't there previously: certain numbers will take different amounts of time to generate.
@jedisct1 got thoughts on this?

I’m not sure what use case requires cryptographically secure generation of floating point values, but fair point. We can address the f32 case by always requesting the worst-case amount of necessary entropy (126 + 23 = 149 bits).

But the f64 worst-case entropy is much higher, at 1022 + 52 = 1074 bits. Requesting that many random bits for every f64 we wish to generate would have a quite negative impact on performance. On the other hand, generating n random bits would mean being unable to generate any value in the [0, 2^-(n+1)) interval. Perhaps requesting a fixed 192 random bits per f64 is a good compromise - 52 random bits for the mantissa, and 140 for the exponent? This puts the minimum generatable f64 value to 2^-141, at the cost of 3 invocations of std.rand.Random.float(u64).

@jedisct1
Copy link
Contributor

jedisct1 commented Jan 1, 2022

So something I thought of today is that this opens up a timing attack that wasn't there previously: certain numbers will take different amounts of time to generate.
@jedisct1 got thoughts on this?

Don't worry about it.

Floating point representation is never used in cryptography.

The only exception I can think of abuse floating point registers as additional registers to store 53-bit integers. They don't use an actual floating point representation.

@andrewrk andrewrk force-pushed the rand_float_improvement branch from 7dba763 to 7bedeb9 Compare May 11, 2022 02:03
@andrewrk andrewrk merged commit ed63d6c into ziglang:master May 11, 2022
@andrewrk
Copy link
Member

Thank you for the enhancement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants