
HighPrecision<f32/64> #531

Closed
wants to merge 4 commits into from

Conversation

@sicking (Contributor) commented Jun 26, 2018

This implements a distribution like Uniform<f32/f64>, but using the full precision of the floating point types.

There are some unfortunate things in there, like the HPFloatHelper trait. However, it seems needed for the same reason that we have the UniformSampler and SampleUniform traits. I went with a slightly simplified version of those traits since we don't have the same need for extensibility here.
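For readers unfamiliar with that pattern, here is a minimal sketch of the shape such a helper trait might take; the names, associated types, and signatures are illustrative assumptions, not the PR's actual code:

```rust
// Hypothetical sketch: a private helper trait that lets HighPrecision<T>
// dispatch to per-type sampling state, analogous to how Uniform<T> uses
// SampleUniform/UniformSampler. All names here are assumptions.
pub trait HPFloatHelper: Sized {
    /// Precomputed per-type sampling state (aligned mantissas, exponent, ...).
    type SamplerState;

    /// Build the state for sampling in the range [low, high).
    fn new_state(low: Self, high: Self) -> Self::SamplerState;
}
```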

I wrote some pretty thorough tests in order to convince myself that the implementation is correct. The full set of tests takes well over 60 seconds to run on my modern MacBook Pro, so I instead have a smaller set of tests which run by default. But the full set is also there and can be enabled by anyone tinkering with this code in the future.

Regarding the relationship to issue #494, I think this PR is actually relatively orthogonal. I don't see the distribution in this PR as something that someone would ever want to use as input for other randomness-related algorithms, such as weighted choices or bool generation. Its performance is simply too poor, and its precision too high.

In fact, we might want to name this FullPrecision<...> or MaxPrecision<...> instead. And leave the name HighPrecision for something with better performance.

@sicking (Contributor, Author) commented Jun 27, 2018

Oh, and I intentionally didn't include the code for HighPrecision01 from #372 since I don't want to take credit for @pitdicker's work.

@dhardy (Member) commented Jun 27, 2018

Your benchmarks show 6-10× the runtime:

test distr_uniform_f32           ... bench:       1,568 ns/iter (+/- 30) = 2551 MB/s
test distr_highprecision1_f32    ... bench:       9,184 ns/iter (+/- 639) = 435 MB/s
test distr_highprecision2_f32    ... bench:      15,198 ns/iter (+/- 707) = 263 MB/s
test distr_highprecision3_f32    ... bench:      15,792 ns/iter (+/- 690) = 253 MB/s

test distr_uniform_f64           ... bench:       2,153 ns/iter (+/- 42) = 3715 MB/s
test distr_highprecision1_f64    ... bench:      11,909 ns/iter (+/- 437) = 671 MB/s
test distr_highprecision2_f64    ... bench:      18,425 ns/iter (+/- 4,472) = 434 MB/s
test distr_highprecision3_f64    ... bench:      19,341 ns/iter (+/- 2,335) = 413 MB/s

Which is reasonable enough if the precision is actually needed, though as you say there may not be many use-cases where it is.

So the big question is: do we want to include this in Rand? You say it is orthogonal to #494, but not exactly; for example, we could put all high-precision implementations in a companion crate.


since I don't want to take credit for ...

The git commits still track who created the commit (even if you rebase or adjust it), so not a problem.

@sicking (Contributor, Author) commented Jun 27, 2018

Yeah, sticking this in a separate crate is definitely an option. Also related to discussion in #290. I'll see if I can rebase the other patch.

@sicking (Contributor, Author) commented Jun 29, 2018

I've rebased this on top of #523 and added the code from #372. So the PR will look funny until #523 lands.

@sicking force-pushed the highprec branch 5 times, most recently from 1c96ce5 to fa1f932, on July 1, 2018
@dhardy added the P-postpone (Waiting on something else) label on Jul 7, 2018
@pitdicker (Contributor) commented

Sorry for not looking at your code for so long. Can you maybe give a high-level description how HighPrecision<f32/f64> works before I dive in?

Oh, and I intentionally didn't include the code for HighPrecision01 from #372 since I don't want to take credit for @pitdicker's work.

Feel free to treat the code like it is your own 😄.

@sicking (Contributor, Author) commented Jul 21, 2018

No worries. Mainly did this because it was a fun challenge. I'm not particularly opinionated about where the final implementation ends up living since I'm not sure in which situations this code is needed.

It's easiest to start by thinking about positive numbers only, using 32-bit floats as an example.

The implementation takes high and low and converts them into binary representations of the exponent and the mantissa. This mainly means parsing out the mantissa/exponent, and then adding the 24th high bit that IEEE 754 optimizes out. For a subnormal (underflow) number this bit is not added, but the exponent is adjusted instead. Note that we don't care about the exponent bias here; we just treat the exponent as a number between 1 and 254. (The bias only matters when converting encoded values to numeric values, which we never do.) Let's say that we get:
low_mantissa: 0x0080_1234
low_exponent: 140
high_mantissa: 0x00a0_0012
high_exponent: 150

Unless we had a subnormal, both mantissas will be 24-bit numbers.
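In code, that decomposition might look like the following minimal sketch (assumed names; the sign is handled later, and the masks below simply ignore the sign bit):

```rust
// Sketch of the decomposition described above: extract the 23-bit fraction
// and 8-bit exponent field of an f32, then add back the implicit 24th bit
// for normal numbers. Subnormals keep their fraction as-is but are treated
// as having exponent 1. Assumes a finite input; sign is ignored here.
fn decompose_f32(x: f32) -> (u32, i32) {
    let bits = x.to_bits();
    let exp = ((bits >> 23) & 0xff) as i32;
    let frac = bits & 0x007f_ffff;
    if exp == 0 {
        (frac, 1) // subnormal: no implicit bit, exponent treated as 1
    } else {
        (frac | 0x0080_0000, exp) // normal: implicit leading 1 gives 24 bits
    }
}
```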

We then convert both numbers to use the higher exponent, shifting the mantissa down to compensate. Here that means right-shifting low_mantissa, and increasing low_exponent, by 10. During shifting, the low value is rounded towards negative infinity and the high value towards positive infinity. This gives:

low_mantissa: 0x0000_2004
low_exponent: 150
high_mantissa: 0x00a0_0012
high_exponent: 150

(At this point this information is saved in the Distribution object).
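A sketch of that alignment step, under the same assumed representation (positive limits only; for a non-negative mantissa, `>>` truncates, which is exactly the round-towards-negative-infinity described above, and the high limit already holds the larger exponent, so it needs no shift here):

```rust
// Sketch: bring `low` up to `high`'s exponent by right-shifting its
// mantissa; returns (low mantissa, high mantissa, shared exponent).
fn align(low: (u32, i32), high: (u32, i32)) -> (u32, u32, i32) {
    let shift = (high.1 - low.1) as u32;
    let low_m = if shift >= 32 { 0 } else { low.0 >> shift };
    (low_m, high.0, high.1)
}
```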

Conceptually, what we do after this is treat the high and the low as infinitely long numbers, by appending 0s at the end, and generate an infinitely long random number between them. We then grab enough bits of this random number, rounding towards negative infinity, to get a 24-bit number. In our case:

Generate a number between
0x0000_2004_0000_0000_...
and
0x00a0_0012_0000_0000_...
like
0x000b_85f2_2fe1_c90a_...
grab 24 bits (while also adjusting the exponent)
0x00b8_5f22
exponent: 150 - 4 = 146

In practice, the way this is done is to generate a number between the low and high mantissas using a Uniform<u32>; in our example we get 0x000b_85f2. We then shift left 4 bits to get a 24-bit number, and add a random 4-bit value (the number 2 in our example). We also decrease the exponent by 4, giving us 146.

This is repeated until we have a mantissa that is exactly 24 bits, or until we get into the subnormal range (i.e. when the exponent is 1). Usually only one iteration is needed, unless we started with a mantissa of exactly 0.

We could at this point shift left further and add more random bits at the end. However, if we did that and then shifted right to get back to a 24-bit value, we would end up exactly where we are now. So at this point we know that even if we had appended infinitely many more random bits, rounding would get us back to exactly the mantissa we have now.

Finally we check that our number:
mantissa: 0x00b8_5f22
exponent: 146

is within the range we set out to generate a number in. The only reason we could get a number outside the range is the rounding we did at the beginning to make both limits share the same exponent.

If we end up with a number that's too big or too small, we start over from scratch.
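Putting the pieces together, the positive-range sampling loop might look like this sketch (reusing decompose_f32 and align from the earlier snippets; the rand 0.8-style gen_range/gen calls, the inclusive draw, and the retry condition are simplifying assumptions, not the PR's exact code):

```rust
use rand::Rng;

// Sketch of the sampling loop described above, for positive f32 ranges.
fn sample_high_precision_f32<R: Rng>(rng: &mut R, low: f32, high: f32) -> f32 {
    let (low_m, high_m, exp) = align(decompose_f32(low), decompose_f32(high));
    loop {
        // Step 1: pick a mantissa between the aligned limits.
        let mut m: u32 = rng.gen_range(low_m..=high_m);
        let mut e = exp;
        // Step 2: shift left until the mantissa is exactly 24 bits, filling
        // the vacated low bits with fresh random bits and decreasing the
        // exponent to match. Stop early at the subnormal boundary (e == 1).
        while m < 0x0080_0000 && e > 1 {
            let s = (m.leading_zeros() - 8).min((e - 1) as u32);
            m = (m << s) | (rng.gen::<u32>() >> (32 - s));
            e -= s as i32;
        }
        // Step 3: reassemble (drop the implicit bit for normals; subnormals
        // use exponent field 0) and reject values pushed out of range by
        // the outward rounding of the limits during alignment.
        let bits = if m >= 0x0080_0000 {
            ((e as u32) << 23) | (m & 0x007f_ffff)
        } else {
            m
        };
        let result = f32::from_bits(bits);
        if result >= low && result < high {
            return result;
        }
    }
}
```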

That's about all that's needed, other than taking care of negative numbers.

Negative numbers are handled by encoding the two mantissas as ordinary i32s. I.e. if the lower limit is negative, we extract the mantissa and exponent as described above, but then multiply the mantissa by -1.

These signed integers are then used as the limits of the uniform distribution; i.e. the distribution is a Uniform<i32>.

Once we have a generated number, we check whether it is negative and then take its absolute value. This absolute value is what all subsequent math is done on, in order to correctly measure that we have a 24-bit value, since this matches the IEEE 754 encoding. Everything is done exactly the same except for two things (sketched below):

  • When we left-shift and add a random number in order to get a 24-bit value, we subtract rather than add the extra bits at the end (in our example above, we would have subtracted 2 rather than added 2).
  • At the end, if we had a negative number, we set the sign bit in the generated result.
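A sketch of those two differences (illustrative names, building on the earlier snippets; the interval reasoning in the comments is my reading of the description, not text from the PR):

```rust
// A limit in the signed range: negate the magnitude when the limit is
// negative, so Uniform<i32> can span zero. Reuses decompose_f32 from above.
fn signed_mantissa(x: f32) -> i32 {
    let (m, _e) = decompose_f32(x);
    if x.is_sign_negative() { -(m as i32) } else { m as i32 }
}

// One refinement step on the magnitude. A signed draw v stands for the
// interval [v, v+1); for negative v, a value in that interval has magnitude
// |v| minus the fractional bits, hence the subtraction.
fn refine_step(magnitude: u32, shift: u32, extra: u32, negative: bool) -> u32 {
    if negative {
        (magnitude << shift) - extra
    } else {
        (magnitude << shift) | extra
    }
}

// Finally, set the sign bit on the assembled result when needed.
fn apply_sign(bits: u32, negative: bool) -> f32 {
    f32::from_bits(if negative { bits | 0x8000_0000 } else { bits })
}
```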

I've skipped some details above, but this is mainly it. Do let me know if anything is still unclear.

@sicking (Contributor, Author) commented Jul 21, 2018

Oh, and we might want to move these distributions to their own file. There's very little sharing with the rest of the code in float.rs, especially now that the existing macro invocations can't be used on account of SIMD, and it'd make it easier to move to a separate crate if desired.

@dhardy (Member) commented Mar 30, 2023

As mentioned in #494, I will now close this. It's a nice idea, but doesn't seem useful in rand itself and I haven't seen much interest in actually using it. If it were to be revived, there's still the question of where it should be housed (rand_distr or a new crate).
