Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting a number in [0, 1) out of the PRNG #20

Open
bakkot opened this issue Dec 6, 2022 · 2 comments
Open

Getting a number in [0, 1) out of the PRNG #20

bakkot opened this issue Dec 6, 2022 · 2 comments

Comments

@bakkot
Copy link

bakkot commented Dec 6, 2022

PRNGs generally create 64-bit integers. To get a value in [0, 1), you have to use your integer to get a float in the appropriate range.

This is a somewhat subtle problem. I think the best approach is to shift right by 11, cast that to a float, and then multiply by 1/2^53. Details below.


Choosing a float in the appropriate range among all such floats won't get you a value chosen uniformly - the density of floats increases as you get closer to 0.

Dividing by 2^64 is slightly biased because you have to convert to a floating point number first, which loses precision for numbers above 2^53 (so for example, because 2**53 === 2**53 + 1, the number 1/2^53 is output twice as often as the number 1/2). Also you have to deal with the problem that there's about a thousand possible values (e.g. 2^64 - 1) for which that algorithm produces 1, which is not supposed to be a valid output.

Restricting to 2^53 and then dividing by 2^53 gives a uniform distribution in the interval, but the final bit is non-uniform because of round-to-even, and also only produces 2^53 different values despite there being almost 2^61 floats in [0, 1), i.e. it produces only 0.2% of the floats in that range.

At the bottom of this page (in the random_real function) Sebastiano Vigna gives code which can generate float which is uniform in [0, 1) and also doesn't skip any values (he mentions here that he didn't invent it but I'm not sure who did). But it is much more complex and slower.


This paper has a review of some of the pitfalls, though it's probably pretty opaque to most readers. But the Implementations in Software section is pretty accessible. Upshot is that C++ and Go take the "divide by 2^63" approach (but handle the edge case differently); Java, Rust (presumably meaning rand), Python, and Numpy all take the "divide by 2^53" approach, and other languages do other things.

Vigna, who is also the developer of the widely-used Xoshiro family, recommends the "divide by 2^53" approach at the end of this page.


Given these tradeoffs, the "divide by 2^53" approach seems best, even though this means it will actually never emit many of the floats in [0, 1) - that's probably not a problem in practice.

There's two remaining open questions: which 53 bits of your 64-bit input do you use (most significant or least significant), and do you actually divide by 2^53 or instead multiply by 1/2^53 (which is a different operation when dealing with floats).

Taking the upper 53 seems to be the common approach, this comment says Rust's rand crate did that because "for simple RNGs those are usually more random". That's also what's suggested (without reason given) in Vigna's page linked above.

Dividing by 2^53 vs multiplying by 1/2^53 isn't really a choice - multiplication is much faster than division (you precompute 1/2^53, of course). I only mention it for completeness, because it is a detail that needs to be specified.


This means we'll have exactly the same algorithm as Rust's rand and numpy. There are languages which do different things, per the paper above (C++ and Go, for example), but these choices seem solid from what I can tell.

Also if you made me pick "do the thing Rust and Numpy do, or do the thing C++ and Go do" for a numeric thing with zero context I would take the Rust/Numpy option.

@michaelficarra
Copy link
Member

I think you meant to link to https://github.com/rust-random/rand/blob/7aa25d577e2df84a5156f824077bb7f6bdf28d97/src/distributions/float.rs#L111-L117 for your Rust links. Your links point to the (0, 1] variant.

@bakkot
Copy link
Author

bakkot commented Dec 6, 2022

Ah, yup, so I did. Had the right code up locally but scanned to the wrong part when pulling it up on Github. Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants