Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UniformInt::sample_single leading zeros approximation is biased #661

Closed
TheIronBorn opened this issue Dec 9, 2018 · 4 comments · Fixed by #662
Closed

UniformInt::sample_single leading zeros approximation is biased #661

TheIronBorn opened this issue Dec 9, 2018 · 4 comments · Fixed by #662
Labels
P-high Priority: high X-bug Type: bug report

Comments

@TheIronBorn
Copy link
Collaborator

For every possible input, each possible output should be equally likely:

const RANGE: u8 = 3;
let zone = RANGE << RANGE.leading_zeros();
let mut map = [0; RANGE as usize];
for r in 0..=u8::max_value() {
    let (h, l) = r.wmul(RANGE);
    if l <= zone { map[h as usize] += 1; }
}
// must be uniform for every value, produces [65, 64, 64]
assert_eq!(map, [map[0]; RANGE as usize]);

(playground link for verification: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=485be3f36a790b767a1d7366dd1de364)

I think all that's required is changing the comparison to l < zone

@TheIronBorn
Copy link
Collaborator Author

@dhardy
Copy link
Member

dhardy commented Dec 10, 2018

Well spotted. We could do with tests for this kind of thing but unfortunately I don't think it's practical.

I think the error is here:

                let zone =
                    if ::core::$unsigned::MAX <= ::core::u16::MAX as $unsigned {
                        // Using a modulus is faster than the approximation for
                        // i8 and i16. I suppose we trade the cost of one
                        // modulus for near-perfect branch prediction.
                        let unsigned_max: $u_large = ::core::$u_large::MAX;
                        let ints_to_reject = (unsigned_max - range + 1) % range;
                        unsigned_max - ints_to_reject
                    } else {
                        // conservative but fast approximation
                       range << range.leading_zeros()
                    };

Here ints_to_reject is the number of integers to reject from the range 0..(2.pow(BIT_SIZE)); i.e. the first case would be exclusive from the range 0..(MAX - ints_to_reject + 1), though since ints_to_reject may be zero we can't write that (unless we allow wrapping and avoid rejection for zone == 0).

Instead we consider the upper-bound inclusive which avoids that problem, however this isn't compatible with the second case: there we should subtract 1 if the upper-bound is inclusive. However, what if range == 0? We want MAX which is 0 - 1 with wrapping arithmetic, so this works if we use:

(range << range.leading_zeros()).wrapping_sub(1)

I think this is the best way of doing things since it avoids having to trap for special cases. Alternatively we could drop the range << range.leading_zeros() optimisation and always use modulus; we should benchmark before doing that.

It looks like this only affects UniformInt::sample_single.

Want to make a PR?

@dhardy dhardy added X-bug Type: bug report T-distributions P-high Priority: high labels Dec 10, 2018
@TheIronBorn
Copy link
Collaborator Author

Note that the bitmasking method has the same rejection probabilities as our leading zeros method:
https://docs.google.com/spreadsheets/d/1qjBfi4w0IvIdz2MECLyC0w4H1mMPqI8K5r7YmyZJ0XU/edit?usp=sharing
Bitmasking doesn't use multiplication so it's probably worth benchmarking as well

TheIronBorn added a commit to TheIronBorn/rand that referenced this issue Dec 11, 2018
@TheIronBorn
Copy link
Collaborator Author

We could do with tests for this kind of thing but unfortunately I don't think it's practical.

We could just do like I have and have tests for smaller integers. The logic is unlikely to change for larger integers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-high Priority: high X-bug Type: bug report
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants