
gen_range long-running loop #951

Closed
jongiddy opened this issue Mar 16, 2020 · 12 comments
@jongiddy
Contributor

The following code to create a predictable RNG times out after repeatedly re-fetching values:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d7da5d2567c957af40788c1af4902b08

I'm not sure if this is caused by rejection of out-of-range values, but for a range of 3 values I would expect that at most 2 values ever need to be rejected, so any RNG that returns different values should complete reasonably quickly.
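The exact playground code is behind the link, but a minimal reconstruction of the kind of "predictable RNG" described (based on the update step quoted in the replies) might look like this — note `StepRng` and its stride are an illustration, not the literal playground source:

```rust
// Hypothetical reconstruction of the reported "predictable RNG": it just
// advances a counter by a fixed stride of roughly two thirds of the u32
// space on every call, so outputs are deterministic but all distinct.
pub struct StepRng(pub u32);

impl StepRng {
    pub fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(u32::MAX / 3 * 2);
        self.0
    }
}

fn main() {
    let mut rng = StepRng(0);
    // The first few outputs match the debug log quoted in the next comment.
    assert_eq!(rng.next_u32(), 2_863_311_530);
    assert_eq!(rng.next_u32(), 1_431_655_764);
    assert_eq!(rng.next_u32(), 4_294_967_294);
}
```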

@dhardy
Member

dhardy commented Mar 16, 2020

self.0 = self.0.wrapping_add(u32::max_value() / 3 * 2);

You call this an RNG? From your output:

[src/main.rs:9] self.0 = 2863311530
[src/main.rs:9] self.0 = 1431655764
[src/main.rs:9] self.0 = 4294967294
[src/main.rs:9] self.0 = 2863311528
[src/main.rs:9] self.0 = 1431655762
[src/main.rs:9] self.0 = 4294967292
[src/main.rs:9] self.0 = 2863311526
[src/main.rs:9] self.0 = 1431655760

Our sampling algorithms assume that RNGs are reasonably random, and this particular one uses rejection sampling.

In case you want to know why this happens, take a look at the implementation:

                    let v: $u_large = rng.gen();
                    let (hi, lo) = v.wmul(range);
                    if lo <= zone {
                        return low.wrapping_add(hi as $ty);
                    }

Yep, we're now multiplying by 3 modulo 2^32. Your algorithm adds approximately 2/3 of 2^32 on each call, so the lo part here barely changes.
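To make "barely changes" concrete, here is a sketch (not rand's actual code) of the rejection arithmetic for range = 3 with a stride of `u32::MAX / 3 * 2`, assuming the u32 zone approximation `(range << range.leading_zeros()) - 1` that is quoted later in the thread:

```rust
// Zone approximation for u32 ranges, as discussed later in the thread.
pub fn zone(range: u32) -> u32 {
    (range << range.leading_zeros()).wrapping_sub(1)
}

// Low half of the widening multiply v * range — the `lo` in the snippet above.
pub fn lo(v: u32, range: u32) -> u32 {
    (v as u64 * range as u64) as u32
}

fn main() {
    let step = u32::MAX / 3 * 2; // the stride of the deterministic "RNG"
    let z = zone(3); // 3221225471

    let mut v = 0u32;
    for _ in 0..3 {
        v = v.wrapping_add(step);
        // lo only ever decreases by 2 per draw: 4294967294, 4294967292, ...
        println!("lo = {}", lo(v, 3));
    }

    // lo starts at u32::MAX - 1 and steps down by 2, so the first draw with
    // lo <= zone is draw ceil((2^32 - zone) / 2): over half a billion calls.
    let first_accept = ((1u64 << 32) - z as u64 + 1) / 2;
    assert_eq!(first_accept, 536_870_913);
}
```

That is why the playground test appears to hang: it is not an infinite loop, just one that needs roughly 5×10^8 iterations to terminate.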

@dhardy dhardy closed this as completed Mar 16, 2020
@jongiddy
Contributor Author

I created this specific RNG for testing with deterministic values. However, it appears I triggered a case that converges very slowly. But even with a true RNG, it seems the sampling algorithm could take an indefinite amount of time to return. Where does this algorithm come from?

@vks
Collaborator

vks commented Mar 17, 2020

But even with a true RNG, it seems the sampling algorithm could take an indefinite amount of time to return.

It could take a large amount of time, but the probability for that is extremely small.

Where does this algorithm come from?

https://en.wikipedia.org/wiki/Rejection_sampling
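For reference, a minimal sketch of the basic technique — unbiased uniform sampling by rejection. This is independent of rand's actual implementation (which uses a widening multiply rather than a modulo); `uniform_below` and the toy LCG are illustrations:

```rust
// Uniform integer in [0, range) from any u32 source. Values in the biased
// tail above the largest multiple of `range` are re-drawn, so the final
// modulo is exact rather than slightly biased toward small results.
pub fn uniform_below(range: u32, mut next: impl FnMut() -> u32) -> u32 {
    let span = 1u64 << 32;
    let limit = span - span % range as u64; // largest multiple of range <= 2^32
    loop {
        let v = next() as u64;
        if v < limit {
            return (v % range as u64) as u32;
        }
        // v landed in the tail (probability below range / 2^32); try again
    }
}

fn main() {
    // A toy LCG as the underlying source (hypothetical, for illustration only).
    let mut state = 12345u32;
    let mut lcg = || {
        state = state.wrapping_mul(1664525).wrapping_add(1013904223);
        state
    };
    for _ in 0..1000 {
        assert!(uniform_below(3, &mut lcg) < 3);
    }
}
```

The loop terminates quickly in expectation for any source that is close to uniform; the pathology in this issue arises only because the source and the rejection test are correlated.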

@dhardy
Member

dhardy commented Mar 17, 2020

But your algorithm was created to test what? If distributions can still provide unbiased results given an extremely biased RNG? Not going to work. That distributions can still return results? What does this prove?

@jongiddy
Contributor Author

@dhardy Sorry, I think there's some crossed wires here. I didn't create this RNG to test the rand library or how it handles bias.

I created it for a test of my own code. I wanted it to be deterministic to allow me to check the results in my code tests. Initially, the test RNG just returned 1, but that translated to always selecting low, so multiplying by 1/3 of the word size was an attempt to get variety while still being deterministic.

However, that made the test hang and then time out. If I waited long enough without a timeout, I expect it would eventually reach a value in the zone. Thanks to your comments, I now have a test RNG that finishes quickly.

My residual concern is that it would be valid for a truly random RNG to return values out of the zone for many iterations, so this could be a source of intermittent and irreproducible delays in rand-using code. It's apparent that a trade-off has been made here, favoring speed for the best (and common) case over the rare case that has a significant delay.

It does raise the question of whether this optimization is actually faster on average or just in the best case. My initial off-the-cuff calculation is that, for my code example, there should be only 4 values returned by gen() in the reject list. Instead, the approximation appears to make that list much larger.

Maybe gen_range should have a comment in the docs along the lines of "The rejection sampling may lead to delays. There are possible but rare sequences that may cause significant delays."

@dhardy
Member

dhardy commented Mar 17, 2020

I wanted it to be deterministic to allow me to check the results in my code tests.

This requirement is perfectly fine — not only do we allow seeding most PRNGs with a fixed key, we also document portability and changes across versions. But maybe you mean something other than merely deterministic here?

It was assumed in the design of these algorithms that the PRNG is apparently random. And, as I'm sure you've found by now, it doesn't take much complexity to avoid issues of this kind. But we actually hold the PRNGs we provide to much higher standards, for example the TestU01 and PractRand test batteries.

If you can show that this code has real performance issues with good PRNGs, then we should address that. But if your argument is merely there can be blocking issues with severely biased PRNGs, I don't really see this as an issue, because (a) one must expect that these algorithms may not deliver useful results with such PRNGs, and (b) it is easy to construct a PRNG which exhibits severe blocking behaviour, even if the rejection-zone were optimally small.

@jongiddy
Contributor Author

The problem, even when using a good PRNG, is the variability between best-case and worst-case execution time.

If generating a random value takes time T and, for simplicity, if the calculation u32::MAX - ((u32::MAX - range + 1) % range) also takes time T, then using

    let unsigned_max: $u_large = ::core::$u_large::MAX;
    let ints_to_reject = (unsigned_max - range + 1) % range;
    unsigned_max - ints_to_reject

almost always takes time 2T. For a range of 3, it takes longer than 2T only 1 time in 4294967296.

The alternative heuristic at

    (range << range.leading_zeros()).wrapping_sub(1)

with $u_large being u32 gives 3 << 30 = 3221225472. Since gen() has 4294967296 possible values, 25% of values will be out of the zone and will cause the value generation to loop. Overall the heuristic takes time 1.33T on average, but with a large variance: 1.5% of gen_range calls will take 4T or longer.

If the range is 4, then 4 << 29 = 2147483648 and 50% of the generated values are rejected, which seems particularly bad when you realise that this case could have zero rejections. This takes time 2T on average, but 12.5% of calls will take 4T or longer.
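The two zone computations above can be compared directly. A sketch (u32 only; function names are mine, not rand's) counting how many of the 2^32 possible lo values each zone rejects:

```rust
// Exact zone: reject only the minimum needed for an unbiased result.
pub fn exact_zone(range: u32) -> u32 {
    let ints_to_reject = (u32::MAX - range + 1) % range;
    u32::MAX - ints_to_reject
}

// Shift-based approximation discussed above.
pub fn approx_zone(range: u32) -> u32 {
    (range << range.leading_zeros()).wrapping_sub(1)
}

// Number of the 2^32 possible lo values that fall above the zone.
pub fn rejected(zone: u32) -> u64 {
    (1u64 << 32) - (zone as u64 + 1)
}

fn main() {
    // range = 3: the exact zone rejects a single value in 2^32, while the
    // approximation rejects a quarter of them.
    assert_eq!(rejected(exact_zone(3)), 1);
    assert_eq!(rejected(approx_zone(3)), 1u64 << 30); // 25%

    // range = 4 (a power of two): zero rejections are possible, but the
    // approximation rejects half of all draws.
    assert_eq!(rejected(exact_zone(4)), 0);
    assert_eq!(rejected(approx_zone(4)), 1u64 << 31); // 50%
}
```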

Whether it is a problem in practice is debatable since, for multiple samples, the docs recommend a different approach. But it perhaps bears more investigation, which I see you've considered previously: #661 (comment)

Have there been any benchmarks created for this code since then? I couldn't see any, but maybe I am missing a call chain.

Would you be happy to look at a PR to benchmark:

  • 10..01 - almost 50% rejection rate, no opportunity to optimize
  • (any power of two) - currently 50% rejection rate, could be 0%
  • 0..011 - currently 25% rejection rate, could be almost 0%

@peteroupc

There are multiple algorithms for generating random integers in a range from an RNG, some of which are exact ("unbiased"), and some of which have better performance profiles than others. Here is a survey of them:

  • O'Neill, M.E., "Efficiently Generating a Number in a Range", Jul. 22, 2018.
  • Lumbroso, J. (@jlumbroso), "Optimal Discrete Uniform Generation from Coin Flips, and Applications", arXiv:1304.1916 [cs.DS], 2013.
  • Saad et al., "The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions", 2020 (reference code).

See also Fast Loaded Dice Roller Experiments, which compares several sampling algorithms (but these algorithms are weighted, rather than unweighted which is the focus here).

@dhardy
Member

dhardy commented Mar 20, 2020

@jongiddy yes, please go ahead and make a PR with a comparator. Apparently I underestimated how interested you are in this topic!

Regarding benchmarks, it's long enough ago that I can't remember what precisely was used. @pitdicker wrote a lot of the uniform sampling code but has since moved on from this project. I suspect you will need to write some benchmarks yourself.

@peteroupc thanks for the sources. I mostly haven't looked further into these on the basis that "we are doing ok already" (#570).

@vks
Collaborator

vks commented May 7, 2021

I believe this was fixed with #985. Please feel free to reopen if it was not.

@vks vks closed this as completed May 7, 2021
@jlumbroso

Thanks @peteroupc for the mention! I missed it the first time around.

@jongiddy
Contributor Author

jongiddy commented May 8, 2021

#985 added benchmarks to better represent the variety of run times for different range sizes. I then looked at some fixes to improve the times. There was no absolute improvement. My changes tended to rebalance the times, somewhat more evenly and consistently, but with some times getting worse. This is reasonable, since, for example, the full range case is ultra-optimized in the existing code, and improving other cases at a cost to it might be a sensible trade-off.

Since the current code can't be improved without trade-offs, I agree that this issue can be closed. I will look back over my changes and see if there is a combination that I can submit as a PR for consideration.
