
Deprecate Rng::gen_weighted_bool #308

Merged
4 commits merged into rust-random:master from deprecate_weighted_bool on Mar 21, 2018

Conversation

pitdicker
Contributor

As discussed in #293.

@dhardy (Member) commented Mar 17, 2018

I'd prefer to have gen_bool aka Bernoulli before deprecating this.

@pitdicker
Contributor Author

Implemented gen_bool.

I went the easiest route here.
I copied over the latest iteration of the code to generate a high-precision float, but have not adapted it to use more than 32/64 bits per float. Also, I have not made a separate Bernoulli distribution (#300); that just feels like 'too much' to me for such a simple function.
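
For context, a simplified sketch of the general shape of this approach: compare a uniform float in [0, 1) against p. This is an illustration only, not the PR code; the PR itself samples the HighPrecision01 distribution, visible in the diff further down.

use rand::{thread_rng, Rng};

// Simplified stand-in for the float-comparison approach: draw a uniform
// f64 in [0, 1) and return whether it falls below p.
fn gen_bool_via_float<R: Rng>(rng: &mut R, p: f64) -> bool {
    assert!(p >= 0.0 && p <= 1.0);
    rng.gen::<f64>() < p
}

fn main() {
    let mut rng = thread_rng();
    println!("{}", gen_bool_via_float(&mut rng, 1.0 / 3.0));
}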

Thanks for the suggestion to use my higher-precision code here, that gives an excuse to add it 😄. But I'll hold off a little while longer on the range code; we would need to have some discussion first about how to expose it...

@dhardy (Member) left a comment

Some little doc issues but the code looks good.

src/lib.rs Outdated
fn gen_weighted_bool(&mut self, n: u32) -> bool {
// Short-circuit after `n <= 1` to avoid panic in `gen_range`
n <= 1 || self.gen_range(0, n) == 0
}

/// Return a bool with a `p` probability of being true.
dhardy (Member):

Swap p and probability


impl Distribution<$ty> for HighPrecision01 {
/// Generate a floating point number in the open interval `(0, 1)`
/// (not including either endpoint) with a uniform distribution.
dhardy (Member):

I believe this is actually half-open (your old open version used rejection sampling; this code will yield 0 when the sampled u32 is zero). For sample < p this is what we want anyway, so it's just the comment (and possibly the distribution name, I'm not sure on this yet) to change.

/// Generate a floating point number in the open interval `(0, 1)`
/// (not including either endpoint) with a uniform distribution.
///
/// This is different from `Uniform` in that it it uses all 32 bits
dhardy (Member):

double 'it'

/// use rand::distributions::HighPrecision01;
///
/// let val: f32 = SmallRng::new().sample(HighPrecision01);
/// println!("f32 from (0,1): {}", val);
dhardy (Member):

[0,1)

@dhardy (Member) commented Mar 18, 2018

Maybe also document that the smallest non-zero value which can be generated is 0.00000000023283064 = 2.3e-10 (f32) or 0.00000000000000000005421010862427522 = 5.4e-20 (f64), as well as that precision is reduced below 0.00195 (f32) or 0.000244 (f64). This is significant for both HighPrecision01 and gen_bool (though of course the method used by the latter may change).

I see the distribution HighPrecision01 is publicly visible but not documented; I think at this point it would probably be best to document it but with a clear warning that it may be adjusted or removed in the future (or we could add an 'experimental' feature and feature-gate it... but we have too many feature gates already really).

@dhardy added the X-enhancement, B-API (Breakage: API), F-new-int (Functionality: new, within Rand), P-high (Priority: high) and D-review (Do: needs review) labels on Mar 18, 2018
@pitdicker
Contributor Author

Thank you for the close read!

as well as that precision is reduced below 0.00195 (f32) or 0.000244 (f64).

This is not really true, as the precision reduces as the number gets higher, and it changes with every (negative) power of two. But I have tried to write some documentation with similar intent.

@dhardy (Member) commented Mar 18, 2018

Okay, I'm happy for this to be merged. I'd prefer you wait 2-3 days from first opening, however, in case anyone else has a concern.

@pitdicker
Contributor Author

Added a commit to use gen_bool in the Binomial distribution.

@pitdicker (Contributor Author) commented Mar 20, 2018

@dhardy I am having second thoughts about this implementation of gen_bool. I tried to figure out exactly how much the extra accuracy buys us, and whether it is possible to make it faster.

What are the limits on gen_bool's accuracy?

  • The accuracy of p. p depends on the 'sliding accuracy scale' (just made up the term 😄) of the floating-point format. In the worst case it is accurate to 1 in 2^53 for f64, and 1 in 2^24 for f32. 1 in 2^24 means a bias could be introduced once every 16.7 million samples. That seems like an amount that might just influence the correctness of some result, so that rules out f32 for p.
  • The accuracy of the generated float. This accuracy only matters directly around the value of p: if the generated value is clearly lower or higher, the accuracy doesn't matter. HighPrecision01 will only improve the result in the cases where the value is within EPSILON (2^-52) of p.
  • The accuracy of the integer generated by the RNG. We can go with a u32, a u64, or even use multiple integers.

When does too little accuracy become a problem?
Suppose we have an accuracy of 1 in 2^52 (the accuracy of gen::<u64>()). Then only once in every 2^52 (~4.5e15) runs of gen_bool could the result introduce a bias. So this will only matter to an algorithm that runs for much more than 2^52 iterations, and whose result depends for its correctness on the accuracy of gen_bool.

I would say that is something very rare. And if you have an algorithm where the result depends on the accuracy of a function you are going to run more than 2^52 times, I think it is your responsibility to glance over that function and determine whether it fits your use.

So an accuracy of 2^24 is sometimes too little, and 2^52 is more than enough. Could 2^32 be reasonable? That would allow us to use a single u32 from the RNG, potentially making it about 2× faster. It would give enough accuracy that it only starts to matter well after 2^32 (about 4 billion) rounds. That seems very reasonable to me for a common method like gen_bool.
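
For intuition, a hypothetical back-of-the-envelope check (an illustration, not from the PR) of how large the error from quantizing p to a u32 threshold actually is:

fn main() {
    // Quantizing p to a u32 threshold changes the effective probability
    // by at most about 1 / 2^32.
    let p = 0.3_f64;
    let p_int = (p * u32::MAX as f64) as u32;
    let p_effective = p_int as f64 / u32::MAX as f64;
    // Prints an error on the order of 1e-10, i.e. roughly one extra or
    // missing `true` per 2^32 calls.
    println!("quantization error = {:e}", (p - p_effective).abs());
}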

Now I think the following implementation is interesting, because it turns into a comparison against a constant if p is constant. Instead of converting the integer from the RNG to a float, we convert p to an integer:

fn gen_bool(&mut self, p: f64) -> bool {
    assert!(p >= 0.0 && p <= 1.0);
    let p_int = (p * core::u32::MAX as f64) as u32;
    self.gen() < p_int
}
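
A hypothetical usage sketch (an illustration, assuming the trait method lands roughly as proposed above); with a constant p the threshold is a compile-time constant, so each call is one RNG draw plus one integer comparison:

use rand::{thread_rng, Rng};

fn main() {
    let mut rng = thread_rng();
    // Returns true with probability 0.5; the 0.5 * u32::MAX threshold can
    // be folded into a constant by the compiler.
    if rng.gen_bool(0.5) {
        println!("heads");
    } else {
        println!("tails");
    }
}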

@dhardy (Member) commented Mar 20, 2018

That sounds reasonable (though it should be self.gen() <= p_int if you consider p=1.0).

1 in 2^32 bias is probably okay. I have been involved in experiments which may have used around 2^32 Bernoulli samples, but I doubt a single sample error would have had much effect on the results.

I was wondering if the equivalent using u64 would even be possible; I'm not sure: u64::MAX is not representable exactly, hence conversion to f64, [multiplication by 1] and conversion back to u64 might result in overflow.
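
For illustration, a small standalone check (an addition, not from the PR) of the representability issue: u64::MAX rounds up to 2^64 as an f64, so the round-trip with p = 1.0 lands outside the u64 range.

fn main() {
    // u64::MAX (2^64 - 1) rounds up to exactly 2^64 when converted to f64.
    let max_as_float = u64::MAX as f64;
    println!("{}", max_as_float == 18446744073709551616.0); // true

    // With p = 1.0 the product is 2^64, which does not fit in a u64.
    // Modern Rust saturates such casts to u64::MAX; at the time of this PR
    // an out-of-range float-to-int cast was undefined behaviour, which is
    // exactly the overflow concern raised here.
    let p = 1.0_f64;
    let p_int = (p * max_as_float) as u64;
    println!("{}", p_int);
}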

@pitdicker
Contributor Author

though it should be self.gen() <= p_int

Thank you, meant to write that... 😄

The equivalent with u64 will not give 64 bits of precision. We would certainly have rounding issues going from float to integer, and I remember them being unpredictable (i.e. it is possible to change the rounding mode of the CPU). I would not use this method for u64.

@pitdicker (Contributor Author) commented Mar 20, 2018

Changed gen_bool as discussed.

When I tried to benchmark it (does it really perform as hoped?), I had some trouble with SmallRng, so I added a benchmark for that one too.

test misc_gen_weighted_bool          ... bench:       4,176 ns/iter (+/- 406) (deprecated)
test misc_gen_bool                   ... bench:       4,043 ns/iter (+/- 172) (before)

test gen_u32_xorshift                ... bench:       1,101 ns/iter (+/- 115) = 3633 MB/s
test misc_gen_bool                   ... bench:       1,557 ns/iter (+/- 157)
test misc_gen_bool_var               ... bench:       1,594 ns/iter (+/- 62)

src/lib.rs Outdated
@@ -551,7 +551,8 @@ pub trait Rng: RngCore {
/// ```
fn gen_bool(&mut self, p: f64) -> bool {
assert!(p >= 0.0 && p <= 1.0);
self.sample::<f64, _>(distributions::HighPrecision01) < p
let p_int = (p * core::u32::MAX as f64) as u32;
p_int > self.gen()
dhardy (Member):

Why swap left-right sides and comparator now?

pitdicker (Contributor Author):

Otherwise type inference couldn't figure it out... the alternative was self.gen::<u32>() <= p_int.

dhardy (Member):

Oh, really? But you need to use >= then.

@@ -864,7 +865,7 @@ impl SeedableRng for StdRng {
}

/// An RNG recommended when small state, cheap initialization and good
/// performance are required. The PRNG algorithm in `SmallRng` is choosen to be
/// performance are required. The PRNG algorithm in `SmallRng` is chosen to be
/// efficient on the current platform, **without consideration for cryptography
dhardy (Member):

Note that Xorshift has good next_u32 performance but not as good next_u64 performance as several other generators. Wait, something's wrong:

test gen_u32_xorshift      ... bench:       4,635 ns/iter (+/- 198) = 862 MB/s
test gen_u64_xorshift      ... bench:       2,840 ns/iter (+/- 93) = 2816 MB/s

Impossible that next_u64 is faster than next_u32. Anyway, be careful comparing benchmarks for gen_bool: I would imagine it most useful in heavy numerical simulators which would likely either use native 64-bit generators or buffered generators, i.e. a u32 may not be half the price of a u64.

@pitdicker (Contributor Author), Mar 20, 2018:

True, sometimes it will be about half the price, sometimes it just makes no difference.

And you are getting bitten again (I think) by the Rust bug with multiple codegen units and the benchmark harness. Can you retry with export RUSTFLAGS="-C codegen-units=1"?

dhardy (Member):

Interesting; didn't affect most tests (including xorshift bytes with around 900MB/s) but:

test gen_u32_xorshift      ... bench:       1,045 ns/iter (+/- 59) = 3827 MB/s

dhardy (Member):

Actually, that was with another change calling black_box less frequently. Without that I get approx 3000 MB/s. The u64 results only change by about 50MB/s however.

@dhardy (Member) left a comment

It would also be nice if you cleaned this up and pulled the HighPrecision01 stuff into a separate PR, since it's not directly connected any more.

/// let mut rng = thread_rng();
/// println!("{}", rng.gen_bool(1.0 / 3.0));
/// ```
fn gen_bool(&mut self, p: f64) -> bool {
dhardy (Member):

Please add a unit test, at the very least testing that gen_bool(1.0) doesn't panic, or better testing that both 1.0 and 0.0 produce the expected outputs a few times over.
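
For illustration, a minimal sketch of the kind of test being requested (the name and exact assertions are an assumption, not the test that actually landed in the PR):

#[test]
fn test_gen_bool_extremes() {
    use rand::{thread_rng, Rng};

    let mut rng = thread_rng();
    for _ in 0..5 {
        // gen_bool(1.0) must not panic and must always return true;
        // gen_bool(0.0) must always return false.
        assert!(rng.gen_bool(1.0));
        assert!(!rng.gen_bool(0.0));
    }
}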

@pitdicker
Contributor Author

It would also be nice if you cleaned this up and pulled the HighPrecision01 stuff into a separate PR, since it's not directly connected any more.

Can't I sneak in anything quietly? 😄

@pitdicker
Contributor Author

Rebased, removed the addition of HighPrecision01, and combined a few commits, now that they only changed a single line etc.

I have added a test for gen_bool, and changed the benchmarks to also use black_box less.

@pitdicker
Contributor Author

The benchmark results have changed a bit, but are finally realistic:

test misc_gen_bool                   ... bench:       1,184 ns/iter (+/- 3)
test misc_gen_bool_var               ... bench:       4,129 ns/iter (+/- 41)

A floating-point multiply is quite expensive compared to a couple of shifts and XORs, as it should be. Still, when p is variable, performance is similar to the old gen_weighted_bool and to the method converting to floats.

@dhardy (Member) commented Mar 21, 2018

Looks good, apart from using p_int >= self.gen() as already suggested.

@pitdicker
Contributor Author

Ah, the comment was collapsed.

self.gen::<u32>() <= p_int is the same as p_int > self.gen() (not an >=). But I can change the line if you like.

@dhardy (Member) commented Mar 21, 2018

No it's not.

@pitdicker
Contributor Author

You are right. What was I confusing it with??? (updated)

@pitdicker
Contributor Author

Ready to merge?

@dhardy dhardy merged commit 4acef1b into rust-random:master Mar 21, 2018
@pitdicker pitdicker deleted the deprecate_weighted_bool branch March 21, 2018 18:34
pitdicker pushed a commit that referenced this pull request Apr 4, 2018
@vks vks mentioned this pull request Apr 11, 2018