
Implement Bernoulli distribution #411

Merged: 24 commits merged into rust-random:master on May 15, 2018

Conversation

@vks (Collaborator) commented Apr 19, 2018

  • This just uses gen_bool, not a different implementation.
  • It is implemented for all primitive types. Not sure whether that makes sense. Maybe it is better to only generate bool and let the user do the conversion? Having Distribution<T> implemented for more than one T can make type annotations annoying. If we ever add impl Distribution<f32> for Normal, it will probably break a lot of code...

@dhardy (Member) commented Apr 19, 2018

I don't see how this achieves much, especially the implementations for non-boolean types.

(I know I said I wanted a Bernoulli distribution, but gen_bool is that — explicitly implementing Distribution<T> doesn't appear to add much.)

@vks (Collaborator, author) commented Apr 19, 2018

I think this is more consistent. Similar to how Uniform is redundant to gen_range.

If inlining and optimization work, the assert should only be evaluated once, so it could be slightly more efficient in tight loops.

@vks (Collaborator, author) commented Apr 19, 2018

It seems the performance difference can be significant:

test misc_bernoulli_const            ... bench:       1,165 ns/iter (+/- 7)
test misc_bernoulli_var              ... bench:       2,258 ns/iter (+/- 31)
test misc_gen_bool_const             ... bench:       1,067 ns/iter (+/- 14)
test misc_gen_bool_var               ... bench:       4,370 ns/iter (+/- 72)
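The gap between the `_const` and `_var` cases above comes down to where the float-to-integer conversion happens. A minimal sketch of the idea (hypothetical code for illustration, not rand's actual implementation; `CachedBernoulli` and its field name are made up):

```rust
// Hypothetical sketch: a distribution can precompute the integer
// threshold once in `new`, while `gen_bool(p)` must redo the
// float-to-integer conversion and bounds check on every call.
struct CachedBernoulli {
    p_int: u64,
}

impl CachedBernoulli {
    fn new(p: f64) -> CachedBernoulli {
        assert!(p >= 0.0 && p <= 1.0);
        // `u64::MAX as f64` rounds up to exactly 2^64 (f64 has a
        // 53-bit mantissa), so this effectively multiplies by 2^64.
        CachedBernoulli { p_int: (p * (u64::MAX as f64)) as u64 }
    }

    // Sampling reduces to one comparison against a uniform u64.
    fn sample(&self, r: u64) -> bool {
        r < self.p_int
    }
}

fn main() {
    let d = CachedBernoulli::new(0.5); // threshold is exactly 2^63
    assert!(d.sample(0));
    assert!(!d.sample(u64::MAX));
}
```

With the threshold cached, the hot loop contains no float arithmetic at all, which matches the `bernoulli_var` vs `gen_bool_var` numbers above.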

@dhardy (Member) commented Apr 20, 2018

Okay, that is indeed a big difference.

I'm wondering if we even need the assert. Unfortunately we do with this implementation of gen_bool, since it represents the result as an integer; we wouldn't if we instead generated a sample using HighPrecision01 and compared it with the float constant, so it would be good to compare performance with that approach (for gen_bool, performance was similar to the current implementation). It's quite possible that the probability is sometimes computed, e.g. as the sum of several numbers that are supposed to add to 1, and rounding errors could potentially push it above 1.

impl Distribution<bool> for Bernoulli {
    #[inline]
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> bool {
        rng.gen_bool(self.p)
@pitdicker (Contributor) commented Apr 20, 2018

The most expensive piece here is multiplying p and converting it to an integer, and to a lesser extent the bounds test. Can you instead move the initialization to new? Then we have a real reason to use a distribution here.

Also, I reopened the issue because we wanted something with more precision than gen_bool. If you multiply by u64::MAX and use gen::<u64>() you get exactly 64 bits of precision, which seems pretty close to using HighPrecision01, which @dhardy and I originally wanted to use. And this should be faster, at least when the set-up happens in new.

Also, does it make sense to make this version exact?
With the multiply method and using u64s, there are exactly 2^64 + 1 steps between 0.0 and 1.0. If we want to sample without any bias (not that that is worth much), always return true for the 1.0 case (possibly without using the RNG), multiply by u64::MAX + 1, and compare the results with <. I am curious what the performance will be. Interested in testing?

@pitdicker (Contributor):

On second thought, there are EPSILON / 4 * 2^64 + 1 values that round up to 1.0 and can't be represented by f64. Given that we are already in rounding-error territory, I see no reason to special-case 1.0. Multiplying by u64::MAX and comparing with < should be good enough.

@dhardy (Member):

Um, no, f64 has 53 bits of precision IIRC and u64::MAX isn't exactly representable in f64; I seem to remember you saying we should use this method with u64 @pitdicker.

I suppose we could multiply by 2.0.powi(64) but there may still be a problem converting to u64 afterwards — does it clamp to its max value if out of bounds?

@pitdicker (Contributor):

u64::MAX isn't exactly representable in f64

Oops, yes. I was so focused on how all representable values for p in the range 0.0..1.0 are also representable when multiplied into 0..2^64 that I forgot that multiplying by 2^64 - 1 rounds to multiplying by 2^64. So this method does not work without special-casing 1.0, per my first comment.

@vks (Collaborator, author):

I suppose we could multiply by 2.0.powi(64) but there may still be a problem converting to u64 afterwards — does it clamp to its max value if out of bounds?

No, it's much worse. If we want more than 32 bits of precision, we should probably use the traditional rng.gen::<f64>() <= p method.

@vks (Collaborator, author) commented Apr 23, 2018

Going to 64 bits using the same method requires special-casing large probabilities, because of a fun bug in the compiler:

#![feature(test)]

extern crate test;
extern crate core;

use test::black_box;

fn main() {
    let mut i = ::core::u64::MAX;
    let mut p;
    loop {
        p = (black_box(1.0) * (i as f64)) as u64;
        if p != 0 {
            break;
        }
        i -= 1;
    }
    println!("maximal u64 i:\ni={:x}", ::core::u64::MAX);
    println!("maximal u64 i where 1.0*(i as f64) as u64 == p != 0:\ni={:x}\np={:x}", i, p);
}

prints

maximal u64 i:
i=ffffffffffffffff
maximal u64 i where 1.0*(i as f64) as u64 == p != 0:
i=fffffffffffffbff
p=fffffffffffff800

but without black_box:

maximal u64 i:
i=ffffffffffffffff
maximal u64 i where 1.0*(i as f64) as u64 == p != 0:
i=ffffffffffffffff
p=7f1df84eb700

@pitdicker (Contributor):

I think you are just hitting the problem that 2^64 - 1 is not representable with f64. Can you try multiplying with 2.0.powi(64) instead?
With the 1.0 probability special-cased to always produce true?

@vks (Collaborator, author) commented Apr 23, 2018

How would that help? Note that ::core::u64::MAX as f64 works, but the conversion back to u64 is Undefined Behavior. 1.0 is probably not the only problematic value, values close to it could also be affected.

@pitdicker (Contributor):

I am pretty sure it does not work that way. ::core::u64::MAX as f64 does not work, as that value is not representable in an f64 (discussed above). And the only problematic values that may be UB (I haven't read the whole thread, but I assume these are all values not representable in a u64) are >= 1.0. If you special-case 1.0 there should be no trouble. 1 ULP below 1.0 maps to a value a couple of thousand less than u64::MAX, so values close to it should not be affected.

@vks (Collaborator, author) commented Apr 23, 2018

::core::u64::MAX as f64 does not work, as that value is not representable in an f64 (discussed above).

It gives exactly the same result as 2.0.powi(64) - 1.0 though.

@pitdicker (Contributor):

It gives exactly the same result as 2.0.powi(64) - 1.0 though.

This discussion is getting a bit difficult, as I should just write some code to test things... But 2.0.powi(64) - 1.0 is also not representable. I imagine they both round to either 2^64, or one ULP below that.
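For what it's worth, the rounding behavior under discussion is easy to verify directly (a standalone check, not part of the PR):

```rust
fn main() {
    // 2^64 - 1 is not representable in f64 (53-bit mantissa), so the
    // cast rounds to the nearest f64, which is exactly 2^64.
    assert_eq!(u64::MAX as f64, 2f64.powi(64));
    // Likewise, subtracting 1.0 from 2^64 rounds straight back to
    // 2^64: the ULP at that magnitude is 4096, so 2^64 - 1 is within
    // half an ULP of 2^64.
    assert_eq!(2f64.powi(64) - 1.0, 2f64.powi(64));
}
```

So both spellings of the constant round to exactly 2^64, confirming that only 1.0 itself needs special treatment.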

@vks (Collaborator, author) commented Apr 23, 2018

OK, if I understood it correctly, I implemented what was suggested and added a test to make sure that it's enough to have a special case for 1.0.

assert!(p >= 0.0, "Bernoulli::new called with p < 0");
assert!(p <= 1.0, "Bernoulli::new called with p > 1");
let p_int = if p != 1.0 {
    (p * (::core::u64::MAX as f64)) as u64
@dhardy (Member):

I guess what happens here is that u64::MAX gets cast to 2.0.powi(64), in which case the multiplication is probably only an adjustment to the exponent of p. So if we stick with this method, IMO it makes more sense to test r < self.p_int and document that Bernoulli::new(1.0) may sample incorrectly (instead of 0.0), because then it's only 1.0 and small values not a multiple of 2^-64 which aren't sampled exactly.

@vks (Collaborator, author):

I'm not sure I understand this correctly. What exactly is the advantage of using < in place of <=? That Bernoulli::new(1.0) is not quite correct as opposed to Bernoulli::new(0.0)? Why is that preferable?

@dhardy (Member):

Consider 0.5: this should get mapped exactly to 2^63; exactly half of all possible u64 values are less than this.

This is only the case because u64::MAX as f64 isn't representable and essentially becomes 2^64.

@vks (Collaborator, author):

I think it is fine to have an error smaller than 2^-64. If someone wants more precision, they should probably just use Uniform with only integers.

@pitdicker (Contributor) commented Apr 24, 2018

@vks What do you think of this? The extra branch seems mostly free thanks to branch prediction, and I think this is good and simple enough to put concerns over bias to rest.

@vks (Collaborator, author) commented Apr 24, 2018

@vks What do you think of this? The extra branch seems mostly free thanks to branch prediction, and I think this is good and simple enough to put concerns over bias to rest.

I decided against it, because the extra branch for every case did not seem worth it for getting rid of the tiny bias for a special case. I did not check how big the performance impact would be though.

@pitdicker (Contributor):

Benchmarks with the branch:

test misc_bernoulli_const            ... bench:       2,153 ns/iter (+/- 728)
test misc_bernoulli_var              ... bench:       3,034 ns/iter (+/- 164)

Benchmarks without:

test misc_bernoulli_const            ... bench:       2,091 ns/iter (+/- 310)
test misc_bernoulli_var              ... bench:       2,964 ns/iter (+/- 307)

I mainly suggested all this because @dhardy had concerns about the accuracy of gen_bool, and @huonw about bias. Because a distribution can move part of the cost to an initialization method, it can be more accurate than gen_bool at little cost.

@vks (Collaborator, author) commented Apr 24, 2018

That difference seems to be negligible, I could definitely live with that. (It might be more painful on platforms without branch prediction though, but I'm not sure we should worry about those.)

@pitdicker (Contributor):

That difference seems to be negligible, I could definitely live with that.

I can already live with the 32-bit accuracy and tiny bias, even more so with 64 bits. But as it costs us little, having no easy-to-point-out bias seems like a win to me (no need to defend the choice later...).

@dhardy Interested in making a decisive vote?

@dhardy (Member) commented Apr 24, 2018

The suggestion I just made eliminates bias, except unfortunately with regards to 1.0.

@pitdicker your suggestion makes the distribution's memory larger. Why not map 1.0 to MAX as @vks has but match that value when sampling? 1 - ε/2 will map to significantly less than u64::MAX because u64 has ~11 bits more precision.

@pitdicker (Contributor):

Why not map 1.0 to MAX as @vks has but match that value when sampling?

That is a good idea. Just use a value known to be impossible instead of Option to reduce overhead.

@vks (Collaborator, author) commented Apr 24, 2018

I added the special case for p = 1 sampling.
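The shape of that special case can be sketched roughly as follows (an assumed illustration based on the discussion, not necessarily the merged code verbatim): p = 1.0 maps to the sentinel u64::MAX at construction, and sampling treats the sentinel as always-true:

```rust
// Sentinel for p == 1.0; no ordinary probability maps here, because
// 1 ULP below 1.0 already maps thousands below u64::MAX.
const ALWAYS_TRUE: u64 = u64::MAX;

// r is a uniform random u64 drawn from the RNG.
fn sample(p_int: u64, r: u64) -> bool {
    // The sentinel short-circuits to true, removing the bias at 1.0.
    p_int == ALWAYS_TRUE || r < p_int
}

fn main() {
    assert!(sample(ALWAYS_TRUE, u64::MAX)); // p = 1.0: always true
    assert!(!sample(0, 0));                 // p = 0.0: always false
    assert!(sample(1 << 63, 0));            // p = 0.5: low half is true
}
```

Using an impossible value as the sentinel avoids the extra word an Option would cost, matching the suggestion above.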

@pitdicker (Contributor):

Then there was one point of discussion left from the issue: should gen_bool produce the same results as the Bernoulli distribution? Having something available with more precision, even though usually unnecessary, was my reason for re-opening the issue, while you opened an issue hoping to have the same implementation for both.

The idea is that gen_bool with 32 bits of accuracy is good enough for almost all uses, and can be about twice as fast as one using 64 bit.

Because benchmarks with Xorshift can be a bit messy (I think it has some problems with using too many registers or something), I benchmarked here with StdRng. This PR:

test misc_gen_bool_const             ... bench:       4,258 ns/iter (+/- 10)
test misc_gen_bool_var               ... bench:       5,883 ns/iter (+/- 10)

test misc_bernoulli_const            ... bench:       4,238 ns/iter (+/- 7)
test misc_bernoulli_var              ... bench:       4,301 ns/iter (+/- 8)

Before:

test misc_gen_bool_const             ... bench:       2,533 ns/iter (+/- 6)
test misc_gen_bool_var               ... bench:       3,234 ns/iter (+/- 14)

///
/// This `Bernoulli` distribution uses 64 bits from the RNG (a `u64`), making
/// its bias less than 1 in 2<sup>64</sup>. In practice the floating point
/// accuracy of `p` will usually be the limiting factor.
@dhardy (Member):

Not true: (a) I don't think this method has any bias, other than when rounding due to limited precision, and (b) this method cannot represent values smaller than 2^-64 or not a multiple of that; f64 can.
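Point (b) can be illustrated with a standalone check (a hypothetical example, not from the PR): probabilities representable in f64 but smaller than 2^-64 collapse to a zero threshold under the multiply-by-2^64 scheme:

```rust
fn main() {
    // f64 can represent probabilities far below 2^-64 ...
    let tiny = 1e-30_f64;
    // ... but scaling by 2^64 and truncating maps them all to 0,
    // i.e. such a Bernoulli would never sample true.
    let p_int = (tiny * 18_446_744_073_709_551_616.0) as u64; // * 2^64
    assert_eq!(p_int, 0); // 1e-30 * 2^64 ≈ 1.8e-11, truncates to 0
}
```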

@vks (Collaborator, author):

Very good point, I'll update it.

@vks (Collaborator, author):

I corrected the remarks and made it possible to alternatively construct Bernoulli from integers for increased precision. Should I implement From<f64> and From<u64>?

/// Construct a new `Binomial` with the given shape parameters `n` (number
/// of trials) and `p` (probability of success).
///
/// Panics if `p <= 0` or `p >= 1`.
@dhardy (Member):

The p==0 and p==1 cases are trivial — should we support them? This might require extra branching however.

@vks (Collaborator, author):

Maybe, but this is orthogonal to this PR.

pub fn new(p: f64) -> Bernoulli {
    assert!(p >= 0.0, "Bernoulli::new called with p < 0");
    assert!(p <= 1.0, "Bernoulli::new called with p > 1");
    let p_int = if p < 1.0 {
@dhardy (Member):

Here we could allow p > 1 very easily — should we? My fear is that accumulated floating point errors could push a number slightly above 1 or below 0 when it shouldn't be. Unfortunately we cannot support <0 here without a jump.

Alternatively we could use strict bounds here but suggest use of p > rng.sample(HighPrecision01) or similar where strict bounds may be a problem.

@vks (Collaborator, author) commented Apr 25, 2018

Yes, we could. I thought about this. It would reduce the number of branches, but might mask errors. Accumulated FP errors will happen, but usually the user has to fix those anyway. Adding a branch for p < 0 is not a problem if we allow p < 0, because then we can get rid of the asserts. Another advantage would be that we can make it panic free. What do you think?

@dhardy (Member):

I think this makes sense, except for one thing: if p is NaN, the method should panic, since NaNs usually indicate serious problems which should be fixed. The same could be said for values not close to the range [0, 1]. Because of this maybe you are right, that users should fix accumulated FP errors themselves.

BTW wouldn't it reduce the number of branches if we used a single assert!(p >= 0.0 && p <= 1.0, ...) instead?

@vks (Collaborator, author) commented Apr 25, 2018

@pitdicker

Then there was one point of discussion left from the issue: should gen_bool produce the same results as the Bernoulli distribution?

I think it should. It is a convenience method (that I would actually prefer to remove), having a different precision would be surprising. If we want a lower precision method, I think it should be implemented as an additional distribution. Then we could decide which one gen_bool should use.

///
/// This is more precise than using `Bernoulli::new`.
#[inline]
pub fn from_int(p_int: u64) -> Bernoulli {
@dhardy (Member):

I don't think this makes much sense to expose — if users have a probability expressed as a u64 it's quite easy to implement an appropriate test anyway, and users can deal with any bias as appropriate themselves. Our mapping of 1.0 to u64::MAX is essentially a hack, taking advantage of the fact that this number could not be generated otherwise.

@vks (Collaborator, author):

Fair enough, I'll remove it and suggest using Uniform instead.

@@ -48,6 +48,10 @@ impl Bernoulli {
///
/// For `p = 1.0`, the resulting distribution will always generate true.
/// For `p = 0.0`, the resulting distribution will always generate false.
/// Due to the limitations of floating point numbers, not all internally
/// supported probabilities (multiples of 2<sup>-64</sup>) can be specified
/// using this constructor. If you need more precision, use
@dhardy (Member):

I think it would be better to say:

This method is accurate for any input p in the range [0, 1] which is a multiple of 2^-64. Values outside this range are treated as if they were 0 or 1 (whichever is nearest).

@vks (Collaborator, author):

Done.


///
/// This method is accurate for any input `p` in the range `[0, 1]` which is
/// a multiple of 2<sup>-64</sup>. If you need more precision, use `Uniform`
/// and a comparison instead. (Note that not all multiples of
@dhardy (Member):

Sorry, why use Uniform, and what did you have in mind (given that this can generate ints and floats)? I don't think we'll include high-precision sampling code in Uniform anyway. Ultimately I'd say here to use HighPrecisionFP or whatever we get, but it's not available yet.

@vks (Collaborator, author):

I had something like rng.sample(Uniform::from(0..N)) < M in mind, replacing the use case of Bernoulli::from_int.

@dhardy (Member):

Ah, for fractional probabilities. Okay, but not everything's a fraction (try π/4). Personally I think just drop this bit?

@vks (Collaborator, author):

Well, some fractional probabilities cannot be represented by f64, but we can still sample them via u64. I was trying to address your comment:

if users have a probability expressed as a u64 it's quite easy to implement an appropriate test anyway, and users can deal with any bias as appropriate themselves

But I'm fine with just dropping it.

/// 2<sup>-64</sup> in `[0, 1]` can be represented as a `f64`.)
#[inline]
pub fn new(p: f64) -> Bernoulli {
assert!(p >= 0.0 & p <= 1.0, "Bernoulli::new not called with 0 <= p <= 0");
@dhardy (Member):

You really should get into the habit of running a few tests before pushing your commits!

@vks (Collaborator, author):

Sorry about that.

@dhardy (Member) commented Apr 26, 2018

I ran the benchmarks. Is the slowdown to misc_gen_bool purely due to increased precision or would a few #[inline] / #[inline(always)] annotations help?

$ export RUSTFLAGS="-C codegen-units=1"
$ cargo +nightly bench --bench misc

# before:
test gen_1k_sample_iter              ... bench:         384 ns/iter (+/- 15) = 2666 MB/s
test misc_gen_bool                   ... bench:       1,397 ns/iter (+/- 45)
test misc_gen_bool_var               ... bench:       4,508 ns/iter (+/- 128)

# now:
test gen_1k_sample_iter              ... bench:         592 ns/iter (+/- 69) = 1729 MB/s
test misc_bernoulli_const            ... bench:       3,168 ns/iter (+/- 156)
test misc_bernoulli_var              ... bench:       3,019 ns/iter (+/- 121)
test misc_gen_bool_const             ... bench:       3,050 ns/iter (+/- 73)
test misc_gen_bool_var               ... bench:       4,233 ns/iter (+/- 202)

I don't get why gen_1k_sample_iter changes; it shouldn't be affected.

@pitdicker would you like to review before this gets merged? I think it's nearly ready.


#[test]
fn test_trivial() {
let mut r = SmallRng::from_entropy();
@dhardy (Member):

We've been using let mut rng = ::test::rng(123); (but with a unique number) in most tests in order to make them reproducible (and not dependent on EntropyRng). Not a big issue but might as well follow convention.

@pitdicker (Contributor):

For me the reason to do so was not to make things reproducible, but to make the tests run without the std feature. Now the CI is unhappy 😄

@vks (Collaborator, author):

Will do. We should document this somewhere.

fn test_average() {
const P: f64 = 0.3;
let d = Bernoulli::new(P);
const N: u32 = 10_000_000;
@dhardy (Member):

10 million makes this a little slow. I guess leave it as-is for now though; maybe as part of #357 we can divide tests into two sets (fast and slow) or something.

@vks (Collaborator, author):

Unfortunately you need a lot of samples to get decent statistics. (The relative error probably scales with 1/sqrt(n_samples).)

vks added 8 commits May 11, 2018 12:44

* Also support construction from integers.
* Remove `Bernoulli::from_int`.
* Improve precision comments.
* Only use one branch for asserts.
* Don't use libc.
* Add assert message.
* Add example for undefined behavior.
@vks (Collaborator, author) commented May 11, 2018

rebase on master;

Done.

remove the extra inline attributes, unless there are real examples of why they are needed;

I think they are needed, because the methods are likely to be used in hot loops, so the compiler should have the option to inline them across crates.

make sure the CI is green (i.e. make the tests pass without std)
use && instead of & in the assert (cause of a previous error)

In this case I want to avoid a conditional jump, so I feel & is more appropriate.

fix the benchmarks as suggested

What do you mean? Use StdRng? Add the accumulation variable? I don't see how the latter is necessary when black_box is used.

maybe squash those 19 commits?

I don't want to do that before the end of review.

maybe use the right constant for multiplying instead of relying on u64::MAX to round correctly? (makes the intent more clear)

I'm not sure what you mean.

@pitdicker (Contributor):

I think they are need, because the methods are likely to be used in hot loops, so the compiler should have the option to inline them across crates.

The compiler always has the option, but do you want to force it to? I'd rather not see it mixed in with this PR, but I'll leave that up to @dhardy.

In this case I want to avoid a conditional jump, so I feel & is more appropriate.

I am not sure why we'd go the non-standard route here, or if it matters at all.

What do you mean? Use StdRng? Add the accumulation variable? I don't see how the latter is necessary when black_box is used.

Yes, StdRng because Xorshift is not working well in combination with this benchmark. And the goal is to include the set-up time, with the multiplication, in the benchmark. If it doesn't change in the inner loop of a 1000 iterations, do you know for sure LLVM will not move it out of the loop? And it seems like a good idea to have the benchmarks for gen_bool and Bernoulli measure the same thing.

I don't want to do that before the end of review.

👍

maybe use the right constant for multiplying instead of relying on u64::MAX to round correctly? (makes the intent more clear)

I'm not sure what you mean.

You want to multiply by 2^64, not 2^64 - 1. It is harder to write, but kind of an important detail when trying to understand the code in my opinion. Something like 2.0 * (1u64 << 63) as f64?
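For reference, the suggested explicit spelling and the rounded cast produce bit-identical values, which is why relying on the rounding works at all (a standalone check, not part of the PR):

```rust
fn main() {
    // pitdicker's explicit constant: 2 * 2^63 = 2^64, computed exactly.
    let explicit = 2.0 * (1u64 << 63) as f64;
    // u64::MAX (2^64 - 1) is not representable in f64 and rounds up
    // to exactly the same value, 2^64.
    assert_eq!(explicit, u64::MAX as f64);
}
```

So the choice is purely about readability: the explicit form states the intended multiplier (2^64), while `u64::MAX as f64` only reaches it via rounding.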

@vks (Collaborator, author) commented May 11, 2018

The compiler always has the option, but do you want to force it to? I'd rather not see it mixed in with this PR, but I'll leave that up to @dhardy.

No, the compiler does not have that option across crates (without LTO). Note that #[inline] does not force inlining, only #[inline(always)] does. However, it does affect compile time, so that might be an issue. (Relevant docs.)

I am not sure why go the non-standard route here, and if it matters at all.

It probably generates the same code, but I think & documents the intent better, because I actually don't want short-circuit evaluation.

If it doesn't change in the inner loop of a 1000 iterations, do you know for sure LLVM will not move it out of the loop?

I thought black_box would prevent that.

You want to multiply by 2^64, not 2^64 - 1.

Why? I would want u64::MAX to represent p = 1.0.

@dhardy (Member) commented May 11, 2018

Some interesting notes on #[inline]: https://internals.rust-lang.org/t/when-should-i-use-inline/598/8
Most of the functions #[inline] is applied to here are generic, thus can be inlined across crates anyway (which implies there are probably a few places in this crate we've used it unnecessarily). The remaining cases are Bernoulli::new where it probably makes sense (because of optimisations for constants) and gen_bool (which just wraps new and sample). But I think the attribute can be removed from all the generic functions.

Using & is fine IMO.

I don't know about the benches.

@pitdicker is correct about the constant: you compare the sample r < self.p_int later, so for p=1.0 then p_int should be greater than u64::MAX. But I think the easiest way to write the constant would be 2.0.powi(64) (presumably the compiler can calculate that).

@pitdicker (Contributor):

But I think the easiest way to write the constant would be 2.0.powi(64) (presumably the compiler can calculate that).

I tried that (when this issue was just a few days old), but powi is not available with no_std.

@vks (Collaborator, author) commented May 11, 2018

Yes, and it is also not a const fn. The alternative would be @pitdicker's suggestion, but I'm not sure it makes it more clear. Maybe just add a comment explaining that it rounds to what we want?

@dhardy (Member) commented May 14, 2018

@vks I think you still have some tweaks to make? I.e. remove some inline annotations (at least those on generic functions) and add a comment. Then I think we can merge.

@vks (Collaborator, author) commented May 14, 2018

Yes, I need to do that and possibly improve the benchmarks. I'm a bit busy, but I'll try to find time tonight.

vks added 2 commits May 15, 2018 08:16
They are redundant, because generic functions can be inlined anyway.
@vks (Collaborator, author) commented May 15, 2018

I made the remaining changes, except for the benchmarks.

@pitdicker

And the goal is to include the set-up time, with the multiplication, in the benchmark. If it doesn't change in the inner loop of a 1000 iterations, do you know for sure LLVM will not move it out of the loop?

I don't think it can for misc_*_var, it's wrapped in a black_box. For the other benchmark (misc_*_const) it was done intentionally (as suggested by the comment), because I wanted to make sure that this optimization works, which I think requires inlining. I did not use StdRng, because I wanted to benchmark the bool generation code and not the RNG, so I tried to minimize the RNG overhead. I don't really see the benefit and don't understand what you mean by saying it optimizes differently.

@dhardy (Member) left a comment:

Thanks @vks! Looks good to me. @pitdicker are you happy?

@pitdicker (Contributor):

The benchmarks still need fixing. The results are unexpectedly fast for a reason, and I have already shown why. But I'm okay with merging now, and changing those later.

@pitdicker pitdicker merged commit 63bde31 into rust-random:master May 15, 2018
@vks (Collaborator, author) commented May 15, 2018

I opened #448 for improving the benchmarks.

Labels
B-value Breakage: changes output values P-high Priority: high