
Port new Range implementation and only have one uniform float distribution #274

Merged
merged 8 commits, Mar 3, 2018

Conversation

pitdicker
Contributor

Finally finished the first part of this.

The biggest change is porting the new Range implementation. Integers now use a much faster implementation based on a widening multiply instead of modulus.
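
Roughly, the widening-multiply idea looks like this (a minimal sketch for illustration, not the exact code in this PR; the name mul_shift is made up, and the real implementation also rejects a small zone of values to stay unbiased):

// Map a random u32 into [0, range) by widening the multiplication to 64 bits
// and keeping the high half; no modulus or division is needed.
fn mul_shift(x: u32, range: u32) -> u32 {
    (((x as u64) * (range as u64)) >> 32) as u32
}

fn main() {
    assert!(mul_shift(u32::MAX, 6) < 6);
    assert_eq!(mul_shift(0, 6), 0);
}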

I have added a new private trait IntoFloat with an into_float_with_exponent method as a building block to convert from integers to floats.

The Open01 and Closed01 distributions are removed, and Uniform for floats will now return values in the open range (0, 1).
IntoFloat is also used in an optimised implementation in Range, and in ziggurat.
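
For f64 the conversion amounts to roughly the following (a hand-written sketch of the idea; the PR implements it through the IntoFloat trait and a macro covering both float types, so the names here are illustrative):

// Put 52 random bits into the fraction field with (unbiased) exponent 0, giving a
// uniform value in [1, 2), then shift it down into the open range (0, 1).
fn bits_to_open01(bits: u64) -> f64 {
    let fraction = bits >> (64 - 52);                      // keep 52 random bits
    let exponent_bits: u64 = 1023 << 52;                   // biased exponent for 2^0
    let value = f64::from_bits(fraction | exponent_bits);  // uniform in [1, 2)
    value - (1.0 - f64::EPSILON / 2.0)                     // smallest result is EPSILON / 2
}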

Will post benchmarks later today.

Member

@dhardy dhardy left a comment

Doesn't this leave next_f32/64 hanging around uselessly? You might as well remove those, and I'll remove that part of my PR.

I still need to take a closer look, but it's great to finally have this bit land!

@tspiteri are you interested in reviewing this?

/// 52 for `f64`.
/// The resulting value will fall in a range that depends on the exponent.
/// As an example the range with exponent 0 will be
/// [2<sup>0</sup>..2<sup>1</sup>-1), which is [1..2).
Member

2.pow(1) - 1 = 1, not 2 — or am I not reading it right?

Contributor Author

Oops, the -1 is not supposed to be there.

// TODO: This range is not open, is that a poblem?
(bits >> 12).into_float_with_exponent(1) - 3.0
} else {
// Convert to a value in the range [0,1) and substract to get (0,1)
Member

range [1, 2) ?

let u = if symmetric {
// Convert to a value in the range [2,4) and substract to get [-1,1)
// TODO: This range is not open, is that a poblem?
(bits >> 12).into_float_with_exponent(1) - 3.0
Member

Don't know. But is it not easy to make it open?

Contributor Author

It can't be done by changing the constant 3.0, because 3.0 - EPSILON is not representable, so we would need one extra addition. But I looked a bit more carefully at the function(s) and can't imagine it being a problem.
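
(As a quick check of why the constant can't absorb the adjustment, assuming f64 and round-to-nearest:)

#[test]
fn three_minus_epsilon_is_three() {
    // f64::EPSILON is the spacing between floats at 1.0; at 3.0 the spacing is
    // 2 * EPSILON, so subtracting EPSILON rounds straight back to 3.0.
    assert_eq!(3.0f64 - f64::EPSILON, 3.0);
}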

@pitdicker
Contributor Author

Benchmarks, taken with the following commands using cargo benchcmp:

git checkout master
cargo bench --features=i128_support > master
git checkout port_range
cargo bench --features=i128_support > port_range
cargo benchcmp master port_range --threshold 1
 name                             master ns/iter        port_range ns/iter    diff ns/iter   diff %  speedup 
 distr_exp                        6,193 (1291 MB/s)     5,925 (1350 MB/s)             -268   -4.33%   x 1.05 
 distr_gamma_large_shape          19,570 (408 MB/s)     17,865 (447 MB/s)           -1,705   -8.71%   x 1.10 
 distr_gamma_small_shape          79,406 (100 MB/s)     77,951 (102 MB/s)           -1,455   -1.83%   x 1.02 
 distr_log_normal                 25,938 (308 MB/s)     24,019 (333 MB/s)           -1,919   -7.40%   x 1.08 
 distr_normal                     6,877 (1163 MB/s)     6,470 (1236 MB/s)             -407   -5.92%   x 1.06 
 distr_range_i128                 143,214 (111 MB/s)    8,529 (1875 MB/s)         -134,685  -94.04%  x 16.79 
 distr_range_i16                  4,441 (450 MB/s)      2,512 (796 MB/s)            -1,929  -43.44%   x 1.77 
 distr_range_i32                  4,968 (805 MB/s)      3,062 (1306 MB/s)           -1,906  -38.37%   x 1.62 
 distr_range_i64                  9,623 (831 MB/s)      2,910 (2749 MB/s)           -6,713  -69.76%   x 3.31 
 distr_range_i8                   5,147 (194 MB/s)      2,510 (398 MB/s)            -2,637  -51.23%   x 2.05 
 gen_range_i128                   144,053 (111 MB/s)    15,796 (1012 MB/s)        -128,257  -89.03%   x 9.12 
 gen_range_i16                    4,202 (475 MB/s)      2,530 (790 MB/s)            -1,672  -39.79%   x 1.66 
 gen_range_i32                    4,877 (820 MB/s)      3,069 (1303 MB/s)           -1,808  -37.07%   x 1.59 
 gen_range_i64                    9,691 (825 MB/s)      6,946 (1151 MB/s)           -2,745  -28.33%   x 1.40 
 gen_range_i8                     5,102 (196 MB/s)      2,530 (395 MB/s)            -2,572  -50.41%   x 2.02 
 misc_sample_indices_100_of_1k    1,784                 715                         -1,069  -59.92%   x 2.50 
 misc_sample_indices_10_of_1k     706                   585                           -121  -17.14%   x 1.21 
 misc_sample_indices_50_of_1k     979                   429                           -550  -56.18%   x 2.28 
 misc_sample_iter_10_of_100       1,601                 954                           -647  -40.41%   x 1.68 
 misc_sample_slice_10_of_100      229                   150                            -79  -34.50%   x 1.53 
 misc_sample_slice_ref_10_of_100  225                   150                            -75  -33.33%   x 1.50 
 misc_shuffle_100                 1,529                 843                           -686  -44.87%   x 1.81 

Most of the distributions are a little faster thanks to the optimised float conversion, and the others thanks to the new range code.

@pitdicker
Contributor Author

Doesn't this leave next_f32/64 hanging around uselessly? You might as well remove those, and I'll remove that part of my PR.

👍

@pitdicker pitdicker closed this Feb 28, 2018
@pitdicker pitdicker reopened this Feb 28, 2018
@pitdicker
Contributor Author

Travis has some problem with incremental compilation, but closing and reopening the PR does not make it retry. No problem.

@pitdicker
Contributor Author

Added two tiny commits: one to use Range to generate chars, and one to use a sign check for bools.
This improves the benchmarks like this:

distr_uniform_bool       4,357 (229 MB/s)     4,366 (229 MB/s)                9    0.21%   x 1.00
distr_uniform_codepoint  9,004 (444 MB/s)     2,801 (1428 MB/s)          -6,203  -68.89%   x 3.21 

I vaguely remember that generating bools also became faster, but apparently not...

@dhardy
Member

dhardy commented Feb 28, 2018

One thing I think you may have missed: distributions::Uniform (in mod.rs) describes its implementations; this probably needs updating.

I don't think bool got faster, just that it didn't get slower when using the most significant bit instead?

The previous code would reject about 50% of the generated numbers, because chars
are always lower than `0x11_0000`, half of the masked `0x1f_ffff`.
@pitdicker
Contributor Author

Rebased after the merge of #273.

I don't think bool got faster, just that it didn't get slower when using the most significant bit instead?

Comparing against zero should be just a bit faster than doing a mask first, but I remembered wrong.
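
The sign check amounts to something like this (a sketch, assuming the committed code tests the most significant bit of a u32; the function name is just for illustration):

// Treat the random word as signed: the most significant bit is the sign bit,
// so `< 0` tests a single uniformly random bit without masking.
fn bool_from_sign(x: u32) -> bool {
    (x as i32) < 0
}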

@dhardy dhardy mentioned this pull request Mar 1, 2018
Member

@dhardy dhardy left a comment

Wow; that's a lot to review! I see a lot of the range code is unchanged from my master branch but that you simplified float sampling; I guess this makes sense with reduced precision.

#[inline(always)]
fn into_float_with_exponent(self, exponent: i32) -> $ty {
// The exponent is encoded using an offset-binary representation,
// with the zero offset being 127
Member

127 is only correct for f32 I think? Maybe reduce this doc.


let value = rng.$next_u();
let fraction = value >> (float_size - $fraction_bits);
fraction.into_float_with_exponent(0) - (1.0 - EPSILON / 2.0)
Member

If 1+ε is the smallest representable number above 1, then 1-ε/2 is representable; ok. This is the same adjustment as the Open01 removed here but in a single number. Looks fine and functionally identical.
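
(A quick sanity check of that reasoning, assuming f64:)

#[test]
fn adjustment_keeps_range_open() {
    // 1.0 - EPSILON / 2.0 is exactly representable, and the largest possible
    // sample (2.0 - EPSILON) minus it is 1.0 - EPSILON / 2.0, still below 1.0.
    let adj = 1.0f64 - f64::EPSILON / 2.0;
    assert!(adj < 1.0);
    assert!((2.0 - f64::EPSILON) - adj < 1.0);
}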

fn into_float_with_exponent(self, exponent: i32) -> $ty {
// The exponent is encoded using an offset-binary representation,
// with the zero offset being 127
let exponent_bits = (($exponent_bias + exponent) as $uty) << $fraction_bits;
Member

Equivalent to removed UPPER_MASK when exponent == 0; ok.

@@ -87,12 +88,12 @@ mod impls {
}
}

impl<Sup: SampleRange> Sample<Sup> for Range<Sup> {
impl<Sup: SampleRange + RangeImpl<X = Sup>> Sample<Sup> for Range<Sup> {
Member

I think this is wrong and won't actually generate any implementations — I think it should be impl<T: RangeImpl> Sample<T::X> for Range<T>. Also below.

/// [`StandardNormal`] distributions produce floating point numbers with
/// alternative ranges or distributions.)
/// open range `(0, 1)`. (The [`Exp1`], and [`StandardNormal`] distributions
/// produce floating point numbers with alternative ranges or distributions.)
Member

This last sentence is off-topic now; I think just remove it.


macro_rules! range_int_impl {
($ty:ty, $signed:ident, $unsigned:ident,
$i_large:ident, $u_large:ident) => {
Member

All types should be ty, not ident.

Contributor Author

The names are also used like ::core::$u_large::MAX. ident works for both, but ty only for types.
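
A tiny illustration of the constraint (a hypothetical macro, not from the PR):

// `$t:ident` can be substituted both where a type is expected and inside a path
// such as ::core::$t::MAX; a `$t:ty` fragment would be rejected in the path.
macro_rules! max_and_zero {
    ($t:ident) => { (::core::$t::MAX, 0 as $t) };
}

fn main() {
    let (max, zero) = max_and_zero!(u32);
    assert_eq!(max, u32::MAX);
    assert_eq!(zero, 0u32);
}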

}

macro_rules! wmul_impl {
($ty:ty, $wide:ident, $shift:expr) => {
Member

$wide:ty

fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Self::X {
// Generate a value in the range [1, 2)
let value1_2 = (rng.$next_u() >> $bits_to_discard)
.into_float_with_exponent(0);
Member

So this range is half-open, unlike Uniform. Slightly odd, but okay I guess.

Contributor Author

Yes, I was not perfectly happy about the difference. But I haven't yet thought through all the problematic rounding cases. We should not make any guarantees about whether the ranges are open or closed yet.
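
For reference, the rest of the sampling step maps that [1, 2) value onto the requested range, roughly like this (an assumed scale-and-offset form for illustration, not necessarily the exact constants in the PR):

// With scale = high - low and offset = low - scale, a value in [1, 2) lands in
// [low, high) after one multiply and one add, ignoring rounding at the edges.
fn map_range(value1_2: f64, low: f64, high: f64) -> f64 {
    let scale = high - low;
    let offset = low - scale;
    value1_2 * scale + offset
}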

use distributions::range::{Range, RangeImpl, RangeFloat, SampleRange};

#[test]
fn test_fn_range() {
Member

This test is pretty strange: why two separate loops? why not cache the ranges? do we also test the single-sample variant, and for various types?

Contributor Author

Oops, I copied the tests but didn't look closely. You are right, the first three tests do not make much sense or are duplicates.

#[should_panic]
fn test_fn_range_panic_usize() {
let mut r = ::test::rng(816);
Range::new(5, 2).sample(&mut r);
Member

Doesn't use usize like the name implies. This and the previous fn are redundant. Maybe add one unit-test to test all supported int types?

@pitdicker
Contributor Author

Thanks for the careful read!

/// it is itself uniform and the `RangeImpl` implementation is correct).
/// `Range::new` and `Range::new_inclusive` will set up a `Range`, which does
/// some preparations up front to make sampeling values faster.
/// `Range::sample_single` is optimized for sampeling values once or only a
Member

No 'e' in 'sampling' (also below), but 👍
