No usize except uniform #1487

Merged
merged 13 commits into from
Sep 9, 2024

Conversation

dhardy
Member

@dhardy dhardy commented Aug 12, 2024

  • Added a CHANGELOG.md entry

Summary

This is #1471 without Rng::gen_index. Since we can't have UniformUsize without directly supporting usize in Rng::gen_range, this is probably the best way to go.

@dhardy dhardy requested review from vks and newpavlov and removed request for vks August 12, 2024 10:23
@dhardy
Member Author

dhardy commented Aug 12, 2024

At this point, we can remove the internal rand::seq::gen_index function without affecting tests. Benchmarking this changeset shows some changes, but mostly noise.

$ cargo bench "seq|choose|shuffle|indices"
   Compiling rand v0.9.0-alpha.1 (/home/dhardy/projects/rand/rand)
   Compiling rand_distr v0.5.0-alpha.1 (/home/dhardy/projects/rand/rand/rand_distr)
   Compiling benches v0.1.0 (/home/dhardy/projects/rand/rand/benches)
    Finished `bench` profile [optimized] target(s) in 5.22s
     Running benches/base_distributions.rs (target/release/deps/base_distributions-4d7c8319c6c903ac)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 130 filtered out; finished in 0.00s

 Running src/distr.rs (target/release/deps/distr-0435b5737d5e6436)
 Running benches/generators.rs (target/release/deps/generators-16c771dd48c7abd1)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 47 filtered out; finished in 0.00s

 Running benches/misc.rs (target/release/deps/misc-5858d7fc1cad7292)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 14 filtered out; finished in 0.00s

 Running benches/seq.rs (target/release/deps/seq-ad06a4eecb35fde5)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 24 filtered out; finished in 0.00s

 Running src/seq_choose.rs (target/release/deps/seq_choose-07855b064554e80a)

choose_size-hinted_from_1_ChaCha20
time: [420.20 ps 421.09 ps 422.49 ps]
change: [-80.420% -80.349% -80.265%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

choose_stable_from_1_ChaCha20
time: [5.3811 ns 5.4025 ns 5.4255 ns]
change: [-2.9571% -2.0250% -1.0752%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_1_ChaCha20
time: [4.1099 ns 4.1119 ns 4.1142 ns]
change: [+0.1323% +0.2572% +0.3843%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

choose_windowed_from_1_ChaCha20
time: [5.2392 ns 5.2492 ns 5.2601 ns]
change: [-3.6257% -2.8153% -1.8643%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
2 (2.00%) high mild
15 (15.00%) high severe

choose_size-hinted_from_2_ChaCha20
time: [4.2299 ns 4.2552 ns 4.2882 ns]
change: [-5.9766% -5.5698% -5.1654%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) high mild
16 (16.00%) high severe

choose_stable_from_2_ChaCha20
time: [13.357 ns 13.380 ns 13.409 ns]
change: [+1.5822% +1.8502% +2.0972%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_unhinted_from_2_ChaCha20
time: [11.450 ns 11.511 ns 11.579 ns]
change: [+0.3157% +1.0968% +1.7696%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe

choose_windowed_from_2_ChaCha20
time: [10.896 ns 10.902 ns 10.908 ns]
change: [+15.622% +15.756% +15.875%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_3_ChaCha20
time: [4.1573 ns 4.1711 ns 4.1853 ns]
change: [-6.9772% -6.8399% -6.6869%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe

choose_stable_from_3_ChaCha20
time: [24.523 ns 24.573 ns 24.633 ns]
change: [+5.0714% +5.5322% +6.0273%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_3_ChaCha20
time: [21.728 ns 21.762 ns 21.798 ns]
change: [+0.9964% +1.1999% +1.3999%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe

choose_windowed_from_3_ChaCha20
time: [12.184 ns 12.219 ns 12.267 ns]
change: [+12.798% +13.298% +13.657%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe

choose_size-hinted_from_10_ChaCha20
time: [4.2229 ns 4.2244 ns 4.2260 ns]
change: [-5.7623% -5.5768% -5.4285%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low mild
9 (9.00%) high mild
3 (3.00%) high severe

choose_stable_from_10_ChaCha20
time: [61.407 ns 61.687 ns 61.949 ns]
change: [+1.7071% +1.9575% +2.2228%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
1 (1.00%) high mild
17 (17.00%) high severe

choose_unhinted_from_10_ChaCha20
time: [55.321 ns 55.364 ns 55.436 ns]
change: [+0.1581% +0.2399% +0.3271%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_windowed_from_10_ChaCha20
time: [21.895 ns 21.927 ns 21.964 ns]
change: [+9.9881% +10.193% +10.405%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe

choose_size-hinted_from_100_ChaCha20
time: [4.2295 ns 4.2304 ns 4.2312 ns]
change: [-5.6818% -5.4132% -5.1829%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

choose_stable_from_100_ChaCha20
time: [377.31 ns 378.45 ns 379.61 ns]
change: [+2.1705% +2.4440% +2.6774%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild

choose_unhinted_from_100_ChaCha20
time: [334.08 ns 334.41 ns 334.86 ns]
change: [-0.2562% +0.8827% +1.8004%] (p = 0.10 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe

choose_windowed_from_100_ChaCha20
time: [125.49 ns 125.66 ns 125.85 ns]
change: [+23.492% +23.829% +24.238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe

choose_size-hinted_from_1000_ChaCha20
time: [4.2411 ns 4.2439 ns 4.2468 ns]
change: [-3.6234% -3.3519% -2.9879%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
4 (4.00%) high mild
2 (2.00%) high severe

choose_stable_from_1000_ChaCha20
time: [2.9220 µs 2.9234 µs 2.9251 µs]
change: [+2.9325% +3.1211% +3.3032%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low severe
6 (6.00%) high mild
6 (6.00%) high severe

choose_unhinted_from_1000_ChaCha20
time: [2.5877 µs 2.5937 µs 2.5994 µs]
change: [+4.2999% +4.6570% +5.0714%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_windowed_from_1000_ChaCha20
time: [915.85 ns 921.40 ns 928.01 ns]
change: [+36.010% +36.576% +37.193%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe

choose_size-hinted_from_1_Pcg32
time: [429.96 ps 430.20 ps 430.48 ps]
change: [+2.2400% +2.6419% +2.9365%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
7 (7.00%) high mild
2 (2.00%) high severe

choose_stable_from_1_Pcg32
time: [5.6490 ns 5.6615 ns 5.6759 ns]
change: [-15.651% -13.773% -11.886%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_1_Pcg32
time: [4.1208 ns 4.1285 ns 4.1349 ns]
change: [+6.6747% +6.9684% +7.2107%] (p = 0.00 < 0.05)
Performance has regressed.

choose_windowed_from_1_Pcg32
time: [5.3411 ns 5.3677 ns 5.3969 ns]
change: [-3.7357% -2.8700% -1.9170%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe

choose_size-hinted_from_2_Pcg32
time: [2.1998 ns 2.2006 ns 2.2017 ns]
change: [+86.310% +86.618% +86.940%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe

choose_stable_from_2_Pcg32
time: [11.848 ns 11.850 ns 11.854 ns]
change: [+4.0385% +4.1321% +4.2191%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe

choose_unhinted_from_2_Pcg32
time: [10.363 ns 10.384 ns 10.404 ns]
change: [+11.625% +12.011% +12.460%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

choose_windowed_from_2_Pcg32
time: [8.8771 ns 8.8879 ns 8.9022 ns]
change: [+12.376% +12.599% +12.812%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe

choose_size-hinted_from_3_Pcg32
time: [2.1523 ns 2.1543 ns 2.1563 ns]
change: [+83.730% +84.097% +84.436%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_stable_from_3_Pcg32
time: [22.232 ns 22.353 ns 22.492 ns]
change: [+1.3781% +1.5846% +1.8387%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe

choose_unhinted_from_3_Pcg32
time: [20.813 ns 20.818 ns 20.822 ns]
change: [+8.5935% +8.6926% +8.7793%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_windowed_from_3_Pcg32
time: [10.484 ns 10.494 ns 10.505 ns]
change: [+16.239% +16.449% +16.652%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

choose_size-hinted_from_10_Pcg32
time: [2.1910 ns 2.1924 ns 2.1937 ns]
change: [+85.907% +86.242% +86.567%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
3 (3.00%) high mild
1 (1.00%) high severe

choose_stable_from_10_Pcg32
time: [59.869 ns 59.938 ns 60.027 ns]
change: [+3.3939% +3.5207% +3.6508%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_10_Pcg32
time: [56.683 ns 56.705 ns 56.729 ns]
change: [+6.4272% +6.5180% +6.6045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

choose_windowed_from_10_Pcg32
time: [18.747 ns 18.828 ns 18.918 ns]
change: [+20.566% +21.009% +21.493%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low severe
7 (7.00%) high mild
5 (5.00%) high severe

choose_size-hinted_from_100_Pcg32
time: [2.1903 ns 2.1972 ns 2.2037 ns]
change: [+85.257% +85.891% +86.510%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild

choose_stable_from_100_Pcg32
time: [364.40 ns 364.74 ns 365.37 ns]
change: [+2.3807% +2.5940% +2.7882%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
6 (6.00%) low mild
4 (4.00%) high mild
10 (10.00%) high severe

choose_unhinted_from_100_Pcg32
time: [349.07 ns 350.40 ns 351.89 ns]
change: [+5.5706% +5.8794% +6.2248%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_windowed_from_100_Pcg32
time: [106.57 ns 107.04 ns 107.64 ns]
change: [+36.386% +37.128% +37.985%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe

choose_size-hinted_from_1000_Pcg32
time: [2.2684 ns 2.2957 ns 2.3274 ns]
change: [+92.035% +92.996% +94.047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
7 (7.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe

choose_stable_from_1000_Pcg32
time: [2.8277 µs 2.8294 µs 2.8315 µs]
change: [+2.0175% +2.5036% +2.8025%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe

choose_unhinted_from_1000_Pcg32
time: [2.6369 µs 2.6384 µs 2.6401 µs]
change: [+2.6229% +2.7625% +2.9061%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe

choose_windowed_from_1000_Pcg32
time: [775.98 ns 780.42 ns 785.17 ns]
change: [+57.292% +58.290% +59.351%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

choose_size-hinted_from_1_Pcg64
time: [422.75 ps 422.91 ps 423.06 ps]
change: [-78.014% -77.995% -77.977%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe

choose_stable_from_1_Pcg64
time: [6.7144 ns 6.7639 ns 6.8084 ns]
change: [-19.701% -18.096% -16.326%] (p = 0.00 < 0.05)
Performance has improved.

choose_unhinted_from_1_Pcg64
time: [4.3655 ns 4.3715 ns 4.3796 ns]
change: [-1.4495% -1.1652% -0.7308%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

choose_windowed_from_1_Pcg64
time: [5.5656 ns 5.5840 ns 5.6019 ns]
change: [-15.194% -14.445% -13.646%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_size-hinted_from_2_Pcg64
time: [3.0539 ns 3.0556 ns 3.0579 ns]
change: [-5.0176% -4.7436% -4.3441%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe

choose_stable_from_2_Pcg64
time: [12.695 ns 12.698 ns 12.701 ns]
change: [+1.5912% +1.7476% +1.9125%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild

choose_unhinted_from_2_Pcg64
time: [10.981 ns 11.025 ns 11.081 ns]
change: [+2.4321% +2.8392% +3.2573%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high mild

choose_windowed_from_2_Pcg64
time: [9.9539 ns 9.9566 ns 9.9592 ns]
change: [+12.049% +12.176% +12.296%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
3 (3.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_3_Pcg64
time: [3.0602 ns 3.0604 ns 3.0606 ns]
change: [-4.4339% -4.3124% -4.1954%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) low severe
2 (2.00%) low mild
8 (8.00%) high mild
1 (1.00%) high severe

choose_stable_from_3_Pcg64
time: [23.798 ns 23.806 ns 23.816 ns]
change: [+1.5472% +1.6436% +1.7350%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
2 (2.00%) high mild
4 (4.00%) high severe

choose_unhinted_from_3_Pcg64
time: [21.441 ns 21.460 ns 21.483 ns]
change: [+1.6660% +1.8470% +2.1155%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe

choose_windowed_from_3_Pcg64
time: [11.323 ns 11.328 ns 11.332 ns]
change: [+11.506% +11.604% +11.695%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_10_Pcg64
time: [3.0400 ns 3.0422 ns 3.0447 ns]
change: [-5.5479% -5.4691% -5.3933%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

choose_stable_from_10_Pcg64
time: [62.691 ns 62.853 ns 63.022 ns]
change: [+3.6084% +4.1060% +4.6822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_10_Pcg64
time: [55.935 ns 56.072 ns 56.213 ns]
change: [+3.0192% +3.3180% +3.6652%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_windowed_from_10_Pcg64
time: [20.435 ns 20.512 ns 20.598 ns]
change: [+12.132% +12.356% +12.629%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe

choose_size-hinted_from_100_Pcg64
time: [3.1388 ns 3.1474 ns 3.1567 ns]
change: [-2.8788% -2.4525% -2.0175%] (p = 0.00 < 0.05)
Performance has improved.

choose_stable_from_100_Pcg64
time: [369.21 ns 369.45 ns 369.74 ns]
change: [+3.2396% +3.3271% +3.4248%] (p = 0.00 < 0.05)
Performance has regressed.

choose_unhinted_from_100_Pcg64
time: [321.43 ns 321.64 ns 321.86 ns]
change: [+1.3127% +1.4030% +1.4982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

choose_windowed_from_100_Pcg64
time: [108.48 ns 108.55 ns 108.65 ns]
change: [+9.1049% +9.2832% +9.4614%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe

choose_size-hinted_from_1000_Pcg64
time: [3.0872 ns 3.0998 ns 3.1163 ns]
change: [-2.1530% -1.6326% -1.0692%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe

choose_stable_from_1000_Pcg64
time: [2.8468 µs 2.8499 µs 2.8547 µs]
change: [+2.6786% +2.8290% +3.0302%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe

choose_unhinted_from_1000_Pcg64
time: [2.4865 µs 2.4922 µs 2.4978 µs]
change: [+3.1856% +3.3348% +3.4862%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) low mild
5 (5.00%) high mild
8 (8.00%) high severe

choose_windowed_from_1000_Pcg64
time: [746.74 ns 746.90 ns 747.08 ns]
change: [+11.577% +11.689% +11.798%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe

 Running src/shuffle.rs (target/release/deps/shuffle-68b64b946bda8444)

shuffle_1_ChaCha12 time: [212.49 ps 213.08 ps 213.65 ps]
change: [+0.8434% +1.2614% +1.5863%] (p = 0.00 < 0.05)
Change within noise threshold.

shuffle_2_ChaCha12 time: [7.0658 ns 7.1450 ns 7.2180 ns]
change: [+4.7196% +5.5800% +6.4672%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

shuffle_3_ChaCha12 time: [7.8170 ns 7.8459 ns 7.8808 ns]
change: [+4.6635% +4.9737% +5.2525%] (p = 0.00 < 0.05)
Performance has regressed.

shuffle_10_ChaCha12 time: [18.745 ns 18.784 ns 18.829 ns]
change: [+1.7949% +1.9896% +2.1897%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

partial_shuffle_10_ChaCha12
time: [14.428 ns 14.462 ns 14.501 ns]
change: [+3.3524% +3.6645% +3.9495%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe

shuffle_100_ChaCha12 time: [232.08 ns 232.49 ns 232.89 ns]
change: [-0.7536% -0.5515% -0.3411%] (p = 0.00 < 0.05)
Change within noise threshold.

partial_shuffle_100_ChaCha12
time: [116.43 ns 116.75 ns 117.05 ns]
change: [+3.7060% +4.4136% +5.0969%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

shuffle_1000_ChaCha12 time: [2.0994 µs 2.1037 µs 2.1082 µs]
change: [-4.6414% -3.8207% -3.1850%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild

partial_shuffle_1000_ChaCha12
time: [993.52 ns 995.41 ns 997.41 ns]
change: [+4.9496% +5.4320% +5.9154%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) low mild
2 (2.00%) high mild

shuffle_10000_ChaCha12 time: [21.619 µs 21.663 µs 21.721 µs]
change: [+2.6866% +3.6380% +4.9686%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
10 (10.00%) high mild
10 (10.00%) high severe

partial_shuffle_10000_ChaCha12
time: [10.625 µs 10.703 µs 10.786 µs]
change: [+6.8013% +7.2338% +7.6390%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) low mild
5 (5.00%) high mild
7 (7.00%) high severe

shuffle_1_Pcg32 time: [215.31 ps 215.74 ps 216.29 ps]
change: [+2.7588% +3.0068% +3.2805%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

shuffle_2_Pcg32 time: [9.0827 ns 9.5501 ns 9.9967 ns]
change: [+4.3966% +10.521% +17.494%] (p = 0.00 < 0.05)
Performance has regressed.

shuffle_3_Pcg32 time: [7.4608 ns 7.5080 ns 7.5516 ns]
change: [-2.3084% -1.3876% -0.4683%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

shuffle_10_Pcg32 time: [18.158 ns 18.317 ns 18.478 ns]
change: [+5.7873% +6.8524% +8.0101%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

partial_shuffle_10_Pcg32
time: [14.409 ns 14.468 ns 14.520 ns]
change: [-0.3578% +0.1410% +0.6260%] (p = 0.52 > 0.05)
No change in performance detected.

shuffle_100_Pcg32 time: [172.94 ns 173.21 ns 173.52 ns]
change: [+1.2509% +1.4167% +1.5827%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild

partial_shuffle_100_Pcg32
time: [93.122 ns 93.297 ns 93.494 ns]
change: [+1.1609% +1.7122% +2.1805%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

shuffle_1000_Pcg32 time: [1.7067 µs 1.7083 µs 1.7098 µs]
change: [-4.4770% -2.9869% -1.7272%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe

partial_shuffle_1000_Pcg32
time: [857.50 ns 859.27 ns 861.87 ns]
change: [+0.5617% +1.0207% +1.4152%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

shuffle_10000_Pcg32 time: [16.735 µs 16.755 µs 16.777 µs]
change: [-0.6981% -0.1695% +0.4371%] (p = 0.62 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

partial_shuffle_10000_Pcg32
time: [8.2061 µs 8.2146 µs 8.2227 µs]
change: [+0.3206% +0.6257% +0.9335%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild

shuffle_1_Pcg64 time: [213.99 ps 214.05 ps 214.14 ps]
change: [-0.0089% +0.0851% +0.2149%] (p = 0.12 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe

shuffle_2_Pcg64 time: [7.0886 ns 7.1192 ns 7.1507 ns]
change: [-31.118% -27.988% -24.609%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

shuffle_3_Pcg64 time: [8.5383 ns 8.6103 ns 8.6956 ns]
change: [-29.647% -28.206% -26.673%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

shuffle_10_Pcg64 time: [18.648 ns 18.664 ns 18.680 ns]
change: [-8.3002% -7.7604% -7.2105%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild

partial_shuffle_10_Pcg64
time: [14.545 ns 14.565 ns 14.583 ns]
change: [-7.1614% -6.8663% -6.5580%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
4 (4.00%) high mild

shuffle_100_Pcg64 time: [187.73 ns 188.80 ns 189.83 ns]
change: [-4.0263% -3.7459% -3.4286%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
2 (2.00%) low mild
6 (6.00%) high mild
10 (10.00%) high severe

partial_shuffle_100_Pcg64
time: [96.249 ns 96.532 ns 96.885 ns]
change: [-6.4110% -5.9369% -5.4841%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

shuffle_1000_Pcg64 time: [1.8515 µs 1.8551 µs 1.8588 µs]
change: [-4.9568% -4.4750% -3.7605%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high severe

partial_shuffle_1000_Pcg64
time: [942.37 ns 948.80 ns 957.16 ns]
change: [-5.0863% -4.1937% -2.8587%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

shuffle_10000_Pcg64 time: [19.041 µs 19.062 µs 19.083 µs]
change: [-6.9676% -6.8235% -6.6512%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

partial_shuffle_10000_Pcg64
time: [9.4146 µs 9.4263 µs 9.4428 µs]
change: [-6.7491% -6.4804% -6.0440%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe

 Running src/uniform.rs (target/release/deps/uniform-a5c2754802b04fc5)
 Running src/uniform_float.rs (target/release/deps/uniform_float-c70085eeebdc4543)
 Running benches/weighted.rs (target/release/deps/weighted-fbd407d40100d55b)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s

@vks
Collaborator

vks commented Aug 12, 2024

Some regressions are quite substantial, do we understand why? E.g.:

choose_windowed_from_1000_Pcg32
time:   [775.98 ns 780.42 ns 785.17 ns]
change: [+57.292% +58.290% +59.351%] (p = 0.00 < 0.05)

@dhardy
Member Author

dhardy commented Aug 26, 2024

> Some regressions are quite substantial, do we understand why?

choose is impacted since it uses gen_range in a loop... but that previously used gen_index which should be equivalent. So no.
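To illustrate why a ranged-index call inside a loop dominates these benchmarks, here is a simplified reservoir-style sketch. It is not rand's actual implementation: `XorShift64`, `index_below` and this `choose` are hypothetical stand-ins so that the example compiles with the standard library only.

```rust
/// Toy generator (xorshift64), a stand-in for a real RNG; not part of rand.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Hypothetical stand-in for a ranged-index call: uniform in `0..n`,
    /// using plain rejection sampling.
    fn index_below(&mut self, n: usize) -> usize {
        assert!(n > 0, "n must be non-zero");
        let n = n as u64;
        // Largest multiple of n not exceeding u64::MAX; draws at or above it
        // are rejected so that `x % n` stays unbiased.
        let zone = u64::MAX - (u64::MAX % n);
        loop {
            let x = self.next_u64();
            if x < zone {
                return (x % n) as usize;
            }
        }
    }
}

/// Pick one item uniformly from an iterator of unknown length.
fn choose<I: Iterator>(iter: I, rng: &mut XorShift64) -> Option<I::Item> {
    let mut chosen = None;
    for (seen, item) in iter.enumerate() {
        // Keep the new item with probability 1 / (seen + 1).
        // This is one ranged draw per element.
        if rng.index_below(seen + 1) == 0 {
            chosen = Some(item);
        }
    }
    chosen
}
```

With one ranged draw per element, any change in the cost of the ranged sampler (or in how it gets inlined) is multiplied by the iterator length.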

More half-baked thoughts:

  • Code like IteratorRandom::choose should be able to select a 32-bit or 64-bit sampler once, then always use that. There are probably other locations that we could take this approach.
  • The idea not to support gen_range for usize was partly to force people to think more closely about such cases — but this is quite disruptive.
  • Maybe we should only support 32-bit sampling for usize? There can't be that many uses requiring 64-bit sampling.
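A rough sketch of the first idea, under stated assumptions: `IndexWidth`, `for_max_bound`, `shuffle` and the toy `SplitMix64` generator below are hypothetical names for illustration, not rand's API. The width is decided once from the largest bound, then reused for every draw. The sampling step uses a widening multiply with rejection, which may differ from what UniformUsize actually does.

```rust
/// Toy generator (SplitMix64) standing in for a real RNG; not part of rand.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}

/// Hypothetical: which sampler width to use for index generation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexWidth {
    Bits32,
    Bits64,
}

impl IndexWidth {
    /// Decide once, from the largest bound that will ever be requested.
    fn for_max_bound(max_bound: usize) -> Self {
        if max_bound as u64 <= u64::from(u32::MAX) {
            IndexWidth::Bits32
        } else {
            IndexWidth::Bits64
        }
    }

    /// Uniform index in `0..bound` (widening multiply with rejection).
    fn sample(self, rng: &mut SplitMix64, bound: usize) -> usize {
        assert!(bound > 0, "bound must be non-zero");
        match self {
            IndexWidth::Bits32 => {
                assert!(bound as u64 <= u64::from(u32::MAX), "bound too large");
                let n = bound as u32;
                let threshold = n.wrapping_neg() % n; // 2^32 mod n
                loop {
                    let m = u64::from(rng.next_u32()) * u64::from(n);
                    // Reject low halves below the threshold to avoid bias.
                    if (m as u32) >= threshold {
                        return (m >> 32) as usize;
                    }
                }
            }
            IndexWidth::Bits64 => {
                let n = bound as u64;
                let threshold = n.wrapping_neg() % n; // 2^64 mod n
                loop {
                    let m = u128::from(rng.next_u64()) * u128::from(n);
                    if (m as u64) >= threshold {
                        return (m >> 64) as usize;
                    }
                }
            }
        }
    }
}

/// Fisher-Yates shuffle that selects the sampler width a single time.
fn shuffle<T>(items: &mut [T], rng: &mut SplitMix64) {
    let width = IndexWidth::for_max_bound(items.len());
    for i in (1..items.len()).rev() {
        let j = width.sample(rng, i + 1);
        items.swap(i, j);
    }
}
```

Since any bound that fits in 32 bits always takes the 32-bit path, a sketch like this consumes the same random words on 32-bit and 64-bit targets, which is the portability property at stake here.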

I just pushed a slightly different impl of UniformUsize but this doesn't have a big impact on benches.

@dhardy
Member Author

dhardy commented Aug 26, 2024

Adding support for RangeTo* improves performance a little in some cases (presumably due to constant folding).
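A minimal sketch of why a RangeTo impl might help, assuming a hypothetical `IndexRange` trait (not rand's API): when the lower bound is the constant 0, the subtraction disappears and the final addition may be folded away by the optimiser.

```rust
use std::ops::{Range, RangeTo};

/// Hypothetical trait: decompose a range into (low, span).
trait IndexRange {
    fn low_and_span(&self) -> (usize, usize);
}

impl IndexRange for Range<usize> {
    fn low_and_span(&self) -> (usize, usize) {
        assert!(self.start < self.end, "empty range");
        (self.start, self.end - self.start)
    }
}

impl IndexRange for RangeTo<usize> {
    fn low_and_span(&self) -> (usize, usize) {
        assert!(self.end > 0, "empty range");
        // The lower bound is a compile-time constant here, so no subtraction
        // is needed and `low + offset` below can reduce to `offset`.
        (0, self.end)
    }
}

/// Map a raw 64-bit random word into the range. A plain widening multiply
/// is slightly biased; it is used here for illustration only.
fn index_in<R: IndexRange>(range: R, word: u64) -> usize {
    let (low, span) = range.low_and_span();
    let offset = ((u128::from(word) * span as u128) >> 64) as usize;
    low + offset
}
```

Whether the compiler actually folds the constant depends on inlining, so any gain would need to be confirmed by benchmarks like the ones above.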

@dhardy
Member Author

dhardy commented Aug 29, 2024

> choose_windowed_from_1000_Pcg32

Given that the choose_windowed (and related) benchmarks were deliberately designed to be hard to optimise for (and are likely not common patterns), I don't think we should be too concerned about these.

I'd like to get some more general benchmarking done, but decided to push #1490 first.

@dhardy
Member Author

dhardy commented Sep 6, 2024

Rebased. Running benches now...

@dhardy
Member Author

dhardy commented Sep 6, 2024

Comparing 7bdd833 to master (9e030aa): see the attachment.
output.zip
output.html.gz

In summary, the choose benchmarks have some significant changes (+105% to -80%, though the latter is on a <500ps test) and the shuffle benchmarks have some moderately significant changes. Wins and losses vaguely balance each other out with one exception (the worst relative result):

seq_slice_choose_1_of_1000
                        time:   [2.3746 ns 2.3785 ns 2.3823 ns]
                        change: [+105.12% +105.42% +105.73%] (p = 0.00 < 0.05)
                        Performance has regressed.

Overall, benchmark results are probably a smidgen worse, with most of the differences likely due to different inlining.

I find this acceptable (given our goal of portable ranged usize variates).

@dhardy dhardy merged commit ef052ec into rust-random:master Sep 9, 2024
15 checks passed