-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve slice.binary_search_by()'s best-case performance to O(1) #74024
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
c00e434
to
65868eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to put together a benchmark assessing the worst case impact? The new implementation does potentially 50% more conditional branches in the hot loop.
This branch will almost always be predicted correctly. |
The doc says any one could be returned precisely because we don't want it to tie to a particular implementation. Returning the last one is obviously also "any one". Changing the implementation to actually return "any one" instead of the last one isn't breaking because that's the contract specified. |
Have you run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs? |
Ping from triage: |
☔ The latest upstream changes (presumably #73265) made this pull request unmergeable. Please resolve the merge conflicts. |
ef97e6c
to
bdbc28a
Compare
Hi @dtolnay @timvermeulen @Elinvynia @nbdd0121, thanks for your patience and sorry for my late reply. I edited my comment above with the benchmarking result, please give it a review. Thanks. 😃 |
So the performance gets slightly worse (only visible with large amount of data) if the value searched has no dups. BTW: This change needs a crater run as suggested in this test case: |
I checked the assembly generated, the number of instructions in the hot path increases by 2, where I expect only one. It seems a missed optimisation by LLVM. @Folyd can you try benchmarking this code sequence instead of the base = if cmp == Greater { base } else { mid };
if cmp == Equal { break } |
Thanks for the suggestion. I have updated the post. :) |
@Folyd can you try running that with different length of array to test the worst case performance? Or make sure the test have the element appeared in the last position assuming it never hits equal. |
I was planning to try quarternary search at first based on https://github.com/scandum/binary_search, let me create some benchmarks and easy way to work test this patch as well. |
Quaternary search has been discussed (and rejected) in #74547 |
I don't see much difference.
benches/bench.rs#![feature(test)]
extern crate test;
use test::black_box;
use test::Bencher;
use binary_search::*;
enum Cache {
L1,
L2,
L3,
}
fn std_bench_binary_search<F>(b: &mut Bencher, cache: Cache, mapper: F)
where
F: Fn(usize) -> usize,
{
let size = match cache {
Cache::L1 => 1000, // 8kb
Cache::L2 => 10_000, // 80kb
Cache::L3 => 1_000_000, // 8Mb
};
let v = (0..size).map(&mapper).collect::<Vec<_>>();
let mut r = 0usize;
b.iter(move || {
// LCG constants from https://en.wikipedia.org/wiki/Numerical_Recipes.
r = r.wrapping_mul(1664525).wrapping_add(1013904223);
// Lookup the whole range to get 50% hits and 50% misses.
let i = mapper(r % size);
black_box(std_binary_search(&v, &i).is_ok());
})
}
fn std_bench_binary_search_worst_case(b: &mut Bencher, cache: Cache) {
let size = match cache {
Cache::L1 => 1000, // 8kb
Cache::L2 => 10_000, // 80kb
Cache::L3 => 1_000_000, // 8Mb
};
let mut v = vec![0; size];
let i = 1;
v[size - 1] = i;
b.iter(move || {
black_box(std_binary_search(&v, &i).is_ok());
})
}
#[bench]
fn std_binary_search_l1(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L1, |i| i * 2);
}
#[bench]
fn std_binary_search_l2(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L2, |i| i * 2);
}
#[bench]
fn std_binary_search_l3(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L3, |i| i * 2);
}
#[bench]
fn std_binary_search_l1_with_dups(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L1, |i| i / 16 * 16);
}
#[bench]
fn std_binary_search_l2_with_dups(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L2, |i| i / 16 * 16);
}
#[bench]
fn std_binary_search_l3_with_dups(b: &mut Bencher) {
std_bench_binary_search(b, Cache::L3, |i| i / 16 * 16);
}
#[bench]
fn std_binary_search_l1_worst_case(b: &mut Bencher) {
std_bench_binary_search_worst_case(b, Cache::L1);
}
#[bench]
fn std_binary_search_l2_worst_case(b: &mut Bencher) {
std_bench_binary_search_worst_case(b, Cache::L2);
}
#[bench]
fn std_binary_search_l3_worst_case(b: &mut Bencher) {
std_bench_binary_search_worst_case(b, Cache::L3);
}
fn stdnew_bench_binary_search<F>(b: &mut Bencher, cache: Cache, mapper: F)
where
F: Fn(usize) -> usize,
{
let size = match cache {
Cache::L1 => 1000, // 8kb
Cache::L2 => 10_000, // 80kb
Cache::L3 => 1_000_000, // 8Mb
};
let v = (0..size).map(&mapper).collect::<Vec<_>>();
let mut r = 0usize;
b.iter(move || {
// LCG constants from https://en.wikipedia.org/wiki/Numerical_Recipes.
r = r.wrapping_mul(1664525).wrapping_add(1013904223);
// Lookup the whole range to get 50% hits and 50% misses.
let i = mapper(r % size);
black_box(stdnew_binary_search(&v, &i).is_ok());
})
}
fn stdnew_bench_binary_search_worst_case(b: &mut Bencher, cache: Cache) {
let size = match cache {
Cache::L1 => 1000, // 8kb
Cache::L2 => 10_000, // 80kb
Cache::L3 => 1_000_000, // 8Mb
};
let mut v = vec![0; size];
let i = 1;
v[size - 1] = i;
b.iter(move || {
black_box(stdnew_binary_search(&v, &i).is_ok());
})
}
#[bench]
fn stdnew_binary_search_l1(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L1, |i| i * 2);
}
#[bench]
fn stdnew_binary_search_l2(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L2, |i| i * 2);
}
#[bench]
fn stdnew_binary_search_l3(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L3, |i| i * 2);
}
#[bench]
fn stdnew_binary_search_l1_with_dups(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L1, |i| i / 16 * 16);
}
#[bench]
fn stdnew_binary_search_l2_with_dups(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L2, |i| i / 16 * 16);
}
#[bench]
fn stdnew_binary_search_l3_with_dups(b: &mut Bencher) {
stdnew_bench_binary_search(b, Cache::L3, |i| i / 16 * 16);
}
#[bench]
fn stdnew_binary_search_l1_worst_case(b: &mut Bencher) {
stdnew_bench_binary_search_worst_case(b, Cache::L1);
}
#[bench]
fn stdnew_binary_search_l2_worst_case(b: &mut Bencher) {
stdnew_bench_binary_search_worst_case(b, Cache::L2);
}
#[bench]
fn stdnew_binary_search_l3_worst_case(b: &mut Bencher) {
stdnew_bench_binary_search_worst_case(b, Cache::L3);
} src/lib.rsuse std::cmp::Ord;
use std::cmp::Ordering::{self, Equal, Greater, Less};
pub fn std_binary_search<T>(s: &[T], x: &T) -> Result<usize, usize>
where
T: Ord,
{
std_binary_search_by(s, |p| p.cmp(x))
}
pub fn std_binary_search_by<'a, T, F>(s: &'a [T], mut f: F) -> Result<usize, usize>
where
F: FnMut(&'a T) -> Ordering,
{
let mut size = s.len();
if size == 0 {
return Err(0);
}
let mut base = 0usize;
while size > 1 {
let half = size / 2;
let mid = base + half;
// mid is always in [0, size), that means mid is >= 0 and < size.
// mid >= 0: by definition
// mid < size: mid = size / 2 + size / 4 + size / 8 ...
let cmp = f(unsafe { s.get_unchecked(mid) });
base = if cmp == Greater { base } else { mid };
size -= half;
}
// base is always in [0, size) because base <= mid.
let cmp = f(unsafe { s.get_unchecked(base) });
if cmp == Equal {
Ok(base)
} else {
Err(base + (cmp == Less) as usize)
}
}
pub fn stdnew_binary_search<T>(s: &[T], x: &T) -> Result<usize, usize>
where
T: Ord,
{
std_binary_search_by(s, |p| p.cmp(x))
}
pub fn stdnew_binary_search_by<'a, T, F>(s: &'a [T], mut f: F) -> Result<usize, usize>
where
F: FnMut(&'a T) -> Ordering,
{
let mut size = s.len();
if size == 0 {
return Err(0);
}
let mut base = 0usize;
while size > 1 {
let half = size / 2;
let mid = base + half;
// mid is always in [0, size), that means mid is >= 0 and < size.
// mid >= 0: by definition
// mid < size: mid = size / 2 + size / 4 + size / 8 ...
let cmp = f(unsafe { s.get_unchecked(mid) });
if cmp == Equal {
return Ok(mid);
} else if cmp == Less {
base = mid
}
size -= half;
}
// base is always in [0, size) because base <= mid.
let cmp = f(unsafe { s.get_unchecked(base) });
if cmp == Equal {
Ok(base)
} else {
Err(base + (cmp == Less) as usize)
}
} |
@pickfire you called std_binary_search_by in stdnew_binary_search, it should be stdnew_binary_search_by. |
@pickfire it's kind of suspicious that the |
☀️ Test successful - checks-actions |
So it looks like this change broke a rather significant (at least in some circles) user: https://polkadot.network/a-polkadot-postmortem-24-05-2021/ Disclosure: I don't have any association with Polkadot, but I do work in digital assets space other than just being a long time Rust user and enthusiast. I might be wrong about some detail somewhere, but it seems to me that it was a mistake to create an API with non deterministic result in the first place, and it's not in Rust spirit to create pitfalls like this for its users. Hyrum's Law |
I had thought the order of how an Does We have to consider about this if we think their accident was not only their bad and if we want to be more secure language. |
I would agree with @VillSnow here. While the behavior of
And while related functions are highlighted, it is not immediately clear that Rewording the documentation of |
To be blunt, the issue seems to be making assumptions about function behavior without reading the documentation, forgetting about that critical part later or not realizing that it's relevant to one's application or something along that line. Also hashmap and binary search aren't quite comparable. Hashmap is intentionally randomized, which means anything relying on iteration order will be revealed very quickly. The order of binary_search on the other hand is stable within a revision of the standard library, so it takes longer to uncover accidental reliance. That is unfortunate but necessary to provide some flexibility for the implementation to evolve. Adding some jitter to the halving step to introduce runtime randomness might be possible but that would probably impact performance or at least make benchmarks noisier and also break people who rely on ordering being stable within a compiler version (well, they shouldn't do that either...) |
That could be done like only for debug mode. |
std normally isn't recompiled, so enabling debug assertions in a downstream crate won't enable them in std and cargo's build-std is still unstable. |
BTW. One redditor brought a really great point that binary search that does not commit to first/last element is of very limited use. It pretty much only good for existence check. So I guess rust-lang/rfcs#2184 is much more important now. |
I meant 'waning' which is raised by compiler and have to suppressed by
Interesting point. I came up with making a sorted list with duplicated key, but it might be niche. |
Note that the function is not nondeterministic. If you run it multiple times with the same data, it returns the exact same result. It doesn't follow multiple (or randomly selected) paths, etc. But the behaviour can change between different versions of the standard library. That's not nondeterminism.
In most cases you already know the element only occurs once in the slice, in which case you can be sure that that is the exact element it will find. That's the main reason for this: that the algorithm can stop when it finds the element, without having to continue searching to check if there's more identical elements. |
That's case, I was wrong. Find 2 from 0..=9Get the first element
Takes 3 steps Early return
Takes only 2 steps |
…=JohnTitor Integrate binary search codes of binary_search_by and partition_point For now partition_point has own binary search code piece. It is because binary_search_by had called the comparer more times and the author (=me) wanted to avoid it. However, now binary_search_by uses the comparer minimum times. (rust-lang#74024) So it's time to integrate them. The appearance of the codes are a bit different but both use completely same logic.
…ohnTitor Integrate binary search codes of binary_search_by and partition_point For now partition_point has own binary search code piece. It is because binary_search_by had called the comparer more times and the author (=me) wanted to avoid it. However, now binary_search_by uses the comparer minimum times. (rust-lang#74024) So it's time to integrate them. The appearance of the codes are a bit different but both use completely same logic.
…ochenkov Prefer `partition_point` to look up assoc items Since we now have `partition_point` (instead of `equal_range`), I think it's worth trying to use it instead of manually finding it. `partition_point` uses `binary_search_by` internally (rust-lang#85406) and its performance has been improved (rust-lang#74024), so I guess this will make a performance difference.
In #128254 I actually revert to the old algorithm which turns out to be much faster once you ensure that LLVM doesn't turn CMOV into branches. |
This PR aimed to improve the slice.binary_search_by()'s best-case performance to O(1).
Noticed
I don't know why the docs of
binary_search_by
said"If there are multiple matches, then any one of the matches could be returned."
, but the implementation isn't the same thing. Actually, it returns the last one if multiple matches found.Then we got two options:
If returns the last one is the correct or desired result
Then I can rectify the docs and revert my changes.
If the docs are correct or desired result
Then my changes can be merged after fully reviewed.
However, if my PR gets merged, another issue raised: this could be a breaking change since if multiple matches found, the returning order no longer the last one instead of it could be any one.
For example:
Benchmarking
Old implementations
New implementations (1)
Implemented by this PR.
New implementations (2)
Suggested by @nbdd0121 in comment.
I run some benchmarking testings against on two implementations. The new implementation has a lot of improvement in duplicates cases, while in
binary_search_l3
case, it's a little bit slower than the old one.