Improve slice.binary_search_by()'s best-case performance to O(1) #74024

Merged
merged 4 commits into from
Mar 5, 2021

Folyd
Contributor

@Folyd Folyd commented Jul 4, 2020

This PR aims to improve the best-case performance of slice.binary_search_by() to O(1).

Note

I don't know why the docs of binary_search_by say "If there are multiple matches, then any one of the matches could be returned.", because the implementation doesn't behave that way: it actually returns the last one if multiple matches are found.

That leaves two options:

If returning the last match is the correct or desired result

Then I can correct the docs and revert my changes.

If the docs describe the correct or desired result

Then my changes can be merged after a full review.

However, if my PR gets merged, another issue arises: this could be a breaking change, because when multiple matches are found the returned index is no longer guaranteed to be the last one; it could be any of them.

For example:

let mut s = vec![0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55];
let num = 1;
// binary_search returns a Result; the value is present, so unwrap the index.
let idx = s.binary_search(&num).unwrap();
s.insert(idx, 2);

// Old implementation (idx == 4, the last match)
assert_eq!(s, [0, 1, 1, 1, 2, 1, 2, 3, 5, 8, 13, 21, 34, 55]);

// New implementation (idx == 3, the first match that gets probed)
assert_eq!(s, [0, 1, 1, 2, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]);

Benchmarking

Old implementations

$ ./x.py bench --stage 1 library/libcore  
test slice::binary_search_l1           ... bench:          59 ns/iter (+/- 4)
test slice::binary_search_l1_with_dups ... bench:          59 ns/iter (+/- 3)
test slice::binary_search_l2           ... bench:          76 ns/iter (+/- 5)
test slice::binary_search_l2_with_dups ... bench:          77 ns/iter (+/- 17)
test slice::binary_search_l3           ... bench:         183 ns/iter (+/- 23)
test slice::binary_search_l3_with_dups ... bench:         185 ns/iter (+/- 19)

New implementations (1)

Implemented by this PR.

if cmp == Equal {
    return Ok(mid);
} else if cmp == Less {
    base = mid
}

$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1           ... bench:          58 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench:          37 ns/iter (+/- 4)
test slice::binary_search_l2           ... bench:          76 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench:          57 ns/iter (+/- 6)
test slice::binary_search_l3           ... bench:         200 ns/iter (+/- 30)
test slice::binary_search_l3_with_dups ... bench:         157 ns/iter (+/- 6)

$ ./x.py bench --stage 1 library/libcore  
test slice::binary_search_l1           ... bench:          59 ns/iter (+/- 8)
test slice::binary_search_l1_with_dups ... bench:          37 ns/iter (+/- 2)
test slice::binary_search_l2           ... bench:          77 ns/iter (+/- 2)
test slice::binary_search_l2_with_dups ... bench:          57 ns/iter (+/- 2)
test slice::binary_search_l3           ... bench:         198 ns/iter (+/- 21)
test slice::binary_search_l3_with_dups ... bench:         158 ns/iter (+/- 11)

New implementations (2)

Suggested by @nbdd0121 in comment.

base = if cmp == Greater { base } else { mid };
if cmp == Equal { break }

$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1           ... bench:          59 ns/iter (+/- 7)
test slice::binary_search_l1_with_dups ... bench:          37 ns/iter (+/- 5)
test slice::binary_search_l2           ... bench:          75 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench:          56 ns/iter (+/- 3)
test slice::binary_search_l3           ... bench:         195 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench:         151 ns/iter (+/- 7)

$ ./x.py bench --stage 1 library/libcore  
test slice::binary_search_l1           ... bench:          57 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench:          38 ns/iter (+/- 2)
test slice::binary_search_l2           ... bench:          77 ns/iter (+/- 11)
test slice::binary_search_l2_with_dups ... bench:          57 ns/iter (+/- 4)
test slice::binary_search_l3           ... bench:         194 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench:         151 ns/iter (+/- 18)

I ran some benchmarks against both implementations. The new implementation improves considerably in the cases with duplicates, while in the binary_search_l3 case it is a little slower than the old one.
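
To make the behavioural difference concrete, here is a minimal, self-contained sketch of the two loop shapes, written with safe indexing rather than get_unchecked. The names old_search_by, early_return_search_by, and demo are hypothetical and not part of the standard library. The first loop is the pre-PR shape as I understand it, and the second follows the "New implementations (1)" snippet above.

```rust
use std::cmp::Ordering::{self, Equal, Greater, Less};

/// Pre-PR shape: never exits early, so `base` keeps moving right on `Equal`
/// and ends up on the last matching element.
pub fn old_search_by<T, F>(s: &[T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&T) -> Ordering,
{
    let mut size = s.len();
    if size == 0 {
        return Err(0);
    }
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        let cmp = f(&s[mid]);
        base = if cmp == Greater { base } else { mid };
        size -= half;
    }
    let cmp = f(&s[base]);
    if cmp == Equal {
        Ok(base)
    } else {
        Err(base + (cmp == Less) as usize)
    }
}

/// Early-return shape: stops at the first probe that compares `Equal`.
pub fn early_return_search_by<T, F>(s: &[T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&T) -> Ordering,
{
    let mut size = s.len();
    if size == 0 {
        return Err(0);
    }
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        let cmp = f(&s[mid]);
        if cmp == Equal {
            return Ok(mid);
        } else if cmp == Less {
            base = mid;
        }
        size -= half;
    }
    let cmp = f(&s[base]);
    if cmp == Equal {
        Ok(base)
    } else {
        Err(base + (cmp == Less) as usize)
    }
}

/// Searches for 1 in the example vector with both variants.
pub fn demo() -> (Result<usize, usize>, Result<usize, usize>) {
    let s: [i32; 13] = [0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55];
    (
        old_search_by(&s, |p| p.cmp(&1)),
        early_return_search_by(&s, |p| p.cmp(&1)),
    )
}
```

On the example vector, demo() yields (Ok(4), Ok(3)): both are valid matches, but they are different indices, which is exactly the observable change discussed above.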

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 4, 2020
@Folyd Folyd force-pushed the master branch 2 times, most recently from c00e434 to 65868eb Compare July 4, 2020 10:00
@dtolnay dtolnay added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 6, 2020
src/libcore/slice/mod.rs (outdated, resolved)
Member

@dtolnay dtolnay left a comment

Would you be able to put together a benchmark assessing the worst case impact? The new implementation does potentially 50% more conditional branches in the hot loop.

@dtolnay dtolnay added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 10, 2020
@nbdd0121
Contributor

Would you be able to put together a benchmark assessing the worst case impact? The new implementation does potentially 50% more conditional branches in the hot loop.

This branch will almost always be predicted correctly.

@nbdd0121
Contributor

I don't know why the docs of binary_search_by say "If there are multiple matches, then any one of the matches could be returned.", because the implementation doesn't behave that way: it actually returns the last one if multiple matches are found.

The doc says any one could be returned precisely because we don't want it to tie to a particular implementation. Returning the last one is obviously also "any one". Changing the implementation to actually return "any one" instead of the last one isn't breaking because that's the contract specified.
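
As an illustration of code that relies only on the documented contract, here is a small sketch of a sorted insert. The helper name insert_sorted is hypothetical. It stays correct whichever matching index binary_search returns, because inserting an equal value next to any of its duplicates keeps the vector sorted.

```rust
/// Inserts `x` into an already sorted vector, keeping it sorted.
/// Works for any index `binary_search` may report among equal elements.
pub fn insert_sorted(v: &mut Vec<i32>, x: i32) {
    // Ok(i): some position holding an equal element.
    // Err(i): the position where `x` would have to go.
    let idx = v.binary_search(&x).unwrap_or_else(|i| i);
    v.insert(idx, x);
}
```

Code that needs a specific duplicate (first or last) is outside that contract and has to ask for it explicitly.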

@timvermeulen
Contributor

@Elinvynia Elinvynia added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 25, 2020
@Elinvynia
Contributor

Ping from triage:
Hello @Folyd , could you have a look at the review comment by dtolnay when you have the time? Thanks, and let us know if there are any issues!~

@bors
Contributor

bors commented Jul 28, 2020

☔ The latest upstream changes (presumably #73265) made this pull request unmergeable. Please resolve the merge conflicts.

@Folyd Folyd force-pushed the master branch 2 times, most recently from ef97e6c to bdbc28a Compare July 29, 2020 04:21
@Folyd
Contributor Author

Folyd commented Jul 29, 2020

Hi @dtolnay @timvermeulen @Elinvynia @nbdd0121, thanks for your patience and sorry for my late reply. I edited my comment above with the benchmarking result, please give it a review. Thanks. 😃

@nbdd0121
Contributor

So the performance gets slightly worse (only visible with a large amount of data) if the value searched for has no dups.

BTW: This change needs a crater run as suggested in this test case:
https://github.com/rust-lang/rust/blob/bdbc28a121482d3aac3111482a8167e190f17b46/library/core/tests/slice.rs#L68-L71

@nbdd0121
Contributor

I checked the generated assembly: the number of instructions in the hot path increases by 2, where I expected only one. It seems to be a missed optimisation by LLVM.

@Folyd can you try benchmarking this code sequence instead of the if/else if?

base = if cmp == Greater { base } else { mid };
if cmp == Equal { break }
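
For readers following along, here is a sketch of how those two lines might slot into the loop, assuming the rest of the function is unchanged and using safe indexing. The name break_search_by is hypothetical. One consequence of the break form is visible in the sketch: after leaving the loop, the final comparison re-checks the element at base, so a hit costs one extra comparator call compared with returning directly.

```rust
use std::cmp::Ordering::{self, Equal, Greater, Less};

pub fn break_search_by<T, F>(s: &[T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&T) -> Ordering,
{
    let mut size = s.len();
    if size == 0 {
        return Err(0);
    }
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        let cmp = f(&s[mid]);
        // Unconditional select first, then the early exit.
        base = if cmp == Greater { base } else { mid };
        if cmp == Equal {
            break;
        }
        size -= half;
    }
    // Also reached after `break`; `base` then points at the matching element.
    let cmp = f(&s[base]);
    if cmp == Equal {
        Ok(base)
    } else {
        Err(base + (cmp == Less) as usize)
    }
}
```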

@Folyd
Contributor Author

Folyd commented Jul 29, 2020

I checked the generated assembly: the number of instructions in the hot path increases by 2, where I expected only one. It seems to be a missed optimisation by LLVM.

@Folyd can you try benchmarking this code sequence instead of the if/else if?

base = if cmp == Greater { base } else { mid };
if cmp == Equal { break }

Thanks for the suggestion. I have updated the post. :)

@pickfire
Contributor

@Folyd can you try running that with arrays of different lengths to test the worst-case performance? Or make sure the test has the element in the last position, so that the search never hits Equal before the end.

@pickfire
Contributor

I was planning to try quaternary search at first, based on https://github.com/scandum/binary_search. Let me create some benchmarks and an easy way to test this patch as well.

@ranma42
Contributor

ranma42 commented Jul 29, 2020

Quaternary search has been discussed (and rejected) in #74547

@pickfire
Contributor

I don't see much difference.

test std_binary_search_l1               ... bench:          58 ns/iter (+/- 1)
test std_binary_search_l1_with_dups     ... bench:          57 ns/iter (+/- 2)
test std_binary_search_l1_worst_case    ... bench:          10 ns/iter (+/- 0)
test std_binary_search_l2               ... bench:          75 ns/iter (+/- 2)
test std_binary_search_l2_with_dups     ... bench:          75 ns/iter (+/- 7)
test std_binary_search_l2_worst_case    ... bench:          15 ns/iter (+/- 0)
test std_binary_search_l3               ... bench:         237 ns/iter (+/- 6)
test std_binary_search_l3_with_dups     ... bench:         238 ns/iter (+/- 6)
test std_binary_search_l3_worst_case    ... bench:          23 ns/iter (+/- 0)
test stdnew_binary_search_l1            ... bench:          58 ns/iter (+/- 4)
test stdnew_binary_search_l1_with_dups  ... bench:          57 ns/iter (+/- 2)
test stdnew_binary_search_l1_worst_case ... bench:          10 ns/iter (+/- 0)
test stdnew_binary_search_l2            ... bench:          75 ns/iter (+/- 3)
test stdnew_binary_search_l2_with_dups  ... bench:          75 ns/iter (+/- 2)
test stdnew_binary_search_l2_worst_case ... bench:          15 ns/iter (+/- 1)
test stdnew_binary_search_l3            ... bench:         238 ns/iter (+/- 9)
test stdnew_binary_search_l3_with_dups  ... bench:         238 ns/iter (+/- 5)
test stdnew_binary_search_l3_worst_case ... bench:          23 ns/iter (+/- 0)

benches/bench.rs

#![feature(test)]
extern crate test;

use test::black_box;
use test::Bencher;

use binary_search::*;

enum Cache {
    L1,
    L2,
    L3,
}

fn std_bench_binary_search<F>(b: &mut Bencher, cache: Cache, mapper: F)
where
    F: Fn(usize) -> usize,
{
    let size = match cache {
        Cache::L1 => 1000,      // 8kb
        Cache::L2 => 10_000,    // 80kb
        Cache::L3 => 1_000_000, // 8Mb
    };
    let v = (0..size).map(&mapper).collect::<Vec<_>>();
    let mut r = 0usize;
    b.iter(move || {
        // LCG constants from https://en.wikipedia.org/wiki/Numerical_Recipes.
        r = r.wrapping_mul(1664525).wrapping_add(1013904223);
        // Lookup the whole range to get 50% hits and 50% misses.
        let i = mapper(r % size);
        black_box(std_binary_search(&v, &i).is_ok());
    })
}

fn std_bench_binary_search_worst_case(b: &mut Bencher, cache: Cache) {
    let size = match cache {
        Cache::L1 => 1000,      // 8kb
        Cache::L2 => 10_000,    // 80kb
        Cache::L3 => 1_000_000, // 8Mb
    };
    let mut v = vec![0; size];
    let i = 1;
    v[size - 1] = i;
    b.iter(move || {
        black_box(std_binary_search(&v, &i).is_ok());
    })
}

#[bench]
fn std_binary_search_l1(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L1, |i| i * 2);
}

#[bench]
fn std_binary_search_l2(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L2, |i| i * 2);
}

#[bench]
fn std_binary_search_l3(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L3, |i| i * 2);
}

#[bench]
fn std_binary_search_l1_with_dups(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L1, |i| i / 16 * 16);
}

#[bench]
fn std_binary_search_l2_with_dups(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L2, |i| i / 16 * 16);
}

#[bench]
fn std_binary_search_l3_with_dups(b: &mut Bencher) {
    std_bench_binary_search(b, Cache::L3, |i| i / 16 * 16);
}

#[bench]
fn std_binary_search_l1_worst_case(b: &mut Bencher) {
    std_bench_binary_search_worst_case(b, Cache::L1);
}

#[bench]
fn std_binary_search_l2_worst_case(b: &mut Bencher) {
    std_bench_binary_search_worst_case(b, Cache::L2);
}

#[bench]
fn std_binary_search_l3_worst_case(b: &mut Bencher) {
    std_bench_binary_search_worst_case(b, Cache::L3);
}

fn stdnew_bench_binary_search<F>(b: &mut Bencher, cache: Cache, mapper: F)
where
    F: Fn(usize) -> usize,
{
    let size = match cache {
        Cache::L1 => 1000,      // 8kb
        Cache::L2 => 10_000,    // 80kb
        Cache::L3 => 1_000_000, // 8Mb
    };
    let v = (0..size).map(&mapper).collect::<Vec<_>>();
    let mut r = 0usize;
    b.iter(move || {
        // LCG constants from https://en.wikipedia.org/wiki/Numerical_Recipes.
        r = r.wrapping_mul(1664525).wrapping_add(1013904223);
        // Lookup the whole range to get 50% hits and 50% misses.
        let i = mapper(r % size);
        black_box(stdnew_binary_search(&v, &i).is_ok());
    })
}

fn stdnew_bench_binary_search_worst_case(b: &mut Bencher, cache: Cache) {
    let size = match cache {
        Cache::L1 => 1000,      // 8kb
        Cache::L2 => 10_000,    // 80kb
        Cache::L3 => 1_000_000, // 8Mb
    };
    let mut v = vec![0; size];
    let i = 1;
    v[size - 1] = i;
    b.iter(move || {
        black_box(stdnew_binary_search(&v, &i).is_ok());
    })
}

#[bench]
fn stdnew_binary_search_l1(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L1, |i| i * 2);
}

#[bench]
fn stdnew_binary_search_l2(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L2, |i| i * 2);
}

#[bench]
fn stdnew_binary_search_l3(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L3, |i| i * 2);
}

#[bench]
fn stdnew_binary_search_l1_with_dups(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L1, |i| i / 16 * 16);
}

#[bench]
fn stdnew_binary_search_l2_with_dups(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L2, |i| i / 16 * 16);
}

#[bench]
fn stdnew_binary_search_l3_with_dups(b: &mut Bencher) {
    stdnew_bench_binary_search(b, Cache::L3, |i| i / 16 * 16);
}

#[bench]
fn stdnew_binary_search_l1_worst_case(b: &mut Bencher) {
    stdnew_bench_binary_search_worst_case(b, Cache::L1);
}

#[bench]
fn stdnew_binary_search_l2_worst_case(b: &mut Bencher) {
    stdnew_bench_binary_search_worst_case(b, Cache::L2);
}

#[bench]
fn stdnew_binary_search_l3_worst_case(b: &mut Bencher) {
    stdnew_bench_binary_search_worst_case(b, Cache::L3);
}

src/lib.rs

use std::cmp::Ord;
use std::cmp::Ordering::{self, Equal, Greater, Less};

pub fn std_binary_search<T>(s: &[T], x: &T) -> Result<usize, usize>
where
    T: Ord,
{
    std_binary_search_by(s, |p| p.cmp(x))
}

pub fn std_binary_search_by<'a, T, F>(s: &'a [T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&'a T) -> Ordering,
{
    let mut size = s.len();
    if size == 0 {
        return Err(0);
    }
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // mid is always in [0, size), that means mid is >= 0 and < size.
        // mid >= 0: by definition
        // mid < size: mid = size / 2 + size / 4 + size / 8 ...
        let cmp = f(unsafe { s.get_unchecked(mid) });
        base = if cmp == Greater { base } else { mid };
        size -= half;
    }
    // base is always in [0, size) because base <= mid.
    let cmp = f(unsafe { s.get_unchecked(base) });
    if cmp == Equal {
        Ok(base)
    } else {
        Err(base + (cmp == Less) as usize)
    }
}

pub fn stdnew_binary_search<T>(s: &[T], x: &T) -> Result<usize, usize>
where
    T: Ord,
{
    std_binary_search_by(s, |p| p.cmp(x))
}

pub fn stdnew_binary_search_by<'a, T, F>(s: &'a [T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&'a T) -> Ordering,
{
    let mut size = s.len();
    if size == 0 {
        return Err(0);
    }
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // mid is always in [0, size), that means mid is >= 0 and < size.
        // mid >= 0: by definition
        // mid < size: mid = size / 2 + size / 4 + size / 8 ...
        let cmp = f(unsafe { s.get_unchecked(mid) });
        if cmp == Equal {
            return Ok(mid);
        } else if cmp == Less {
            base = mid
        }
        size -= half;
    }
    // base is always in [0, size) because base <= mid.
    let cmp = f(unsafe { s.get_unchecked(base) });
    if cmp == Equal {
        Ok(base)
    } else {
        Err(base + (cmp == Less) as usize)
    }
}

@nbdd0121
Contributor

@pickfire you called std_binary_search_by in stdnew_binary_search, it should be stdnew_binary_search_by.

@ranma42
Contributor

ranma42 commented Jul 29, 2020

@pickfire it's kind of suspicious that the worst_case benchmarks are actually the fastest ones...
I am not sure about it, but I guess it might be caused by the compiler being able to see through the inputs of the binary search, as they are not black_boxed
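
Whether this is really what happened in the numbers above is only a guess, as stated. As a sketch of the remedy being hinted at, the snippet below passes both the slice and the needle through std::hint::black_box before every search, so the optimizer cannot treat them as known constants. It uses the standard library's binary_search and a plain timing loop instead of the nightly Bencher; the function name time_worst_case is hypothetical.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `iters` searches for the single `1` at the end of an all-zero vector.
/// Requires `size >= 1`. Returns the number of hits and the elapsed time.
pub fn time_worst_case(size: usize, iters: u32) -> (u32, Duration) {
    let mut v = vec![0usize; size];
    let needle = 1usize;
    v[size - 1] = needle;

    let start = Instant::now();
    let mut hits = 0;
    for _ in 0..iters {
        // Hide the inputs from the optimizer on every iteration.
        let haystack: &[usize] = black_box(v.as_slice());
        let key = black_box(&needle);
        // Hide the result as well so the search cannot be discarded.
        if black_box(haystack.binary_search(key)).is_ok() {
            hits += 1;
        }
    }
    (hits, start.elapsed())
}
```

Every search should hit, so the first component equals iters; the duration is only meaningful in an optimized build.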

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Mar 5, 2021
@bors
Contributor

bors commented Mar 5, 2021

⌛ Testing commit 3eb5bee with merge caca212...

@bors
Contributor

bors commented Mar 5, 2021

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing caca212 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 5, 2021
@bors bors merged commit caca212 into rust-lang:master Mar 5, 2021
@rustbot rustbot added this to the 1.52.0 milestone Mar 5, 2021
@dpc
Contributor

dpc commented May 28, 2021

So it looks like this change broke a rather significant (at least in some circles) user: https://polkadot.network/a-polkadot-postmortem-24-05-2021/

Disclosure: I don't have any association with Polkadot, but I do work in the digital assets space, besides being a long-time Rust user and enthusiast.

I might be wrong about some detail somewhere, but it seems to me that it was a mistake to create an API with a non-deterministic result in the first place, and it's not in the spirit of Rust to create pitfalls like this for its users. Hyrum's Law in practice. Ideally the function in question should be deterministic (with strictly defined behavior across Rust versions), with a faster but potentially non-deterministic variant named differently, like binary_search_non_deterministic or something, to raise the eyebrows of developers and reviewers (similar to sort_unstable).

@VillSnow
Contributor

VillSnow commented May 29, 2021

I had thought that the order in which a HashMap enumerates its entries is the same, at least when using the same compiler version.
However, that's not true. The order changes even within the same process for security reasons, as written in the documentation.

Should HashMap be renamed to HashMapNonDeterministic? Or should an IteratorNonDeterministic trait be added?
I'm not being sarcastic; I'm pointing out that this is not an issue only with binary_search, and that the whole library can be a target.

We have to consider this if we think the accident was not solely their fault and if we want Rust to be a more secure language.
Renaming is not the only option. Adding a warning is another option, I think.

@matthieu-m
Contributor

Renaming is not the only option. Adding a warning is another option, I think.

I would agree with @VillSnow here.

While the behavior of binary_search is technically documented, it is buried in the middle of the text (bolded, for convenience):

Binary searches this sorted slice for a given element.

If the value is found then Result::Ok is returned, containing the index of the matching element. If there are multiple matches, then any one of the matches could be returned. If the value is not found then Result::Err is returned, containing the index where a matching element could be inserted while maintaining sorted order.

See also binary_search_by, binary_search_by_key, and partition_point.

And while related functions are highlighted, it is not immediately clear that partition_point will provide a stable output.

Rewording the documentation of binary_search to call out the instability more explicitly, and to point to partition_point as the alternative, may help avoid similar incidents in the future, at a much lower cost than renaming and deprecating.
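
A minimal sketch of that alternative, assuming a sorted slice of i32: partition_point returns the index of the first element for which the predicate is false, so the first and last positions of a value can be pinned down exactly. The helper names first_match and last_match are hypothetical.

```rust
/// Index of the first element equal to `x`, if any.
pub fn first_match(s: &[i32], x: i32) -> Option<usize> {
    // Everything before `i` is strictly less than `x`.
    let i = s.partition_point(|&e| e < x);
    if i < s.len() && s[i] == x {
        Some(i)
    } else {
        None
    }
}

/// Index of the last element equal to `x`, if any.
pub fn last_match(s: &[i32], x: i32) -> Option<usize> {
    // Everything before `i` is less than or equal to `x`.
    let i = s.partition_point(|&e| e <= x);
    if i > 0 && s[i - 1] == x {
        Some(i - 1)
    } else {
        None
    }
}
```

Unlike the index reported by binary_search, these results are determined by the predicate alone and do not depend on how the search probes the slice.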

@the8472
Member

the8472 commented May 29, 2021

To be blunt, the issue seems to be making assumptions about function behavior without reading the documentation, forgetting about that critical part later or not realizing that it's relevant to one's application or something along that line.
Adding more documentation might not help with that problem.

Also hashmap and binary search aren't quite comparable. Hashmap is intentionally randomized, which means anything relying on iteration order will be revealed very quickly. The order of binary_search on the other hand is stable within a revision of the standard library, so it takes longer to uncover accidental reliance. That is unfortunate but necessary to provide some flexibility for the implementation to evolve.

Adding some jitter to the halving step to introduce runtime randomness might be possible but that would probably impact performance or at least make benchmarks noisier and also break people who rely on ordering being stable within a compiler version (well, they shouldn't do that either...)

@pickfire
Contributor

Adding some jitter to the halving step to introduce runtime randomness might be possible but that would probably impact performance or at least make benchmarks noisier and also break people who rely on ordering being stable within a compiler version (well, they shouldn't do that either...)

That could be done only in debug mode.

@the8472
Member

the8472 commented May 29, 2021

std normally isn't recompiled, so enabling debug assertions in a downstream crate won't enable them in std and cargo's build-std is still unstable.

@dpc
Contributor

dpc commented May 29, 2021

BTW, one redditor brought up a really great point: binary search that does not commit to first/last element is of very limited use. It is pretty much only good for an existence check.

So I guess rust-lang/rfcs#2184 is much more important now.

@VillSnow
Contributor

I meant a 'warning' that is raised by the compiler and has to be suppressed with #[allow(non_deterministic)]


BTW, one redditor brought up a really great point: binary search that does not commit to first/last element is of very limited use. It is pretty much only good for an existence check.

So I guess rust-lang/rfcs#2184 is much more important now.

Interesting point. I came up with the case of a sorted list with duplicate keys, but it might be niche.
In addition, if a user really needs such a function, binary search is not so hard to implement oneself.

@m-ou-se
Member

m-ou-se commented May 31, 2021

Note that the function is not nondeterministic. If you run it multiple times with the same data, it returns the exact same result. It doesn't follow multiple (or randomly selected) paths, etc. But the behaviour can change between different versions of the standard library. That's not nondeterminism.

binary search that does not commit to first/last element is of very limited use

In most cases you already know the element only occurs once in the slice, in which case you can be sure that that is the exact element it will find. That's the main reason for this: that the algorithm can stop when it finds the element, without having to continue searching to check if there's more identical elements.

@VillSnow
Contributor

VillSnow commented May 31, 2021

In most cases you already know the element only occurs once in the slice, in which case you can be sure that that is the exact element it will find. That's the main reason for this: that the algorithm can stop when it finds the element, without having to continue searching to check if there's more identical elements.

That's the case; I was wrong.

Find 2 from 0..=9

Get the first element

0 1 2 3 4 5 6 7 8 9
L M R
0 1 2 3 4 5 6 7 8 9
L M R
0 1 2 3 4 5 6 7 8 9
L M R
0 1 2 3 4 5 6 7 8 9
L,R

Takes 3 steps

Early return

0 1 2 3 4 5 6 7 8 9
L M R
0 1 2 3 4 5 6 7 8 9
L M R

Takes only 2 steps
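
The step counts above can be reproduced with a small sketch that counts probes. It assumes a half-open range [left, right) and a midpoint rounded down, conventions the diagram does not spell out, so other conventions may give different marker positions. The function names are hypothetical.

```rust
use std::cmp::Ordering;

/// Narrows down to the first match; returns the result and the probe count.
/// The final equality check after the loop is not counted as a step.
pub fn first_match_steps(s: &[i32], x: i32) -> (Result<usize, usize>, u32) {
    let (mut left, mut right, mut steps) = (0usize, s.len(), 0u32);
    while left < right {
        let mid = left + (right - left) / 2;
        steps += 1;
        if s[mid] < x {
            left = mid + 1;
        } else {
            right = mid;
        }
    }
    if left < s.len() && s[left] == x {
        (Ok(left), steps)
    } else {
        (Err(left), steps)
    }
}

/// Returns as soon as a probe hits the value.
pub fn early_return_steps(s: &[i32], x: i32) -> (Result<usize, usize>, u32) {
    let (mut left, mut right, mut steps) = (0usize, s.len(), 0u32);
    while left < right {
        let mid = left + (right - left) / 2;
        steps += 1;
        match s[mid].cmp(&x) {
            Ordering::Equal => return (Ok(mid), steps),
            Ordering::Less => left = mid + 1,
            Ordering::Greater => right = mid,
        }
    }
    (Err(left), steps)
}
```

Searching for 2 in 0..=9 takes 3 probes with first_match_steps and 2 with early_return_steps, matching the counts in the comment.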

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Jun 14, 2021
…=JohnTitor

Integrate binary search codes of binary_search_by and partition_point

For now partition_point has own binary search code piece.
It is because binary_search_by had called the comparer more times and the author (=me) wanted to avoid it.

However, now binary_search_by uses the comparer minimum times. (rust-lang#74024)
So it's time to integrate them.

The appearance of the codes are a bit different but both use completely same logic.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 15, 2021
…ohnTitor

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 17, 2021
…ochenkov

Prefer `partition_point` to look up assoc items

Since we now have `partition_point` (instead of `equal_range`), I think it's worth trying to use it instead of manually finding it.
`partition_point` uses `binary_search_by` internally (rust-lang#85406) and its performance has been improved (rust-lang#74024), so I guess this will make a performance difference.
@Amanieu
Member

Amanieu commented Jul 26, 2024

In #128254 I actually revert to the old algorithm which turns out to be much faster once you ensure that LLVM doesn't turn CMOV into branches.
