BTreeSet intersection, is_subset & difference optimizations #64820

Merged
merged 1 commit into from
Oct 2, 2019

ssomers
Contributor

@ssomers ssomers commented Sep 26, 2019

...based on the range of values contained; in particular, a massive improvement when these ranges are disjoint (or merely touching), like in the neg-vs-pos benchmarks already in liballoc. Inspired by #64383 but none of the ideas there worked out.

I introduced another variant in IntersectionInner and in DifferenceInner, because I couldn't find a way to initialize these iterators as empty if there's no empty set around.

Also, reduced the size of "large" sets in test cases - if Miri can't handle it, it was needlessly slowing down everyone.
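
As a rough illustration of the range-based idea described above (not the code in this PR), the sketch below compares the minimum and maximum of both sets before walking them. The names `RangeAwareIntersection`, `Walk`, `Answer` and `intersection_by_range` are hypothetical; the second variant shows one way to represent an already-known or empty result without needing an empty set to iterate over.

```rust
use std::cmp::Ordering;
use std::collections::btree_set::{BTreeSet, Intersection};

/// Hypothetical wrapper: either walks both sets or already knows the answer.
enum RangeAwareIntersection<'a, T> {
    /// Ranges overlap: fall back to the regular element-by-element walk.
    Walk(Intersection<'a, T>),
    /// Ranges are disjoint (`None`) or touch in exactly one value (`Some`).
    Answer(Option<&'a T>),
}

impl<'a, T: Ord> Iterator for RangeAwareIntersection<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        match self {
            RangeAwareIntersection::Walk(inner) => inner.next(),
            // `take` yields the single answer once, then `None` forever.
            RangeAwareIntersection::Answer(answer) => answer.take(),
        }
    }
}

fn intersection_by_range<'a, T: Ord>(
    a: &'a BTreeSet<T>,
    b: &'a BTreeSet<T>,
) -> RangeAwareIntersection<'a, T> {
    use RangeAwareIntersection::{Answer, Walk};
    // Two separate iterators per set, so a one-element set reports the
    // same element as both its minimum and its maximum.
    let (a_min, a_max) = match (a.iter().next(), a.iter().next_back()) {
        (Some(min), Some(max)) => (min, max),
        _ => return Answer(None), // a is empty
    };
    let (b_min, b_max) = match (b.iter().next(), b.iter().next_back()) {
        (Some(min), Some(max)) => (min, max),
        _ => return Answer(None), // b is empty
    };
    match (a_min.cmp(b_max), a_max.cmp(b_min)) {
        // a lies entirely above b, or entirely below it.
        (Ordering::Greater, _) | (_, Ordering::Less) => Answer(None),
        // The ranges merely touch in a single shared value.
        (Ordering::Equal, _) => Answer(Some(a_min)),
        (_, Ordering::Equal) => Answer(Some(a_max)),
        // The ranges genuinely overlap.
        _ => Walk(a.intersection(b)),
    }
}
```

With disjoint or touching inputs, such as the neg-vs-pos case, this sketch returns after a handful of comparisons regardless of set size.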

@rust-highfive
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 26, 2019
@Centril
Contributor

Centril commented Sep 26, 2019

cc @scottmcm @bluss

Contributor Author

@ssomers ssomers left a comment

Meanwhile I tweaked the order in both match expressions to move first/min before last/max

@ssomers
Contributor Author

ssomers commented Sep 30, 2019

Property-based tests and the performance comparison by Travis are now cleaned up and as complete as I can think of.

let mut other_iter = other.iter();
let other_min = other_iter.next().unwrap();
let other_max = other_iter.next_back().unwrap();
let mut self_iter = match (self_min.cmp(other_min), self_max.cmp(other_max)) {
Member

In the previous method you use the Ord::cmp(x, y) style and here x.cmp(y). Either is fine but consistency is best.

Contributor Author

I never noticed that. Let's count: we have 4 x Ord::cmp, 3 x cmp (counting pairs as one). Before I started messing about in this code, there was just 1 Ord::cmp and 2 cmp. Notice that cmp_opt acts as a replacement for Ord::cmp but uses cmp itself.

So I say, use the shorter, member cmp.
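
For readers unfamiliar with the two spellings under discussion, here is a minimal sketch showing that they are the same call; the function name `compare_both_ways` is made up for this illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical helper returning the result of both call styles.
fn compare_both_ways<T: Ord>(x: &T, y: &T) -> (Ordering, Ordering) {
    // Fully qualified trait call, then the equivalent method call.
    (Ord::cmp(x, y), x.cmp(y))
}
```

Both forms resolve to the same `Ord::cmp` implementation, so the choice is purely stylistic.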

@bluss
Member

bluss commented Sep 30, 2019

Nice! Cool benchmark setup. I only had nitpicks to contribute to the review. Would love if there was a way to write this without .unwrap() (using discriminants for control flow instead), but it is clear enough that they can never panic here. r=me when nitpicks are fixed to taste

@Centril
Contributor

Centril commented Sep 30, 2019

> Property-based tests and the performance comparison by Travis are now cleaned up and as complete as I can think of.

Oh nice! -- Could we add the proptests to the test suite? cc @alexcrichton @nikomatsakis

@ssomers
Contributor Author

ssomers commented Oct 1, 2019

> write this without .unwrap

I tried several times, but always hit unsavory amounts of indentation, remote else clauses or eRFC 2497. But now I think I saw the light, resulting in a little less code that is more readable (mostly by dropping some of the micro-optimization). Peculiar indentation courtesy of cargo fmt.

r=bluss

{
(other_min, other_max)
} else {
return false; // other is empty
Contributor Author

@ssomers ssomers Oct 1, 2019

This else-part cannot be reached, due to the performance shortcut on top. It's possible to:

  • merge this let if with the if let above, but then it's not at all clear to the casual reader that it should return true
  • write a panic! explaining this, better than raw unwrap I guess, but pointless extra code

Member

unreachable!("message") is the panic for that, but since we don't need a panic - false is correct, it seems this works just as well.
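
To make the shape of this exchange concrete, here is a simplified sketch of an unwrap-free subset check. It is an assumption-laden stand-in, not the merged code: `is_subset_sketch` is a hypothetical free function, and the final membership loop replaces whatever strategy the PR actually uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical free function: is every element of `sub` also in `sup`?
fn is_subset_sketch<T: Ord>(sub: &BTreeSet<T>, sup: &BTreeSet<T>) -> bool {
    // Performance shortcut: a larger set cannot be a subset of a smaller one.
    if sub.len() > sup.len() {
        return false;
    }
    let (sub_min, sub_max) =
        if let (Some(min), Some(max)) = (sub.iter().next(), sub.iter().next_back()) {
            (min, max)
        } else {
            return true; // sub is empty, and the empty set is a subset of anything
        };
    let (sup_min, sup_max) =
        if let (Some(min), Some(max)) = (sup.iter().next(), sup.iter().next_back()) {
            (min, max)
        } else {
            // Not reachable: sub is non-empty and no larger than sup.
            // Returning false is still the correct answer, so no panic is needed.
            return false;
        };
    // Range check: sub must fit within the range of values in sup.
    if sub_min < sup_min || sub_max > sup_max {
        return false;
    }
    sub.iter().all(|item| sup.contains(item))
}
```

The second `else` branch mirrors the point made above: it cannot be reached after the length shortcut, yet `false` is a harmless and correct value to return there.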

@ssomers
Contributor Author

ssomers commented Oct 1, 2019

> Could we add the proptests to the test suite

I don't know what test suites there are, but seeing `if cfg!(miri) { // Miri is too slow` appear in the unit tests tells me not everyone would welcome proptests in the standard test suite. I could easily write a bunch of small unit tests covering every corner case, but not in the current scheme, where one test function tests every kind of intersection in one file covering everything about sets.
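
The `cfg!(miri)` pattern mentioned here can be sketched as follows. The helper names and the sizes 100 and 10_000 are placeholders chosen for this illustration, not the values used in liballoc.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: size of a "large" test set, shrunk when running under Miri.
fn large_set_size() -> i32 {
    // `cfg!(miri)` evaluates to true only when the code is interpreted by Miri.
    if cfg!(miri) { 100 } else { 10_000 }
}

/// Builds the disjoint negative and positive sets for a neg-vs-pos style check.
fn neg_and_pos_sets() -> (BTreeSet<i32>, BTreeSet<i32>) {
    let n = large_set_size();
    ((-n..0).collect(), (0..n).collect())
}
```

A test built on these sets runs the same assertions either way; only the amount of work changes with the environment.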

@bluss bluss changed the title BTreeSet intersection, is_subnet & difference optimizations BTreeSet intersection, is_subset & difference optimizations Oct 1, 2019
@bluss
Member

bluss commented Oct 1, 2019

@bors r+ rollup

Thanks!

@bors
Contributor

bors commented Oct 1, 2019

📌 Commit d132a70 has been approved by bluss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 1, 2019
tmandry added a commit to tmandry/rust that referenced this pull request Oct 1, 2019
BTreeSet intersection, is_subset & difference optimizations

...based on the range of values contained; in particular, a massive improvement when these ranges are disjoint (or merely touching), like in the neg-vs-pos benchmarks already in liballoc. Inspired by rust-lang#64383 but none of the ideas there worked out.

I introduced another variant in IntersectionInner and in DifferenceInner, because I couldn't find a way to initialize these iterators as empty if there's no empty set around.

Also, reduced the size of "large" sets in test cases - if Miri can't handle it, it was needlessly slowing down everyone.
Centril added a commit to Centril/rust that referenced this pull request Oct 1, 2019
BTreeSet intersection, is_subset & difference optimizations

...based on the range of values contained; in particular, a massive improvement when these ranges are disjoint (or merely touching), like in the neg-vs-pos benchmarks already in liballoc. Inspired by rust-lang#64383 but none of the ideas there worked out.

I introduced another variant in IntersectionInner and in DifferenceInner, because I couldn't find a way to initialize these iterators as empty if there's no empty set around.

Also, reduced the size of "large" sets in test cases - if Miri can't handle it, it was needlessly slowing down everyone.
bors added a commit that referenced this pull request Oct 1, 2019
Rollup of 7 pull requests

Successful merges:

 - #63416 (apfloat: improve doc comments)
 - #64820 (BTreeSet intersection, is_subset & difference optimizations)
 - #64910 (syntax: cleanup param, method, and misc parsing)
 - #64912 (Remove unneeded `fn main` blocks from docs)
 - #64933 (Fixes #64919. Suggest fix based on operator precendence.)
 - #64943 (Add lower bound doctests for `saturating_{add,sub}` signed ints)
 - #64950 (Simplify interners)

Failed merges:

r? @ghost
@bors bors merged commit d132a70 into rust-lang:master Oct 2, 2019
Centril added a commit to Centril/rust that referenced this pull request Oct 19, 2019
BTreeSet symmetric_difference & union optimized

No scalability changes, but:
- Grew the cmp_opt function (shared by symmetric_difference & union) into a MergeIter, with less memory overhead than the pairs of Peekable iterators now, speeding up ~20% on my machine (not so clear on Travis though, I actually switched it off there because it wasn't consistent about identical code). Mainly meant to improve readability by sharing code, though it does end up using more lines of code. Extending and reusing the MergeIter in btree_map might be better, but I'm not sure that's possible or desirable. This MergeIter probably pretends to be more generic than it is, yet doesn't declare to be an iterator because there's no need to, it's only there to help construct genuine iterators SymmetricDifference & Union.
- Compact the code of rust-lang#64820 by moving if/else into match guards.

r? @bluss
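
The follow-up commit above describes replacing pairs of Peekable iterators with a single merging helper. A minimal sketch of that idea is shown below, assuming a design with one slot for a held-back item; `MergeSketch`, `Peeked`, `nexts` and `symmetric_difference_sketch` are names invented for this illustration and may differ from the real MergeIter.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// The one item held back from the previous step, tagged with its source.
enum Peeked<T> {
    A(T),
    B(T),
}

/// Hypothetical merge helper over two sorted iterators of the same type.
struct MergeSketch<I: Iterator> {
    a: I,
    b: I,
    peeked: Option<Peeked<I::Item>>,
}

impl<I: Iterator> MergeSketch<I>
where
    I::Item: Ord,
{
    fn new(a: I, b: I) -> Self {
        MergeSketch { a, b, peeked: None }
    }

    /// Returns the next step of the merge: equal items come as a pair,
    /// otherwise only the smaller item is returned and the other is held back.
    fn nexts(&mut self) -> (Option<I::Item>, Option<I::Item>) {
        let (mut a_next, mut b_next) = match self.peeked.take() {
            Some(Peeked::A(item)) => (Some(item), self.b.next()),
            Some(Peeked::B(item)) => (self.a.next(), Some(item)),
            None => (self.a.next(), self.b.next()),
        };
        if let (Some(a), Some(b)) = (&a_next, &b_next) {
            match a.cmp(b) {
                Ordering::Less => self.peeked = b_next.take().map(Peeked::B),
                Ordering::Greater => self.peeked = a_next.take().map(Peeked::A),
                Ordering::Equal => {}
            }
        }
        (a_next, b_next)
    }
}

/// Collects the elements that are in exactly one of the two sets.
fn symmetric_difference_sketch<T: Ord + Clone>(x: &BTreeSet<T>, y: &BTreeSet<T>) -> Vec<T> {
    let mut merge = MergeSketch::new(x.iter(), y.iter());
    let mut out = Vec::new();
    loop {
        match merge.nexts() {
            (None, None) => return out,
            (Some(item), None) | (None, Some(item)) => out.push(item.clone()),
            (Some(_), Some(_)) => {} // present in both sets: skip
        }
    }
}
```

Holding back a single tagged item, rather than wrapping each input in its own Peekable, is where the memory saving described in the commit message would come from in a design like this.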