Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boost iterator intersperse(_with) performance #111379

Merged
merged 5 commits into from
Jan 27, 2024

Conversation

nyurik
Copy link
Contributor

@nyurik nyurik commented May 9, 2023

I did some benchmark digging into the intersperse and intersperse_with code as part of this discussion, and as a result I optimized them a bit, without relying on the peekable iterator.

See also full benchmark repo

Benchmarks show near 2x performance improvements with the simple sum benchmarks:
image

@rustbot
Copy link
Collaborator

rustbot commented May 9, 2023

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 9, 2023
@rustbot
Copy link
Collaborator

rustbot commented May 9, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@nyurik
Copy link
Contributor Author

nyurik commented May 9, 2023

Per @m-ou-se suggestion, I implemented an additional bool that tracks if the iteration has started or not, and that resulted in a significant performance degradation. Now the gains are much more modest: tests with 222 in the graph below are only a fraction of the total possible perf improvements.

I guess the main question is if it is ok for a wrapping iterator to call nested_iterator.next() on construction rather than on the first call to next() in order to gain a significant performance boost? Or should the almost never used edge cases warrant a significant perf degradation for everyone else?

In this graph, results of the benchmarks -- first line is the current implementation (000), vs optimized (111 - calls next on construction), vs delayed (222 - calls next on first call to next)

image

@cuviper
Copy link
Member

cuviper commented May 9, 2023

I guess the main question is if it is ok for a wrapping iterator to call nested_iterator.next() on construction rather than on the first call to next() in order to gain a significant performance boost? Or should the almost never used edge cases warrant a significant perf degradation for everyone else?

I think a third-party crate could make the tradeoff to have that first call right away, and clearly explain that to the user, but Iterator laziness is an important principle in the standard library.

@nyurik
Copy link
Contributor Author

nyurik commented May 9, 2023

Ok, a smaller perf improvement is still better than no improvement. Adjusted

@rust-log-analyzer

This comment has been minimized.

@cuviper
Copy link
Member

cuviper commented Jan 25, 2024

Sorry for my absence -- I'm digging this out of my review backlog now.

On my AMD Ryzen 7 5800X with the latest nightly, the "222" implementation actually looks great -- beating "111" on everything except opt-unwrap and with-opt-unwrap, and still much better than "000". Maybe LLVM got better at optimizing that branch since you ran it? Or it could just be CPU differences.

violin

(I left the fancy red grouping outlines as an exercise for the reader... 🙂 )

@nyurik
Copy link
Contributor Author

nyurik commented Jan 25, 2024

thx @cuviper, I just re-ran and also get similar results. So it seems the 222 path is an ok as it clearly gains in several cases, and remains consistent with the other ones. Are there any objections to implement this PR as the 222 variant?

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of the https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13 discussion, and as a result I optimized them a bit, without relying on the peekable iterator.
#[unstable(feature = "iter_intersperse", reason = "recently added", issue = "79524")]
impl<I> FusedIterator for Intersperse<I>
where
I: FusedIterator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note -- with iter: Fuse<I>, we shouldn't need this constraint, but it's conservative to keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, so i left it as is (unless it gets in the way of anything?)

library/core/src/iter/adapters/intersperse.rs Outdated Show resolved Hide resolved
hi.map(|hi| {
hi.saturating_sub(!started as usize)
.saturating_add(next_is_some as usize)
.saturating_add(hi)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this saturating instead of and_then-checked_add as before? It should return None when the upper bound is greater than usize::MAX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should be more tests around hints - esp when working with iters that have underlying iters... seems like this is something that can relatively easily can be messed up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe so -- we'd certainly like for the standard library to get it right. But Iterator::size_hint has limited utility anyway, given that it can be implemented incorrectly by safe code for custom iterators.

Related discussion: https://internals.rust-lang.org/t/is-size-hint-1-ever-used/8187

@cuviper
Copy link
Member

cuviper commented Jan 26, 2024

Thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Jan 26, 2024

📌 Commit 77f31ef has been approved by cuviper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2024
Nadrieril added a commit to Nadrieril/rust that referenced this pull request Jan 27, 2024
…iper

Boost iterator intersperse(_with) performance

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator.

See also [full benchmark repo](https://github.com/nyurik/intersperse_perf)

Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2):
![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)
Nadrieril added a commit to Nadrieril/rust that referenced this pull request Jan 27, 2024
…iper

Boost iterator intersperse(_with) performance

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator.

See also [full benchmark repo](https://github.com/nyurik/intersperse_perf)

Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2):
![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 27, 2024
Rollup of 9 pull requests

Successful merges:

 - rust-lang#111379 (Boost iterator intersperse(_with) performance)
 - rust-lang#118182 (Properly recover from trailing attr in body)
 - rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_cononical` doctest)
 - rust-lang#119759 (Add FileCheck annotations to dataflow-const-prop tests)
 - rust-lang#120275 (Avoid ICE in trait without `dyn` lint)
 - rust-lang#120376 (Update codegen test for LLVM 18)
 - rust-lang#120386 (ScopeTree: remove destruction_scopes as unused)
 - rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`)
 - rust-lang#120399 (Remove myself from review rotation)

r? `@ghost`
`@rustbot` modify labels: rollup
Nadrieril added a commit to Nadrieril/rust that referenced this pull request Jan 27, 2024
…iper

Boost iterator intersperse(_with) performance

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator.

See also [full benchmark repo](https://github.com/nyurik/intersperse_perf)

Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2):
![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 27, 2024
Rollup of 9 pull requests

Successful merges:

 - rust-lang#111379 (Boost iterator intersperse(_with) performance)
 - rust-lang#118182 (Properly recover from trailing attr in body)
 - rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_cononical` doctest)
 - rust-lang#119957 (fix: correct suggestion arg for impl trait)
 - rust-lang#120275 (Avoid ICE in trait without `dyn` lint)
 - rust-lang#120376 (Update codegen test for LLVM 18)
 - rust-lang#120386 (ScopeTree: remove destruction_scopes as unused)
 - rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`)
 - rust-lang#120399 (Remove myself from review rotation)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors
Copy link
Contributor

bors commented Jan 27, 2024

⌛ Testing commit 77f31ef with merge 8b6a431...

@bors
Copy link
Contributor

bors commented Jan 27, 2024

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing 8b6a431 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 27, 2024
@bors bors merged commit 8b6a431 into rust-lang:master Jan 27, 2024
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 27, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8b6a431): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.8% [0.8%, 0.8%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.2% [1.2%, 1.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 660.806s -> 663.547s (0.41%)
Artifact size: 308.11 MiB -> 308.14 MiB (0.01%)

@nyurik nyurik deleted the intersperse-speed-up branch February 12, 2024 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants