Boost iterator intersperse(_with) performance #111379
r? @cuviper (rustbot has picked a reviewer for you, use r? to override)
Force-pushed from 7644bdb to f763004.
Force-pushed from f763004 to 48ab2a3.
Per @m-ou-se's suggestion, I added a bool that tracks whether iteration has started, and that resulted in a significant performance degradation. The gains are now much more modest: the tests marked 222 in the graph below capture only a fraction of the total possible performance improvement. I guess the main question is whether it is ok for a wrapping iterator to call

In this graph of benchmark results, the first line is the current implementation (000), compared against the optimized version (111, which calls `next` on construction) and the delayed version (222, which calls `next` on the first call to `next`).
I think a third-party crate could make the tradeoff to have that first call right away, and clearly explain that to the user, but
Ok, a smaller perf improvement is still better than no improvement. Adjusted.
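For readers following the variants discussed above, here is a minimal sketch of the delayed ("222") strategy: construction never touches the inner iterator, and a `started` flag plus a one-element buffer take the place of `Peekable`. The type `MyIntersperse` and its fields are hypothetical names chosen for illustration; this is not the standard library's actual `Intersperse` code.

```rust
use std::iter::Fuse;

/// Hypothetical stand-in for the unstable std `Intersperse` adapter,
/// illustrating the delayed ("222") strategy.
pub struct MyIntersperse<I: Iterator> {
    /// Set on the first call to `next`; no work happens at construction.
    started: bool,
    separator: I::Item,
    /// Element pulled from `iter` but not yet yielded (a separator goes first).
    next_item: Option<I::Item>,
    /// Fused, so polling again after `None` is always safe.
    iter: Fuse<I>,
}

impl<I: Iterator> MyIntersperse<I> {
    pub fn new(iter: I, separator: I::Item) -> Self {
        // No call to `iter.next()` here: construction stays lazy.
        Self { started: false, separator, next_item: None, iter: iter.fuse() }
    }
}

impl<I: Iterator> Iterator for MyIntersperse<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.started {
            if let Some(v) = self.next_item.take() {
                // A separator was just yielded; now yield the buffered element.
                Some(v)
            } else {
                let next_item = self.iter.next();
                if next_item.is_some() {
                    // Buffer the element and yield a separator in front of it.
                    self.next_item = next_item;
                    Some(self.separator.clone())
                } else {
                    None
                }
            }
        } else {
            // First call: yield the first element with no leading separator.
            self.started = true;
            self.iter.next()
        }
    }
}
```

With this layout, `MyIntersperse::new(vec![1, 2, 3].into_iter(), 0)` yields `1, 0, 2, 0, 3`, and the inner iterator is not polled until the first `next`. The "111" variant would instead call `iter.next()` inside `new`, which is exactly the eager-construction tradeoff questioned in the thread.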
Sorry for my absence -- I'm digging this out of my review backlog now. On my AMD Ryzen 7 5800X with the latest nightly, the "222" implementation actually looks great -- beating "111" on everything except (I left the fancy red grouping outlines as an exercise for the reader... 🙂 )
thx @cuviper, I just re-ran and get similar results. So it seems the 222 path is OK, as it clearly gains in several cases and remains consistent in the other ones. Are there any objections to implementing this PR as the 222 variant?
I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of the https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13 discussion, and as a result I optimized them a bit, without relying on the peekable iterator.
Force-pushed from e9e72a5 to f1dbc7b.
#[unstable(feature = "iter_intersperse", reason = "recently added", issue = "79524")]
impl<I> FusedIterator for Intersperse<I>
where
    I: FusedIterator,
Just to note -- with `iter: Fuse<I>`, we shouldn't need this constraint, but it's conservative to keep it.
thx, so I left it as is (unless it gets in the way of anything?)
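To illustrate the point about `iter: Fuse<I>`: once `Fuse` has returned `None`, it keeps returning `None`, even when the wrapped iterator would resume, so an adapter that stores `Fuse<I>` is fused whether or not `I: FusedIterator`. `Flaky` is a hypothetical, deliberately non-fused iterator and `first_n` a hypothetical helper, both written only for this demonstration.

```rust
/// Hypothetical, deliberately non-fused iterator: it yields odd numbers
/// and returns `None` in between, so it "resumes" after a `None`.
pub struct Flaky {
    n: u32,
}

impl Iterator for Flaky {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.n += 1;
        if self.n % 2 == 0 { None } else { Some(self.n) }
    }
}

/// Hypothetical helper: call `next` exactly `k` times and record each result.
pub fn first_n<I: Iterator>(mut it: I, k: usize) -> Vec<Option<I::Item>> {
    (0..k).map(|_| it.next()).collect()
}
```

Polling `Flaky { n: 0 }` directly gives `Some(1), None, Some(3), None`, while `Flaky { n: 0 }.fuse()` gives `Some(1), None, None, None`. That guarantee is why the `I: FusedIterator` bound on the `FusedIterator` impl is conservative rather than required.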
hi.map(|hi| {
    hi.saturating_sub(!started as usize)
        .saturating_add(next_is_some as usize)
        .saturating_add(hi)
Why is this saturating instead of `and_then`-`checked_add` as before? It should return `None` when the upper bound is greater than `usize::MAX`.
thx, fixed
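The fix requested in that review can be sketched as a standalone helper. The lower bound may saturate, since clamping to `usize::MAX` still gives a valid lower bound, but the upper bound must become `None` on overflow. `intersperse_hint` is a hypothetical name, and the formula assumes the adapter state shown in the excerpt (a `started` flag plus a one-element buffer, `next_is_some`); it is an illustration, not the merged code. With `n` items left in the inner iterator, a not-yet-started adapter yields `2n - 1` items (or 0 when `n = 0`), a started one with a buffered item yields `2n + 1`, and a started one with an empty buffer yields `2n`.

```rust
/// Hypothetical helper: derive the interspersed iterator's `size_hint` from
/// the inner iterator's hint and the adapter's state.
pub fn intersperse_hint(
    inner: (usize, Option<usize>),
    started: bool,
    next_is_some: bool,
) -> (usize, Option<usize>) {
    let (lo, hi) = inner;
    // Before the first element there is one fewer separator than elements.
    // Saturating is fine here: `usize::MAX` remains a valid lower bound.
    let lower = lo
        .saturating_sub(!started as usize)
        .saturating_add(next_is_some as usize)
        .saturating_add(lo);
    // The upper bound must be `None` (unknown) if it does not fit in `usize`.
    let upper = hi.and_then(|hi| {
        hi.saturating_sub(!started as usize)
            .checked_add(next_is_some as usize)?
            .checked_add(hi)
    });
    (lower, upper)
}
```

For example, an inner hint of `(usize::MAX, Some(usize::MAX))` on a started adapter gives `(usize::MAX, None)` rather than a clamped, and therefore wrong, upper bound.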
I wonder if there should be more tests around hints, especially for iterators that wrap other iterators... it seems like something that can be messed up relatively easily.
Maybe so -- we'd certainly like for the standard library to get it right. But `Iterator::size_hint` has limited utility anyway, given that it can be implemented incorrectly by safe code for custom iterators.
Related discussion: https://internals.rust-lang.org/t/is-size-hint-1-ever-used/8187
Co-authored-by: Josh Stone <cuviper@gmail.com>
Thanks! @bors r+
…iper Boost iterator intersperse(_with) performance

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator.

See also [full benchmark repo](https://github.com/nyurik/intersperse_perf)

Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2):

![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)
Rollup of 9 pull requests

Successful merges:

- rust-lang#111379 (Boost iterator intersperse(_with) performance)
- rust-lang#118182 (Properly recover from trailing attr in body)
- rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_canonical` doctest)
- rust-lang#119759 (Add FileCheck annotations to dataflow-const-prop tests)
- rust-lang#120275 (Avoid ICE in trait without `dyn` lint)
- rust-lang#120376 (Update codegen test for LLVM 18)
- rust-lang#120386 (ScopeTree: remove destruction_scopes as unused)
- rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`)
- rust-lang#120399 (Remove myself from review rotation)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup of 9 pull requests

Successful merges:

- rust-lang#111379 (Boost iterator intersperse(_with) performance)
- rust-lang#118182 (Properly recover from trailing attr in body)
- rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_canonical` doctest)
- rust-lang#119957 (fix: correct suggestion arg for impl trait)
- rust-lang#120275 (Avoid ICE in trait without `dyn` lint)
- rust-lang#120376 (Update codegen test for LLVM 18)
- rust-lang#120386 (ScopeTree: remove destruction_scopes as unused)
- rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`)
- rust-lang#120399 (Remove myself from review rotation)

r? `@ghost`
`@rustbot` modify labels: rollup
☀️ Test successful - checks-actions
Finished benchmarking commit (8b6a431): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
This benchmark run did not return any relevant results for this metric.

Binary size
This benchmark run did not return any relevant results for this metric.

Bootstrap: 660.806s -> 663.547s (0.41%)