-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize array::IntoIter
#100214
Optimize array::IntoIter
#100214
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override) |
This regresses the When looking into this I noticed that |
Thanks, @timvermeulen, I'll look at that one. I've sent PR #100220 to fix @rustbot author Update 2022-08-24: I've rebased atop the other PR, but still not investigated that perf test, so it's not really review-ready yet. |
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc `@timvermeulen,` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ```@timvermeulen,``` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
84f0eaf
to
14e0e7b
Compare
Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang/rust#100214 (comment)
14e0e7b
to
c08ec39
Compare
This comment has been minimized.
This comment has been minimized.
4104932
to
c43b960
Compare
@timvermeulen My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to I now get exactly the same inner loop for that .LBB0_5:
mov edx, dword ptr [rbx + rcx]
rol edx, 8
bswap edx
mov dword ptr [rax + rcx], edx
add rcx, 4
cmp rdi, rcx
jne .LBB0_5 @rustbot ready |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
@rustbot author |
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`. This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better. I added one simple demonstration of the improvement as a codegen test.
fb3a7d8
to
6dbd9a2
Compare
Oh, it's failing on the UB trap check. I've ignored the codegen test for debug, since it's a @bors r=thomcc |
⌛ Testing commit 6dbd9a2 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3... |
⌛ Testing commit 6dbd9a2 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe... |
💔 Test failed - checks-actions |
Looks like the retry didn't put it back in the queue right? I'll try it again. @bors retry |
☀️ Test successful - checks-actions |
Finished benchmarking commit (4ecfdfa): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Huh, interesting that the post-merge perf run is so different (thankfully better!) from the original: #100214 (comment) |
.into_iter()
on arrays was slower than it needed to be (especially compared to slice iterator) since it usesRange<usize>
, which needs to handle degenerate ranges like10..4
.This PR adds an internal
IndexRange
type that's likeRange<usize>
but with a safety invariant that means it doesn't need to worry about those cases -- it only handlesstart <= end
-- and thus can give LLVM more information to optimize better.I added one simple demonstration of the improvement as a codegen test.
(
vec::IntoIter
uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed.array::IntoIter
can't use pointers because that would keep it from being movable.)