Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize array::IntoIter #100214

Merged
merged 1 commit into from
Sep 21, 2022
Merged

Optimize array::IntoIter #100214

merged 1 commit into from
Sep 21, 2022

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Aug 6, 2022

.into_iter() on arrays was slower than it needed to be (especially compared to slice iterator) since it uses Range<usize>, which needs to handle degenerate ranges like 10..4.

This PR adds an internal IndexRange type that's like Range<usize> but with a safety invariant that means it doesn't need to worry about those cases -- it only handles start <= end -- and thus can give LLVM more information to optimize better.

I added one simple demonstration of the improvement as a codegen test.

(vec::IntoIter uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed. array::IntoIter can't use pointers because that would keep it from being movable.)

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 6, 2022
@rustbot

This comment was marked as resolved.

@rust-highfive
Copy link
Collaborator

r? @thomcc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 6, 2022
@timvermeulen
Copy link
Contributor

This regresses the vec::bench_flat_map_collect benchmark by a lot for me, over 10x.

When looking into this I noticed that array::IntoIter::fold calls fold on iter::ByRefSized(&mut self.alive), which as far as I can see doesn't call the actual fold implementation of IndexRange because ByRefSized would have to pass the iterator by value, which it can't. So using ByRefSized seems to be pointless here. However, it doesn't look like this is what's causing the slowdown.

@scottmcm
Copy link
Member Author

scottmcm commented Aug 7, 2022

Thanks, @timvermeulen, I'll look at that one.

I've sent PR #100220 to fix ByRefSized::fold, and will consider this PR blocked until that one goes in and I can confirm it solves the problem you raised.

@rustbot author

Update 2022-08-24: I've rebased atop the other PR, but still not investigated that perf test, so it's not really review-ready yet.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 7, 2022
@scottmcm scottmcm added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Aug 23, 2022
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Aug 23, 2022
…riplett

Properly forward `ByRefSized::fold` to the inner iterator

cc `@timvermeulen,` who noticed this mistake in rust-lang#100214 (comment)
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Aug 24, 2022
…riplett

Properly forward `ByRefSized::fold` to the inner iterator

cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 24, 2022
…riplett

Properly forward `ByRefSized::fold` to the inner iterator

cc ```@timvermeulen,``` who noticed this mistake in rust-lang#100214 (comment)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 24, 2022
…riplett

Properly forward `ByRefSized::fold` to the inner iterator

cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
@scottmcm scottmcm removed the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Aug 24, 2022
@scottmcm scottmcm force-pushed the strict-range branch 2 times, most recently from 84f0eaf to 14e0e7b Compare August 25, 2022 00:55
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Properly forward `ByRefSized::fold` to the inner iterator

cc ``@timvermeulen,`` who noticed this mistake in rust-lang/rust#100214 (comment)
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the strict-range branch 2 times, most recently from 4104932 to c43b960 Compare September 18, 2022 03:07
@scottmcm
Copy link
Member Author

@timvermeulen My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to call it instead of inline it, with the obvious terrible consequences.

I now get exactly the same inner loop for that .flat_map(|color| color.rotate_left(8).to_be_bytes()) bench as without this PR:

.LBB0_5:
	mov	edx, dword ptr [rbx + rcx]
	rol	edx, 8
	bswap	edx
	mov	dword ptr [rax + rcx], edx
	add	rcx, 4
	cmp	rdi, rcx
	jne	.LBB0_5

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2022
@thomcc
Copy link
Member

thomcc commented Sep 18, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@thomcc
Copy link
Member

thomcc commented Sep 19, 2022

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2022
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`.

This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better.

I added one simple demonstration of the improvement as a codegen test.
@scottmcm
Copy link
Member Author

Oh, it's failing on the UB trap check. I've ignored the codegen test for debug, since it's a -O test.

@bors r=thomcc

@bors
Copy link
Contributor

bors commented Sep 20, 2022

📌 Commit 6dbd9a2 has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 20, 2022
@bors
Copy link
Contributor

bors commented Sep 20, 2022

⌛ Testing commit 6dbd9a2 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3...

@oli-obk
Copy link
Contributor

oli-obk commented Sep 20, 2022

@bors retry (cycling a higher priority PR)

#102028

@bors
Copy link
Contributor

bors commented Sep 20, 2022

⌛ Testing commit 6dbd9a2 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe...

@bors
Copy link
Contributor

bors commented Sep 20, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 20, 2022
@scottmcm
Copy link
Member Author

Looks like the retry didn't put it back in the queue right? I'll try it again.

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2022
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Sep 21, 2022

⌛ Testing commit 6dbd9a2 with merge 4ecfdfa...

@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Sep 21, 2022

☀️ Test successful - checks-actions
Approved by: thomcc
Pushing 4ecfdfa to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 21, 2022
@bors bors merged commit 4ecfdfa into rust-lang:master Sep 21, 2022
@rustbot rustbot added this to the 1.66.0 milestone Sep 21, 2022
@scottmcm scottmcm deleted the strict-range branch September 21, 2022 04:05
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4ecfdfa): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-0.4%, -0.4%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.7% [0.9%, 2.5%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.0% [-2.0%, -2.0%] 1
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -2.0% [-2.0%, -2.0%] 1

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@scottmcm
Copy link
Member Author

Huh, interesting that the post-merge perf run is so different (thankfully better!) from the original: #100214 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants