Remove my scalar_copy_backend_type optimization attempt #123185

Merged
bors merged 3 commits into rust-lang:master from the more-typed-copy branch on Apr 10, 2024

Conversation

scottmcm
Member

I added this back in #111999, but I no longer think it's a good idea:

  • It had to get scaled back to only power-of-two things to not break a bunch of targets
  • LLVM seems to be getting better at memcpy removal anyway
  • Introducing vector instructions has sometimes seemed to make autovectorization worse (see optimize zipping over array iterators #115515 (comment))

So this removes it from the codegen crates entirely, and instead tries to use typed_place_copy (https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy) rather than a direct memcpy, so things will still use load/store when a type isn't OperandValue::Ref.
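As a rough illustration of the rule stated in the description (load/store for values that aren't `OperandValue::Ref`, memcpy otherwise), here is a toy model. Everything in it is an assumption made for illustration: `Repr`, `CopyStrategy`, and `choose_copy` are hypothetical names, and the real `typed_place_copy` in rustc works on backend builders and layouts and may consider more than this.

```rust
/// Toy stand-in for how a value is represented during codegen.
/// The variant names echo `OperandValue`, but this is NOT the rustc type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Repr {
    /// A single scalar-like value.
    Immediate,
    /// Two scalar-like values (for example, a fat pointer).
    Pair,
    /// A value that lives in memory and is accessed by reference.
    Ref,
}

/// Which kind of copy the toy backend would emit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CopyStrategy {
    LoadStore,
    Memcpy { bytes: u64 },
}

/// Hypothetical helper: pick load/store for non-`Ref` values, memcpy otherwise.
pub fn choose_copy(repr: Repr, size_in_bytes: u64) -> CopyStrategy {
    match repr {
        Repr::Immediate | Repr::Pair => CopyStrategy::LoadStore,
        Repr::Ref => CopyStrategy::Memcpy { bytes: size_in_bytes },
    }
}
```

In this model a `[u8; 2]` treated as an immediate would be copied with a load and a store, while a large aggregate held by reference would go through memcpy.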

@rustbot
Collaborator

rustbot commented Mar 29, 2024

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 29, 2024
-// CHECK: %[[TEMP:.+]] = load <2 x i8>, ptr %a, align 1
-// CHECK: store <2 x i8> %[[TEMP]], ptr %p, align 1
+// CHECK: %[[TEMP:.+]] = load i16, ptr %a, align 1
+// CHECK: store i16 %[[TEMP]], ptr %p, align 1
Member Author


Note that while we now generate an alloca and memcpys for this (as seen in the array-codegen file), LLVM is able to remove them.

If LLVM picks i16 for this (and not <2 x i8>), then great, let's do that.
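For context, the kind of function those CHECK lines would be matched against can be sketched as below. The function name and signature are assumptions inferred from the `%a` and `%p` operands in the IR, not copied from the test file.

```rust
/// Hypothetical test subject: copy a 2-byte array from `a` into `p`.
/// With optimizations on, LLVM is free to lower this to a single 16-bit
/// load and store, but the exact IR depends on the compiler and LLVM version.
pub fn array_copy_2_elements(a: &[u8; 2], p: &mut [u8; 2]) {
    *p = *a;
}
```

The observable behavior is the same whichever IR type LLVM picks; only the shape of the emitted load/store differs.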

@scottmcm
Member Author

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 29, 2024
@bors
Contributor

bors commented Mar 29, 2024

⌛ Trying commit b72e5ad with merge ab08738...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 29, 2024
@bors
Contributor

bors commented Mar 29, 2024

☀️ Try build successful - checks-actions
Build commit: ab08738 (ab08738b5d1c784475d9fd734165d671e5617689)


@rust-timer
Collaborator

Finished benchmarking commit (ab08738): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-0.7%, -0.6%] | 4 |
| Improvements ✅ (secondary) | -1.4% | [-1.4%, -1.4%] | 1 |
| All ❌✅ (primary) | -0.6% | [-0.7%, -0.6%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.7%, -1.4%] | 4 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.0%] | 13 |
| Improvements ✅ (secondary) | -0.9% | [-1.3%, -0.0%] | 36 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.0%] | 13 |

Bootstrap: 667.865s -> 668.349s (0.07%)
Artifact size: 315.66 MiB -> 315.57 MiB (-0.03%)
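The percentages on the bootstrap and artifact-size lines are relative changes against the baseline value. A minimal sketch of that arithmetic follows; the helper name `percent_change` is an assumption for illustration and is not part of the perf tooling.

```rust
/// Relative change from `before` to `after`, expressed in percent.
/// Positive means the value grew; negative means it shrank.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Plugging in the figures above, `percent_change(667.865, 668.349)` rounds to 0.07 and `percent_change(315.66, 315.57)` rounds to -0.03, matching the reported values.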

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 29, 2024
@scottmcm
Member Author

One small secondary regression; everything else an improvement -- even the runtime benchmarks improved.
@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Mar 29, 2024
@@ -419,7 +418,14 @@ impl<'a, 'tcx, V: CodegenObject> OperandValue<V> {
         bx.store_with_flags(val, dest.llval, dest.align, flags);
         return;
     }
-    base::memcpy_ty(bx, dest.llval, dest.align, r, source_align, dest.layout, flags)
+    bx.memcpy_known_size(
Member


🤔 why isn't this also just using typed_place_copy?

also, memcpy_known_size has one callsite (this one) -- you should inline it. i don't really see the value of exposing it if it's only being used once, seems to add more confusion to the api imo.

Member Author

@scottmcm scottmcm Mar 29, 2024


You know, that's a really good observation. I thought the answer was that it would be recursion -- that typed_place_copy ends up here -- but now that I look it doesn't, so I'll make that change 👍

And yup, I'll inline memcpy_known_size too. Looks like I forgot to think about whether it was useful after I undid a couple of other changes that didn't work out.

@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 29, 2024
@scottmcm scottmcm force-pushed the more-typed-copy branch 2 times, most recently from 3e5d267 to ac20c35 Compare March 29, 2024 20:04
@scottmcm
Member Author

Well MemFlags made that more complicated than expected, but done. Here's the diff since your last review, CE: https://github.com/rust-lang/rust/compare/b72e5ad9062f1192dfc60645050ed04836cd2cd9..ac20c35d5562dce0530ac2972268bc807ec41190

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2024
@fee1-dead
Member

r? compiler

@rustbot rustbot assigned cjgillot and unassigned fee1-dead Mar 30, 2024
@compiler-errors
Member

I already started this review anyways

r? compiler-errors

@bors
Contributor

bors commented Apr 9, 2024

📌 Commit 556b47e has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 9, 2024
Member

@DianQK DianQK left a comment


LLVM seems to be getting better at memcpy removal anyway.

I think we can expect to see more significant changes after llvm/llvm-project#87190.

// OPT3LINX64-NEXT: store <8 x i16>
// OPT3WINX64: load <8 x i16>
// OPT3WINX64-NEXT: call <8 x i16> @llvm.bswap
// OPT3WINX64-NEXT: store <8 x i16>
// CHECK-NEXT: ret void
Member


Ah, thanks! I hadn't noticed that different optimization levels yield different optimization effects.
I believe it makes sense to differentiate between O2 and O3 here.

The changes to O2 seem fine. I'll check again on this later.
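To make the `<8 x i16>` and `@llvm.bswap` lines easier to place, here is a sketch of the sort of source that could produce such IR. The function below is hypothetical and is not taken from the test under review; whether LLVM actually emits a vector `bswap` for it depends on the optimization level and target, which is the distinction being discussed.

```rust
/// Hypothetical example: byte-swap each of eight 16-bit lanes.
/// An optimizing build may turn this into one vector load, one vector
/// byte-swap, and one vector store, but that is up to LLVM.
pub fn swap_bytes_each(values: [u16; 8]) -> [u16; 8] {
    values.map(u16::swap_bytes)
}
```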

@bors
Contributor

bors commented Apr 10, 2024

⌛ Testing commit 556b47e with merge ba787ee...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 10, 2024

@bors
Contributor

bors commented Apr 10, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 10, 2024
@scottmcm
Member Author

scottmcm commented Apr 10, 2024

I got the compile-test directives in the test wrong 🤦 Filed #123730

@bors r=compiler-errors

@bors
Contributor

bors commented Apr 10, 2024

📌 Commit 593e900 has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 10, 2024
@bors
Contributor

bors commented Apr 10, 2024

⌛ Testing commit 593e900 with merge c2239bc...

@bors
Contributor

bors commented Apr 10, 2024

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing c2239bc to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 10, 2024
@bors bors merged commit c2239bc into rust-lang:master Apr 10, 2024
12 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 10, 2024
@scottmcm scottmcm deleted the more-typed-copy branch April 10, 2024 19:10
@rust-timer
Collaborator

Finished benchmarking commit (c2239bc): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-0.7%, -0.7%] | 4 |
| Improvements ✅ (secondary) | -2.0% | [-2.7%, -1.2%] | 2 |
| All ❌✅ (primary) | -0.7% | [-0.7%, -0.7%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.5% | [-1.5%, -1.5%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 14 |
| Improvements ✅ (secondary) | -0.9% | [-1.4%, -0.0%] | 36 |
| All ❌✅ (primary) | -0.1% | [-0.2%, -0.0%] | 14 |

Bootstrap: 675.526s -> 675.28s (-0.04%)
Artifact size: 318.49 MiB -> 318.45 MiB (-0.01%)

@rustbot rustbot removed the perf-regression Performance regression. label Apr 10, 2024
@scottmcm
Member Author

Wow, I think that might be the happiest perf has ever been with one of my PRs. Even instruction wins on the runtime benchmarks.

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

9 participants