gvn: Promote/propagate const local array #126444

Closed
wants to merge 3 commits into from

tesuji
Contributor

@tesuji tesuji commented Jun 13, 2024

Rewrite of #125916, which used the PromoteTemps pass.

This allows constant local arrays to be promoted to anonymous constants, so that in codegen for
a local array rustc emits llvm.memcpy (which is easy for LLVM to optimize) instead of a series
of stores to the stack (i.e. in-place initialization). This puts rustc on par with clang in this specific case.
See #73825 or the zulip discussion for more info.
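
To make the transformation concrete, here is a minimal sketch of the kind of source this change targets. The function names and table values are made up for illustration and do not come from the PR or its tests. `lookup` builds a constant array as a local, which without promotion may be initialized element by element on the stack; `lookup_promoted` is roughly the hand-written source-level counterpart, assuming the pass treats the array like an anonymous constant.

```rust
/// Reads from a local array whose elements are all compile-time constants.
/// The array is never mutated, so it is a candidate for promotion.
pub fn lookup(i: usize) -> u32 {
    let table = [1u32, 2, 3, 5, 8, 13, 21, 34];
    table[i % table.len()]
}

/// Hand-written counterpart: the same data expressed as a named constant.
pub fn lookup_promoted(i: usize) -> u32 {
    const TABLE: [u32; 8] = [1, 2, 3, 5, 8, 13, 21, 34];
    TABLE[i % TABLE.len()]
}
```

Both functions return the same values; any difference shows up only in the generated code, which can be inspected with `rustc -O --crate-type=lib --emit=llvm-ir`.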

Here is a simple micro benchmark that shows the performance difference between promoting arrays and not promoting them.

Prior discussions on zulip.

This patch shrinks librustc_driver.so by about 50.36 KiB (0.038%).

Fix #73825

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 13, 2024
@rustbot
Collaborator

rustbot commented Jun 13, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@tesuji tesuji changed the title WIP: Trying to promote const local array using GVN pass WIP: Promote/propagate const local array using GVN pass Jun 13, 2024
@tesuji tesuji force-pushed the gvn-const-arrays branch from e8f832f to 5944765 Compare June 13, 2024 22:32
@tesuji tesuji changed the title WIP: Promote/propagate const local array using GVN pass [WIP] gvn: Promote/propagate const local array Jun 13, 2024
@tesuji tesuji force-pushed the gvn-const-arrays branch from 5944765 to 550fb81 Compare June 13, 2024 22:43
@jieyouxu
Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 13, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 13, 2024
[WIP] gvn: Promote/propagate const local array

Rewrite of rust-lang#125916, which used the PromoteTemps pass.

Fix rust-lang#73825

### Current status

- [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F).

r? ghost
@bors
Contributor

bors commented Jun 13, 2024

⌛ Trying commit 550fb81 with merge e26c0b3...

@bors
Contributor

bors commented Jun 14, 2024

☀️ Try build successful - checks-actions
Build commit: e26c0b3 (e26c0b3f8c9af007281a11df56a0bf825d8b4cb0)

@rust-timer
Collaborator

Finished benchmarking commit (e26c0b3): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 3.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 9.2% | [4.2%, 18.7%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -9.1% | [-9.9%, -8.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.1% | [-9.9%, 18.7%] | 6 |

Cycles

Results (primary -4.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.0% | [-5.3%, -1.7%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -4.0% | [-5.3%, -1.7%] | 5 |

Binary size

Results (primary -0.1%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 1 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 1 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-0.2%, 0.0%] | 6 |

Bootstrap: 671.518s -> 674.73s (0.48%)
Artifact size: 320.38 MiB -> 320.39 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 14, 2024
@Kobzol
Contributor

Kobzol commented Jun 14, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 14, 2024
@bors
Contributor

bors commented Jun 14, 2024

⌛ Trying commit 6c6de58 with merge 7e160d4...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 14, 2024
@bors
Contributor

bors commented Jun 14, 2024

☀️ Try build successful - checks-actions
Build commit: 7e160d4 (7e160d4b55bb5a27be0696f45db247ccc2e166d9)

@rust-timer
Collaborator

Finished benchmarking commit (7e160d4): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 11.0% | [3.9%, 21.3%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -7.6% | [-10.4%, -4.8%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.7% | [-10.4%, 21.3%] | 6 |

Cycles

Results (secondary 9.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 9.0% | [9.0%, 9.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-0.2%, 0.0%] | 6 |

Bootstrap: 670.938s -> 673.147s (0.33%)
Artifact size: 320.39 MiB -> 319.79 MiB (-0.19%)

@BoxyUwU
Member

BoxyUwU commented Jul 14, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 14, 2024
@bors
Contributor

bors commented Jul 14, 2024

⌛ Trying commit 8672700 with merge ac2e9cd...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 14, 2024
gvn: Promote/propagate const local array

Rewrite of rust-lang#125916, which used the `PromoteTemps` pass.

This allows constant local arrays to be promoted to anonymous constants, so that in codegen for
a local array rustc emits `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series
of `store`s to the stack (i.e. in-place initialization). This puts rustc on par with clang in this specific case.
See rust-lang#73825 or [zulip][opsem] for more info.

[Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not.

[Prior discussions on zulip][opsem].

This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`.
![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1)

Fix rust-lang#73825

r? cjgillot

### Unresolved questions
- [ ] Should we ignore nested arrays?
    I think promoting nested arrays bloats codegen.
- [ ] Should `stack_threshold` be at least 32 bytes, as the benchmark showed?
    If yes, the test should be updated to make arrays larger than 32 bytes.
- [x] ~Is it concerning that `call(move _1)` is now `call(const [array])`?~
  It reverted back to `call(move _1)`.

[opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
[bench]: rust-lang/rust-clippy#12854 (comment)
[perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
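
As a rough illustration of the `stack_threshold` question above, the sketch below computes array sizes in bytes and compares them against a 32-byte cutoff. `STACK_THRESHOLD_BYTES` and `exceeds_threshold` are hypothetical names invented for this example; the PR's actual check lives inside the GVN pass and may well differ.

```rust
use std::mem::size_of;

/// Hypothetical cutoff, taken from the 32-byte figure discussed above.
const STACK_THRESHOLD_BYTES: usize = 32;

/// Returns true if an array of type `[T; N]` is strictly larger than the cutoff.
pub fn exceeds_threshold<T, const N: usize>() -> bool {
    size_of::<[T; N]>() > STACK_THRESHOLD_BYTES
}
```

Under this sketch `[u32; 8]` is exactly 32 bytes and does not pass, while `[u64; 5]` (40 bytes) does; a nested `[[u8; 4]; 4]` is only 16 bytes.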
@bors
Contributor

bors commented Jul 14, 2024

☀️ Try build successful - checks-actions
Build commit: ac2e9cd (ac2e9cd42525cb1be45517156e5c5dbd10dc5a0e)

@rust-timer
Collaborator

Finished benchmarking commit (ac2e9cd): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |

Max RSS (memory usage)

Results (primary -1.2%, secondary -0.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [2.8%, 2.8%] | 1 |
| Regressions ❌ (secondary) | 5.0% | [4.4%, 5.6%] | 2 |
| Improvements ✅ (primary) | -5.2% | [-5.2%, -5.2%] | 1 |
| Improvements ✅ (secondary) | -4.5% | [-5.3%, -3.2%] | 3 |
| All ❌✅ (primary) | -1.2% | [-5.2%, 2.8%] | 2 |

Cycles

Results (secondary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.1%, secondary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.1%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.0%] | 3 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 12 |
| All ❌✅ (primary) | -0.1% | [-0.4%, 0.1%] | 5 |

Bootstrap: 705.571s -> 705.908s (0.05%)
Artifact size: 328.69 MiB -> 328.62 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 14, 2024
@Kobzol
Contributor

Kobzol commented Jul 14, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 14, 2024
@bors
Contributor

bors commented Jul 14, 2024

⌛ Trying commit c15eb60 with merge b6d6d25...

@bors
Contributor

bors commented Jul 14, 2024

☀️ Try build successful - checks-actions
Build commit: b6d6d25 (b6d6d25a7e03cda5b6e133fd6541106859d1489d)

@rust-timer
Collaborator

Finished benchmarking commit (b6d6d25): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.5% | [-0.5%, -0.5%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary -1.5%, secondary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.7% | [3.8%, 5.9%] | 3 |
| Improvements ✅ (primary) | -1.5% | [-1.7%, -1.3%] | 2 |
| Improvements ✅ (secondary) | -5.6% | [-5.6%, -5.6%] | 1 |
| All ❌✅ (primary) | -1.5% | [-1.7%, -1.3%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (secondary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 5 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 699.561s -> 699.867s (0.04%)
Artifact size: 328.65 MiB -> 328.59 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 14, 2024
@tesuji tesuji force-pushed the gvn-const-arrays branch from c15eb60 to 7b96b9c Compare July 15, 2024 05:56
@tesuji
Contributor Author

tesuji commented Jul 15, 2024

Judging from the last perf run, there seems to be no performance advantage in preventing LLVM from de-duplicating arrays.
I reverted that change and squashed all commits for the final review.

@@ -418,9 +421,7 @@ impl<'body, 'tcx> VnState<'body, 'tcx> {
            self.ecx.copy_op(op, &field_dest).ok()?;
        }
        self.ecx.write_discriminant(variant.unwrap_or(FIRST_VARIANT), &dest).ok()?;
        self.ecx
            .alloc_mark_immutable(dest.ptr().provenance.unwrap().alloc_id())
            .ok()?;
Contributor

Why do we stop marking as immutable?

Contributor Author

Correct me if I'm wrong, but I think the `let dest = dest.map_provenance(|prov| prov.as_immutable());` on the line below could serve the same purpose.

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 20, 2024
@tesuji
Contributor Author

tesuji commented Aug 22, 2024

Well! Nothing to do here!

@tesuji tesuji closed this Aug 22, 2024
@tesuji tesuji deleted the gvn-const-arrays branch August 22, 2024 14:50
Labels
A-mir-opt Area: MIR optimizations S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

New optimization: Move non-mutable array of Copy type to .rodata