Stop using LLVM struct types for byval/sret #122050

Merged
merged 6 commits into from
Mar 11, 2024

@erikdesjardins erikdesjardins commented Mar 6, 2024

For byval and sret, the type has no semantic meaning, only the size matters*†. Using [N x i8] is a more direct way to specify that we want N bytes, and avoids relying on LLVM's struct layout.

*: The alignment would also matter if we didn't explicitly specify it. From what I can tell, we always specified the alignment for sret; for byval, we didn't until #112157.

†: For byval, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.
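
As a rough illustration of the point above (a sketch, not rustc's actual code): the helper `opaque_byval` and the example type `Padded` are hypothetical names introduced here. The sketch only shows that the parameter can be described by its byte size plus an explicit alignment, with no reference to the field layout.

```rust
use std::mem::size_of;

// Hypothetical example type with internal padding under repr(C):
// 1 byte for `a`, 3 padding bytes, then 4 bytes for `b`.
#[repr(C)]
#[allow(dead_code)]
struct Padded {
    a: u8,
    b: u32,
}

/// Hypothetical helper (not part of rustc): renders a `byval` parameter as an
/// opaque byte array of `size_of::<T>()` bytes with an explicit alignment.
fn opaque_byval<T>(abi_align: usize) -> String {
    format!("ptr byval([{} x i8]) align {}", size_of::<T>(), abi_align)
}

fn main() {
    // Only the size and the explicitly supplied alignment appear in the result.
    println!("{}", opaque_byval::<Padded>(4));
    println!("{}", opaque_byval::<[u8; 32]>(4));
}
```

Because the array type carries only a byte count, any padding inside the Rust type is simply part of those `N` bytes.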

Split out from #121577.

r? @nikic

This avoids depending on LLVM's struct types to determine the size of
the byval/sret slot.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 6, 2024
Comment on lines 15 to 19
  // CHECK: %int16x4x2_t = type { <4 x i16>, <4 x i16> }
  #[no_mangle]
- fn takes_int16x4x2_t(t: int16x4x2_t) -> int16x4x2_t {
+ extern "unadjusted" fn takes_int16x4x2_t(t: int16x4x2_t) -> int16x4x2_t {
      t
  }
Contributor Author

@erikdesjardins erikdesjardins Mar 6, 2024

This test only indirectly used the struct type via byval (so it would be removed by these changes), but the original motivation (#87254) was for the unadjusted ABI, where we use the struct type directly and pass the vectors by value. Changed it to test that.

@the8472
Member

the8472 commented Mar 6, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
@bors
Contributor

bors commented Mar 6, 2024

⌛ Trying commit 96a7267 with merge d5b8881...

@bors
Contributor

bors commented Mar 6, 2024

☀️ Try build successful - checks-actions
Build commit: d5b8881 (d5b8881b55df9f860fcb933490499356a7ec3a64)

@rust-timer
Collaborator

Finished benchmarking commit (d5b8881): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-2.4%, -0.3%] | 3 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 1 |
| All ❌✅ (primary) | -1.5% | [-2.4%, -0.3%] | 3 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.5% | [1.5%, 1.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.5% | [1.5%, 1.5%] | 1 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.3% | [2.3%, 2.3%] | 1 |
| Improvements ✅ (primary) | -3.0% | [-3.2%, -2.8%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.0% | [-3.2%, -2.8%] | 2 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.1%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.2% | [-0.3%, -0.1%] | 6 |

Bootstrap: 646.18s -> 643.479s (-0.42%)
Artifact size: 175.03 MiB -> 175.05 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
@nikic
Contributor

nikic commented Mar 6, 2024

@bors r+ rollup=never

@bors
Contributor

bors commented Mar 6, 2024

📌 Commit 96a7267 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2024
Member

Sounds like the docs at https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/abi/enum.PassMode.html#variant.Indirect should be updated then? Currently they say

> This corresponds to the byval LLVM argument attribute (using the Rust type of this argument).

The parenthetical is no longer true with this patch.

Contributor Author

Removed the parenthetical. The important part is that byval is used, the specific type is not so important (it could be [N x i8], iM, the Rust struct type, etc.)

@RalfJung
Member

RalfJung commented Mar 7, 2024 via email

@erikdesjardins
Contributor Author

erikdesjardins commented Mar 7, 2024

Added more info.

Your point makes sense, especially as the alignment does not actually match the Rust type, nor is it guaranteed to be higher or lower than the Rust type's alignment. Instead, it's...complicated.

@RalfJung
Member

RalfJung commented Mar 8, 2024

Oh wow.... and you're sure that's sound? Nothing is then relying on that alignment matching the Rust type alignment?

@erikdesjardins
Contributor Author

erikdesjardins commented Mar 8, 2024

byval is only used for non-Rust ABIs, so doing this is necessary for soundness. If we use the Rust type's alignment, we'll read from an incorrect stack offset. (This is what caused #80127.)

Using a different alignment is fine in terms of Rust semantics because the byval pointer isn't usable from Rust code. (This is the same thing I touch on in this thread.)

If you do take a reference to such an argument, it should get copied to a higher-aligned alloca, which we have the freedom to do since it was passed by value. Actually it doesn't (https://godbolt.org/z/cfM4PEGer), which is unsound. This is just a bug in the backend though--there's some code which skips the alloca if the source and destination have the same representation, and it must not be checking for alignment. (In other words, this isn't as bad as #112480--we can just fix it with no language-visible impact.) I'll open another PR to fix it.

We handle the reverse situation already--if you have a type where the Rust alignment is lower than the byval alignment, we copy it (https://godbolt.org/z/WYfj58o36) to a higher-aligned alloca before calling the byval function.

As for this PR, it doesn't change the status quo. Before this PR it would generate ptr byval(%HighAlign) align 4, and after ptr byval([32 x i8]) align 4, but of course both of those have the same alignment.

Edit: opened #122212
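
To make the alignment discussion concrete, here is a small sketch (an illustration, not code from this PR). `HighAlign` borrows its name from the `%HighAlign` IR type quoted above; its 16-byte alignment, its field, and the function `addr_of_arg` are assumptions chosen for the example. Whatever alignment the target's C ABI uses for the byval slot, a reference taken inside the callee has to satisfy the Rust type's alignment, which is the property the unsoundness described above would violate.

```rust
use std::mem::{align_of, size_of};

// Hypothetical 32-byte type whose Rust alignment (16) may exceed the
// alignment a target's C ABI uses when passing it on the stack.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct HighAlign {
    bytes: [u8; 32],
}

// Non-Rust ABI, so the argument may be passed indirectly via `byval`,
// depending on the target.
pub extern "C" fn addr_of_arg(x: HighAlign) -> usize {
    // Taking a reference must produce a pointer aligned for `HighAlign`,
    // even if the incoming stack slot was less aligned.
    let r: &HighAlign = &x;
    r as *const HighAlign as usize
}

fn main() {
    let v = HighAlign { bytes: [7; 32] };
    let addr = addr_of_arg(v);
    assert_eq!(v.bytes[0], 7);
    // The address is only inspected as an integer; it is never dereferenced.
    assert_eq!(addr % align_of::<HighAlign>(), 0);
    println!("size = {}, align = {}", size_of::<HighAlign>(), align_of::<HighAlign>());
}
```

On targets where the ABI alignment already matches the Rust alignment the check passes trivially; the interesting case is a target like the one in the quoted IR, where the slot is only `align 4`.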

@RalfJung
Member

RalfJung commented Mar 9, 2024

> byval is only used for non-Rust ABIs, so doing this is necessary for soundness. If we use the Rust type's alignment, we'll read from an incorrect stack offset. (This is what caused #80127.)

I don't understand how that is possible. If the alignment differs between the Rust type and the C type, then things will go wrong in a bunch of places, not just for byval argument passing.

Or is it the case that some ABIs have entirely independent alignments for when an argument is "in memory" vs passed on the stack?

Unfortunately the PR links to "this comment" by @eddyb but the link is broken, so it's hard to read up on what happened. (EDIT: Ah, found it.) Anyway all that information should make it into suitable rustc comments as it'll be easier to find there.

> Using a different alignment is fine in terms of Rust semantics because the byval pointer isn't usable from Rust code.

Ah, that's the key point. So I hope the codegen backend remembers this and never adds a Rust-type-based alignment annotation when working on these pointers.

@bors r- (to update the labels, this already left the queue when you pushed)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 9, 2024
@nikic
Contributor

nikic commented Mar 10, 2024

@bors r+

@bors
Contributor

bors commented Mar 10, 2024

📌 Commit 8fdd5e0 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 10, 2024
@bors
Contributor

bors commented Mar 10, 2024

⌛ Testing commit 8fdd5e0 with merge aba35a5...

@bors
Contributor

bors commented Mar 10, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 10, 2024
@erikdesjardins
Contributor Author

erikdesjardins commented Mar 10, 2024

Ah, and of course if those tests never ran, or only ran on their target-specific builders, they wouldn't have run on the x86 nopt builders either.

@nikic
Contributor

nikic commented Mar 10, 2024

@bors r+

@bors
Contributor

bors commented Mar 10, 2024

📌 Commit f18c2f8 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 10, 2024
@bors
Contributor

bors commented Mar 11, 2024

⌛ Testing commit f18c2f8 with merge a6d93ac...

@bors
Contributor

bors commented Mar 11, 2024

☀️ Test successful - checks-actions
Approved by: nikic
Pushing a6d93ac to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 11, 2024
@bors bors merged commit a6d93ac into rust-lang:master Mar 11, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 11, 2024
@rust-timer
Collaborator

Finished benchmarking commit (a6d93ac): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.9% | [0.5%, 3.3%] | 2 |
| Improvements ✅ (primary) | -2.1% | [-2.4%, -1.9%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.1% | [-2.4%, -1.9%] | 2 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [1.4%, 2.5%] | 4 |
| Regressions ❌ (secondary) | 2.7% | [2.3%, 3.3%] | 6 |
| Improvements ✅ (primary) | -2.7% | [-2.7%, -2.7%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [-2.7%, 2.5%] | 5 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.0% | [1.8%, 2.1%] | 5 |
| Improvements ✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.1%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.2% | [-0.3%, -0.1%] | 6 |

Bootstrap: 647.708s -> 645.581s (-0.33%)
Artifact size: 309.95 MiB -> 309.97 MiB (0.01%)

@erikdesjardins
Contributor Author

Those regressions (and probably improvements) are noise; they were undone in the next merge.

This is so direct that I feel justified to do

@rustbot label perf-regression-triaged

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.