-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval
on x86 in the process.
#103830
Conversation
By the way, https://twitter.com/pcwalton/status/1587282342174871552 shows the effect of this change on the |
This comment has been minimized.
This comment has been minimized.
I think this will have the unintended effect of fixing #80127. It is the exact same change that was proposed by #80822, which never went in (I think because someone found that a couple of codegen tests failed on windows). It is also the same change I made to our fork of rustc at work so that we could work around the problem of #80127. So anecdotally, this change is known to fix the problem. I think we might not want to accidentally fix that bug though, because as @eddyb states here (paraphrasing): we should not be depending on this field in codegen for getting the correct ABI alignment, because it's intended only for optimizations. Something like that. I just worry about having #80127 closed as fixed because of this change and then one day down the road somebody does something different with this field for optimization purposes, and having x86_64 ABI break again on stable. Of course, @pcwalton, if you feel up to it, you could take a stab at actually fixing that bug in the "right way." @bjorn3 left some pretty good breadcrumbs in #80127. I also spent some time a while back trying to follow the breadcrumbs all the way home, and IIRC I got somewhat close to a working solution, but didn't have enough cycles to see it through. Would be happy to chat (zulip, zoom, whatever) to share what I can remember. |
Hmm, I see, the bug is still there. I think to be safe I'm going to move the check to be Windows only then. |
5a5c909
to
9a3d50a
Compare
byval
on x86 in the process.
This comment has been minimized.
This comment has been minimized.
@@ -658,7 +662,8 @@ impl<'a, Ty> FnAbi<'a, Ty> { | |||
{ | |||
if abi == spec::abi::Abi::X86Interrupt { | |||
if let Some(arg) = self.args.first_mut() { | |||
arg.make_indirect_byval(); | |||
// FIXME(pcwalton): This probably should use the x86 `byval` ABI... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For extern "x86-interrupt" fn(_: InterruptStackFrame)
the argument is passed as pointer without byval attribute AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying I should change make_indirect_byval()
to make_indirect()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM requires this argument to be marked byval
. You'll get a verifier error otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a regression test for the bug in #80127?
Done. |
|
||
extern "C" { | ||
// CHECK: declare void @f({{.*}}byval(%Foo) align 16{{.*}}) | ||
fn f(foo: Foo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to exceed a certain number of parameters for the original bug to surface. Enough to fill up the registers. I think the function signature in #80127 was the minimal repro; I remember trying to minimize it further and having not much luck.
I'm trying to use godbolt to figure it out but I'm having frustrating usability problems, and my personal device isn't x86_64 -_-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think testing for the align attribute directly is better, because the test case in #80127 might randomly start to succeed if future changes to regalloc cause things to accidentally align, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a regression test it should never start to succeed, it could only ever start to fail right? I guess I'm confused by what you're saying.
Also, my memory is quite fuzzy, but I thought that part of the underlying cause of the bug was that our logic for choosing how to pass args on x86_64 was wrong for stack arguments (which would require your test to have enough arguments to get to that point I think). But judging by you saying "I think testing for the align attribute directly is better", is it just that we were never putting align
on any of the byval args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk, I think I fundamentally don't trust a test that just checks the LLVM IR, because I don't understand it well enough for that to be convincing over a real integration test that proves that the C code can operate with the Rust code. But I recognize that that's probably quite hard to test in the rust repo. Perhaps someone else knows a way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a regression test it should never start to succeed, it could only ever start to fail right? I guess I'm confused by what you're saying.
I'm saying it could get broken accidentally but we wouldn't notice it because it could accidentally start to pass. I can add an interop test though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be awesome. Thank you for tackling this bug that has existed for so so long 🙏🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the test.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 07547fc706294bdaafc9c7f3359e8feb1cb0fae0 with merge ccd8e3c14de53ce413a7d2649c12d558d226226d... |
☀️ Try build successful - checks-actions |
Queued ccd8e3c14de53ce413a7d2649c12d558d226226d with parent edf0182, future comparison URL. |
Finished benchmarking commit (ccd8e3c14de53ce413a7d2649c12d558d226226d): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Sweet! I was worried it would be a regression. |
This should be ready to go, I think. r? @oli-obk |
rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of `byval` on x86 in the process. Commit 88e4d2c from five years ago removed support for alignment on indirectly-passed arguments because of problems with the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I recently added to LLVM 16 depend on this to forward `memcpy`s. This commit attempts to fix the problems with `byval` parameters on that target and now correctly adds the `align` attribute. The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has special alignment rules for `byval` parameters: for the most part, their alignment is forced to 4. This is not well-documented anywhere but in the Clang source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate it here. The relevant methods in that file are `X86_32ABIInfo::getIndirectResult()` and `X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute for `byval` parameters in LLVM must match the platform ABI, or miscompilations will occur. Note that this doesn't use the approach suggested by eddyb, because I felt it was overkill to store the alignment in `on_stack` when special handling is really only needed for 32-bit x86. As a side effect, this should fix rust-lang#80127, because it will make the `align` parameter attribute for `byval` parameters match the platform ABI on LLVM x86-64. [this comment]: rust-lang#80822 (comment)
🔒 Merge conflict This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again. How do I rebase?Assuming
You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial. Please avoid the "Resolve conflicts" button on GitHub. It uses Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Error message
|
// ignore-spir | ||
// ignore-spir64 | ||
// ignore-kalimba | ||
// ignore-shave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really fragile. I'd suggest to implement this in the same way as https://github.com/rust-lang/rust/blob/master/tests/codegen/abi-repr-ext.rs, i.e. a no_core test explicitly specifying targets. That should also allow you to test a non-windows target for the issue below.
@bors r- Got back into the queue due to bors sync probably |
Improve the `array::map` codegen The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](https://github.com/rust-lang/rust/blob/7c46fb2111936ad21a8e3aa41e9128752357f5d8/library/core/src/array/mod.rs#L865-L912) function, I had some ideas that ended up working better than I'd expected. There's three main ideas in here, split over three commits: 1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily 2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant) 3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward. (No libs-api changes; everything should be completely implementation-detail-internal.) It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (rust-lang#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞 r? `@thomcc` As a simple example, this test ```rust pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] { x.map(|x| 13 * x + 7) } ``` On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808` ```llvm start: %array.i.i.i.i = alloca [64 x i32], align 4 %_3.sroa.5.i.i.i = alloca [65 x i32], align 4 %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8 ``` (and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output) But with this PR it's only `sub rsp, 520` ```llvm start: %array.i.i.i.i.i.i = alloca [64 x i32], align 4 %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4 ``` Similarly, the loop it emits on nightly is scalar-only and horrifying ```nasm .LBB0_1: mov esi, 64 mov edi, 0 cmp rdx, 64 je .LBB0_3 lea rsi, [rdx + 1] mov qword ptr [rsp + 784], rsi mov r8d, dword ptr [rsp + 4*rdx + 528] mov edi, 1 lea edx, [r8 + 2*r8] lea r8d, [r8 + 4*rdx] add r8d, 7 .LBB0_3: test edi, edi je .LBB0_11 mov dword ptr [rsp + 4*rcx + 272], r8d cmp rsi, 64 jne .LBB0_6 xor r8d, r8d mov edx, 64 test r8d, r8d jne .LBB0_8 jmp .LBB0_11 .LBB0_6: lea rdx, [rsi + 1] mov qword ptr [rsp + 784], rdx mov edi, dword ptr [rsp + 4*rsi + 528] mov r8d, 1 lea esi, [rdi + 2*rdi] lea edi, [rdi + 4*rsi] add edi, 7 test r8d, r8d je .LBB0_11 .LBB0_8: mov dword ptr [rsp + 4*rcx + 276], edi add rcx, 2 cmp rcx, 64 jne .LBB0_1 ``` whereas with this PR it's unrolled and vectorized ```nasm vpmulld ymm1, ymm0, ymmword ptr [rsp + 64] vpaddd ymm1, ymm1, ymm2 vmovdqu ymmword ptr [rsp + 328], ymm1 vpmulld ymm1, ymm0, ymmword ptr [rsp + 96] vpaddd ymm1, ymm1, ymm2 vmovdqu ymmword ptr [rsp + 360], ymm1 ``` (though sadly still stack-to-stack)
Visiting as part of looking at #80127 for 2023 Q1 P-high triage. @pcwalton , do you think you'll have time soon to incorporate the feedback from @wesleywiser and @nikic ? If not, maybe one of us can take care of those modifications... |
@pcwalton any updates on this? thanks |
Superseded by #111551. |
Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. Same as rust-lang#111551, which I [accidentally closed](rust-lang#111551 (comment)) :/ --- This resurrects PR rust-lang#103830, which has sat idle for a while. Beyond rust-lang#103830, this also: - fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`) - fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`) - fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`) r? `@nikic` --- `@pcwalton's` original PR description is reproduced below: Commit 88e4d2c from five years ago removed support for alignment on indirectly-passed arguments because of problems with the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I recently added to LLVM 16 depend on this to forward `memcpy`s. This commit attempts to fix the problems with `byval` parameters on that target and now correctly adds the `align` attribute. The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has special alignment rules for `byval` parameters: for the most part, their alignment is forced to 4. This is not well-documented anywhere but in the Clang source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate it here. The relevant methods in that file are `X86_32ABIInfo::getIndirectResult()` and `X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute for `byval` parameters in LLVM must match the platform ABI, or miscompilations will occur. Note that this doesn't use the approach suggested by eddyb, because I felt it was overkill to store the alignment in `on_stack` when special handling is really only needed for 32-bit x86. As a side effect, this should fix rust-lang#80127, because it will make the `align` parameter attribute for `byval` parameters match the platform ABI on LLVM x86-64. [this comment]: rust-lang#80822 (comment)
Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. Same as rust-lang#111551, which I [accidentally closed](rust-lang#111551 (comment)) :/ --- This resurrects PR rust-lang#103830, which has sat idle for a while. Beyond rust-lang#103830, this also: - fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`) - fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`) - fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`) r? `@nikic` --- `@pcwalton's` original PR description is reproduced below: Commit 88e4d2c from five years ago removed support for alignment on indirectly-passed arguments because of problems with the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I recently added to LLVM 16 depend on this to forward `memcpy`s. This commit attempts to fix the problems with `byval` parameters on that target and now correctly adds the `align` attribute. The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has special alignment rules for `byval` parameters: for the most part, their alignment is forced to 4. This is not well-documented anywhere but in the Clang source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate it here. The relevant methods in that file are `X86_32ABIInfo::getIndirectResult()` and `X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute for `byval` parameters in LLVM must match the platform ABI, or miscompilations will occur. Note that this doesn't use the approach suggested by eddyb, because I felt it was overkill to store the alignment in `on_stack` when special handling is really only needed for 32-bit x86. As a side effect, this should fix rust-lang#80127, because it will make the `align` parameter attribute for `byval` parameters match the platform ABI on LLVM x86-64. [this comment]: rust-lang#80822 (comment)
Commit 88e4d2c from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the
i686-pc-windows-msvc
target. Unfortunately, thememcpy
optimizations Irecently added to LLVM 16 depend on this to forward
memcpy
s. This commitattempts to fix the problems with
byval
parameters on that target and nowcorrectly adds the
align
attribute.The problem is summarized in this comment by @eddyb. Briefly, 32-bit x86 has
special alignment rules for
byval
parameters: for the most part, theiralignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang
TargetInfo.cpp
and tried to replicateit here. The relevant methods in that file are
X86_32ABIInfo::getIndirectResult()
andX86_32ABIInfo::getTypeStackAlignInBytes()
. Thealign
parameter attributefor
byval
parameters in LLVM must match the platform ABI, or miscompilationswill occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in
on_stack
when specialhandling is really only needed for 32-bit x86.
As a side effect, this should fix #80127, because it will make the
align
parameter attribute for
byval
parameters match the platform ABI on LLVMx86-64.