start: Avoid needlessly aligning the stack pointer on some architectures. #20737
…res. For these, the relevant ABIs already guarantee that it is aligned appropriately when the kernel transfers control to _start(). Rather than potentially hiding ABI bugs, let's actually assume that the OS works properly unless/until we have evidence to the contrary. This effectively reverts 81232f7.
All Linux CI is green, so ready for review.
@@ -343,7 +325,6 @@ fn _start() callconv(.Naked) noreturn {
\\ addis 2, 12, .TOC. - _start@ha
\\ addi 2, 2, .TOC. - _start@l
\\ mr 3, 1
\\ clrrdi 1, 1, 4
- https://github.com/rui314/psabi/blob/main/ppc32.pdf
- This one just doesn't touch on process initialization much at all. It seems to be much more focused on the embedded side of things. AFAIK, for PPC32, people usually refer to the next document.
- http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf §3-28
- https://github.com/rui314/psabi/blob/main/ppc64v1.pdf §3.4.1
- https://github.com/rui314/psabi/blob/main/ppc64v2.pdf §4.1.2.1
// The lr is already zeroed on entry, as specified by the ABI.
\\ addiu $fp, $zero, 0
\\ move $a0, $sp
\\ .set push
\\ .set noat
\\ addiu $1, $zero, -16
No authoritative document seems to exist for 64-bit MIPS, but: https://github.com/torvalds/linux/blob/933069701c1b507825b514317d4edd5d3fd9d417/arch/mips/kernel/process.c#L708-L718
\\ callq %[posixCallMainAndExit:P]
,
.x86 =>
\\ xorl %%ebp, %%ebp
\\ movl %%esp, %%eax
\\ andl $-16, %%esp
@@ -276,13 +276,11 @@ fn _start() callconv(.Naked) noreturn {
.x86_64 =>
\\ xorl %%ebp, %%ebp
\\ movq %%rsp, %%rdi
\\ andq $-16, %%rsp
Is it really worth saving an instruction or two that probably fit into the function alignment and are only executed once, given the amount of work that would be required to verify that this is valid on all OS/ABI versions that we intend to support? For example, the x86 document had a different stack alignment pre-1.0, which suggests that there did exist OS versions that didn't follow that convention yet. I'm also fairly confident that there exist(ed) POSIX compatibility layers for Windows that did not have this stack alignment.
On the other hand, if this is really intended to expose such ABI bugs, I would expect some sort of safety check to be added to safe builds (related to #20654, and I thought there was an issue about verifying incoming arguments to extern functions such as pointer alignment, but I can't find it). This might require actually realigning the stack first to be able to report the issue later after initialization, but I believe it would increase the mergeability of this PR. There's no guarantee that CI even verifies that these changes are non-breaking, since it mostly tests foreign baseline targets, which may not even have the instructions necessary to crash if this ABI were broken.
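For readers following the diff, here is a minimal sketch of what the removed instructions compute. It assumes that `andq $-16`, `andl $-16`, and `clrrdi 1, 1, 4` each clear the low four bits of the stack pointer (the MIPS hunk loads the same -16 mask into `$1`). The helper name `alignDown16` is hypothetical and is not part of the start code.

```zig
const std = @import("std");

/// Hypothetical helper, for illustration only: rounds a stack pointer
/// value down to a 16-byte boundary by clearing its low four bits.
/// -16 in two's complement is the same bit pattern as ~15.
fn alignDown16(sp: usize) usize {
    return sp & ~@as(usize, 15);
}

test "masking is a no-op when the ABI guarantee holds" {
    // Already 16-byte aligned: the value is unchanged.
    try std.testing.expectEqual(@as(usize, 0x7ffc_1230), alignDown16(0x7ffc_1230));
}

test "masking silently rounds a misaligned value down" {
    // Only 8-byte aligned: the value is moved to the boundary below it.
    try std.testing.expectEqual(@as(usize, 0x7ffc_1230), alignDown16(0x7ffc_1238));
}
```

The second test is the case this thread is arguing about: with the mask in place, a kernel or loader that hands over a misaligned stack pointer goes unnoticed.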
I'm not going to lose sleep over a few extra instructions in the startup path. The latter is the only thing I care about; I would like the code to reflect the true ABI requirements unless we have good reason to go above and beyond (e.g. because of a buggy OS), and in that case it should be properly justified with a comment. If the code contains redundant operations like stack realignment without a comment justifying why, it also becomes harder to maintain. It forces the reader to question all their knowledge of the ABI and/or their sanity (and in these particular cases, without adequate justification).
I think the default assumption should be that people follow the standards/specs that they claim to support, at least until we have actual evidence to the contrary, in which case it can be handled on a case-by-case basis. This feels like an odd place to take the opposite view. We codify stuff like this all the time in Zig, e.g. in That said:
This is a great idea, and would also make it much easier to detect buggy OSs so we can add the appropriate workaround. We could pass the original, kernel-supplied stack pointer value as a second argument to
On x86 at least, stack misalignment results in a crash very early on anyway. So if we add the aforementioned check but don't realign the stack, it'll be noticed one way or another if the stack is misaligned. The explicit check after initialization would effectively just be a last resort.
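The check proposed above could look roughly like the following sketch. Everything here is an assumption for illustration: the names `assumed_entry_alignment`, `entryStackIsAligned`, and `checkEntryStack` are hypothetical, the value 16 is taken from the masks removed in this PR rather than from any per-target table, and the real startup code would need to receive the original stack pointer from `_start()` in some way not shown here.

```zig
const std = @import("std");

/// Assumed entry alignment; 16 matches the masks removed in the diff,
/// but the real requirement depends on the target ABI.
const assumed_entry_alignment: usize = 16;

/// Hypothetical predicate: was the kernel-supplied stack pointer aligned?
fn entryStackIsAligned(original_sp: usize) bool {
    return std.mem.isAligned(original_sp, assumed_entry_alignment);
}

/// Hypothetical stand-in for a check run after initialization.
/// It only fires in builds with runtime safety enabled.
fn checkEntryStack(original_sp: usize) void {
    if (std.debug.runtime_safety and !entryStackIsAligned(original_sp)) {
        @panic("stack pointer was misaligned at process entry");
    }
}

test "aligned and misaligned entry stack pointers" {
    try std.testing.expect(entryStackIsAligned(0x7ffc_1230));
    try std.testing.expect(!entryStackIsAligned(0x7ffc_1238));
    // An aligned value passes the check without panicking.
    checkEntryStack(0x7ffc_1230);
}
```

Because the check works on the saved original value, it does not depend on whether the stack has been realigned in the meantime.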
For x86 and x86-64, it's exercised for sure. For the others, we can at least have some confidence by way of the Linux kernel code:
Right: https://www.sco.com/developers/devspecs/abi386-4.pdf
The requirement used to be 4-byte alignment for both the calling convention and process initialization, but both were raised later. I don't know when exactly this change happened, though I suspect it was around the time x86-64 was first introduced (since SSE was becoming the norm), so it would have been many, many moons ago. (It's hard to imagine any OS that wouldn't at least give user space a 4-byte aligned stack pointer.)
I think the important insight here is that the stack alignment is indeed part of the ABI. That is, in #20690 terms, it would be part of the ABI component of the target quadruple, and have an implied default based on the OS component. So unless you're doing something silly like overriding the stack alignment while compiling for an OS with 4-byte alignment, things should just work. And if we really wanted to support such an esoteric override, then we could always add the necessary realignment code to
I do have some other changes to make in
It's also possible to enter start code from the dynamic linker, not just from the kernel.
Could probably hit this in practice if attempting something like https://github.com/andrewrk/zig-window/
Ok, I can't actually explain this. musl passes the original stack pointer, as given by the kernel, all the way through and loads it just before transferring control to the relocated program's Also, when invoked with the glibc dynamic linker, I can reproduce it; I just can't explain it. I wonder if there's a bug somewhere here. Will investigate a bit more.
@henke96 I brought this up on the musl mailing list. The main takeaway is here: https://www.openwall.com/lists/musl/2024/07/23/10 (see also the later messages in the thread for more explanation of why the misalignment happens).
Basically, the issue you're seeing happens because you're expecting an ABI that you weren't promised. 🙂 The "process initialization" rules only formally apply to the case where the kernel transfers control to your program's entry point. In your case, it's kernel -> musl ldso -> your entry point, while musl is expecting kernel -> musl ldso -> musl crt1-compatible entry point. So, you're subject to whatever ABI musl has defined between its ldso and crt1. The same is true of glibc; it just so happens that it's also compatible with the System V ABI.
Unfortunately for my idea here,
For these, the relevant ABIs already guarantee that it is aligned appropriately when the kernel transfers control to _start(). Rather than potentially hiding ABI bugs, let's actually assume that the OS works properly unless/until we have evidence to the contrary.
… _start(), and I can find no evidence that the Linux kernel performs explicit stack alignment for them.
Marking as draft initially because I wouldn't be surprised if there's breakage.