Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start: Avoid needlessly aligning the stack pointer on some architectures. #20737

Closed
wants to merge 1 commit into from

Conversation

alexrp
Copy link
Member

@alexrp alexrp commented Jul 22, 2024

For these, the relevant ABIs already guarantee that it is aligned appropriately when the kernel transfers control to _start(). Rather than potentially hiding ABI bugs, let's actually assume that the OS works properly unless/until we have evidence to the contrary.

  • This effectively reverts 81232f7.
  • Arm32, LoongArch, and RISC-V continue to do stack alignment because their ABIs specify absolutely nothing about stack alignment upon entry to _start(), and I can find no evidence that the Linux kernel performs explicit stack alignment for them.
  • Arm64 is a bit of a question mark. The ABI likewise says nothing, but the Linux kernel does align the stack to a 16-byte boundary. Leaving it as it was (no alignment) for now.

Marking as draft initially because I wouldn't be surprised if there's breakage.

…res.

For these, the relevant ABIs already guarantee that it is aligned appropriately
when the kernel transfers control to _start(). Rather than potentially hiding
ABI bugs, let's actually assume that the OS works properly unless/until we have
evidence to the contrary.

This effectively reverts 81232f7.
@alexrp alexrp marked this pull request as ready for review July 22, 2024 19:01
@alexrp
Copy link
Member Author

alexrp commented Jul 22, 2024

All Linux CI is green, so ready for review.

@@ -343,7 +325,6 @@ fn _start() callconv(.Naked) noreturn {
\\ addis 2, 12, .TOC. - _start@ha
\\ addi 2, 2, .TOC. - _start@l
\\ mr 3, 1
\\ clrrdi 1, 1, 4
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The lr is already zeroed on entry, as specified by the ABI.
\\ addiu $fp, $zero, 0
\\ move $a0, $sp
\\ .set push
\\ .set noat
\\ addiu $1, $zero, -16
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\\ callq %[posixCallMainAndExit:P]
,
.x86 =>
\\ xorl %%ebp, %%ebp
\\ movl %%esp, %%eax
\\ andl $-16, %%esp
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -276,13 +276,11 @@ fn _start() callconv(.Naked) noreturn {
.x86_64 =>
\\ xorl %%ebp, %%ebp
\\ movq %%rsp, %%rdi
\\ andq $-16, %%rsp
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacobly0
Copy link
Member

jacobly0 commented Jul 22, 2024

Is it really worth saving an instruction or two that probably fits into the function alignment and is only executed once, given the amount of work that would be required to verify that it is valid on all os/abi versions that we intend to support? For example the x86 document had a different stack alignment pre-1.0 which suggests that there did exist os versions that didn't follow that convention yet. I'm also fairly confident that there exist(ed) posix compatibility layers for windows that did not have this stack alignment.

On the other hand, if this is really intended to expose such abi bugs, I would expect some sort of safety to be added to safe builds (related to #20654, and I thought there was an issue about verifying incoming arguments to extern functions such as pointer alignment, but I can't find it). This might require actually realigning the stack first to be able to report the issue later after initialization, but I believe it would increase the mergeability of this PR, since there's no guarantee that CI even verifies that these changes are non-breaking since it mostly tests foreign baseline targets which may not even have the instructions necessary to crash if this abi were broken.

@alexrp
Copy link
Member Author

alexrp commented Jul 22, 2024

Is it really worth saving an instruction or two

On the other hand, if this is really intended to expose such abi bugs

I'm not going to lose sleep over a few extra instructions in the startup path. The latter is the only thing I care about; I would like the code to reflect the true ABI requirements unless we have good reason to go above and beyond (e.g. because of a buggy OS), and in that case it should be properly justified with a comment.

If the code contains redundant operations like stack realignment without a comment justifying why, it also becomes harder to maintain. It forces the reader to question all their knowledge of the ABI and/or their sanity (and in these particular cases, without adequate justification).

given the amount of work that would be required to verify that it is valid on all os/abi versions that we intend to support?

I think the default assumption should be that people follow the standards/specs that they claim to support, at least until we have actual evidence to the contrary, in which case it can be handled on a case-by-case basis. This feels like an odd place to take the opposite view. We codify stuff like this all the time in Zig, e.g. in std.Target, in our many EWHATEVER => unreachable prongs, etc...

That said:

I would expect some sort of safety to be added to safe builds

This is a great idea, and would also make it much easier to detect buggy OSs so we can add the appropriate workaround.

We could pass the original, kernel-supplied stack pointer value as a second argument to posixCallMainAndExit() and, in safety-checked builds, check std.mem.isAligned(sp, builtin.target.stackAlignment()) after basic initialization is done, or something along those lines. I'll see what I can do here.

This might require actually realigning the stack first to be able to report the issue later after initialization

On x86 at least, stack misalignment results in a crash very early on anyway. So if we add the aforementioned check but don't realign the stack, it'll be noticed one way or another if the stack is misaligned. The explicit check after initialization would effectively just be a last resort.

since there's no guarantee that CI even verifies that these changes are non-breaking since it mostly tests foreign baseline targets which may not even have the instructions necessary to crash if this abi were broken

For x86 and x86-64, it's exercised for sure. For the others, we can at least have some confidence by way of the Linux kernel code:

For example the x86 document had a different stack alignment pre-1.0

Right: https://www.sco.com/developers/devspecs/abi386-4.pdf

The requirement used to be 4-byte alignment for both the calling convention and process initialization, but both were raised later. I don't know when exactly this change happened, though I suspect it was around the time x86-64 was first introduced (since SSE was becoming the norm), so it would have been many, many moons ago.

(It's hard to imagine any OS that wouldn't at least give user space a 4-byte aligned stack pointer.)

I think the important insight here is that the stack alignment is indeed part of the ABI. That is, in #20690 terms, it would be part of the ABI component of the target quadruple, and have an implied default based on the OS component. So unless you're doing something silly like overriding the stack alignment while compiling for an OS with 4-byte alignment, things should just work. And if we really wanted to support such an esoteric override, then we could always add the necessary realignment code to start.zig at the point in time where we add that support.

@alexrp
Copy link
Member Author

alexrp commented Jul 22, 2024

This is a great idea, and would also make it much easier to detect buggy OSs so we can add the appropriate workaround.

We could pass the original, kernel-supplied stack pointer value as a second argument to posixCallMainAndExit() and, in safety-checked builds, check std.mem.isAligned(sp, builtin.target.stackAlignment()) after basic initialization is done, or something along those lines. I'll see what I can do here.

I do have some other changes to make in start.zig as I'm doing port work, though. Are we conditioning merging of this PR on adding that safety check? If so, I can mark it as a draft until the other stuff is merged.

@henke96
Copy link

henke96 commented Jul 23, 2024

It's also possible to enter start code from the dynamic linker, not just from the kernel.
Here's an example on Alpine, where i'm getting a stack pointer from musl that's only 8-byte aligned:

/ # cat test.s
.global _start
_start:
mov %rsp, %rdi
and $15, %rdi
call exit
/ # gcc -nostartfiles test.s
/ # ./a.out
/ # echo $?
0
/ # /lib/ld-musl-x86_64.so.1 ./a.out
/ # echo $?
8

Could probably hit this in practice if attempting something like https://github.com/andrewrk/zig-window/

@alexrp
Copy link
Member Author

alexrp commented Jul 23, 2024

It's also possible to enter start code from the dynamic linker, not just from the kernel.
Here's an example on Alpine, where i'm getting a stack pointer from musl that's only 8-byte aligned:

Ok, I can't actually explain this. musl passes the original stack pointer, as given by the kernel, all the way through and loads it just before transferring control to the relocated program's _start. If the stack pointer given by the kernel is aligned (seems to be the case...?), why would it be misaligned by the time we get to _start in this example? And more importantly, wouldn't a program with musl startup code linked in fail to find stuff like argc, argv, envp, etc on the stack as a result?!

Also, when invoked with the glibc dynamic linker, rsp is aligned as expected.

I can reproduce it; I just can't explain it. I wonder if there's a bug somewhere here. Will investigate a bit more.

@alexrp alexrp marked this pull request as draft July 23, 2024 22:02
@alexrp
Copy link
Member Author

alexrp commented Jul 24, 2024

@henke96 I brought this up on the musl mailing list. The main takeaway is here: https://www.openwall.com/lists/musl/2024/07/23/10 (see also the later messages in thread for more explanation on why the misalignment happens)

Basically, the issue you're seeing happens because you're expecting an ABI that you weren't promised. 🙂 The "process initialization" rules only formally apply to the case where the kernel transfers control to your program's entry point. In your case, it's kernel -> musl ldso -> your entry point, while musl is expecting kernel -> musl ldso -> musl crt1-compatible entry point. So, you're subject to whatever ABI musl has defined between its ldso and crt1. The same is true of glibc; it just so happens that it's also compatible with the System V ABI.

Unfortunately for my idea here, start.zig is also used in the case where we create a dynamically-linked executable, so we're subject to the whims of the dynamic linker. Thanks for helping clarify all this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants