`start`: Avoid needlessly aligning the stack pointer on some architectures. #20737

alexrp · 2024-07-22T13:51:28Z

For these, the relevant ABIs already guarantee that it is aligned appropriately when the kernel transfers control to _start(). Rather than potentially hiding ABI bugs, let's actually assume that the OS works properly unless/until we have evidence to the contrary.

This effectively reverts 81232f7.
- Arm32, LoongArch, and RISC-V continue to do stack alignment because their ABIs specify absolutely nothing about stack alignment upon entry to _start(), and I can find no evidence that the Linux kernel performs explicit stack alignment for them.
- Arm64 is a bit of a question mark. The ABI likewise says nothing, but the Linux kernel does align the stack to a 16-byte boundary. Leaving it as it was (no alignment) for now.
  - My impression here is that the process initialization ABI for Arm on Linux is, in general, informal, which is weird considering how much ABI documentation otherwise exists.

Marking as draft initially because I wouldn't be surprised if there's breakage.

alexrp · 2024-07-22T19:01:44Z

All Linux CI is green, so ready for review.

alexrp · 2024-07-22T19:24:12Z

lib/std/start.zig

@@ -343,7 +325,6 @@ fn _start() callconv(.Naked) noreturn {
            \\ addis 2, 12, .TOC. - _start@ha
            \\ addi 2, 2, .TOC. - _start@l
            \\ mr 3, 1
-            \\ clrrdi 1, 1, 4


https://github.com/rui314/psabi/blob/main/ppc32.pdf

This one just doesn't touch on process initialization much at all. It seems to be much more focused on the embedded side of things. AFAIK, for PPC32, people usually refer to the next document.

http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf §3-28

https://github.com/rui314/psabi/blob/main/ppc64v1.pdf §3.4.1

https://github.com/rui314/psabi/blob/main/ppc64v2.pdf §4.1.2.1

alexrp · 2024-07-22T19:24:15Z

lib/std/start.zig

            // The lr is already zeroed on entry, as specified by the ABI.
            \\ addiu $fp, $zero, 0
            \\ move $a0, $sp
-            \\ .set push
-            \\ .set noat
-            \\ addiu $1, $zero, -16


https://github.com/rui314/psabi/blob/main/mips.pdf §3-28

No authoritative document seems to exist for 64-bit MIPS, but: https://github.com/torvalds/linux/blob/933069701c1b507825b514317d4edd5d3fd9d417/arch/mips/kernel/process.c#L708-L718

alexrp · 2024-07-22T19:24:16Z

lib/std/start.zig

            \\ callq %[posixCallMainAndExit:P]
            ,
            .x86 =>
            \\ xorl %%ebp, %%ebp
            \\ movl %%esp, %%eax
-            \\ andl $-16, %%esp


https://github.com/rui314/psabi/blob/main/i386.pdf §2.3.1

alexrp · 2024-07-22T19:24:17Z

lib/std/start.zig

@@ -276,13 +276,11 @@ fn _start() callconv(.Naked) noreturn {
            .x86_64 =>
            \\ xorl %%ebp, %%ebp
            \\ movq %%rsp, %%rdi
-            \\ andq $-16, %%rsp


https://github.com/rui314/psabi/blob/main/x86-64.pdf §3.4.1
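
For readers less familiar with the idiom: the removed `andq $-16, %rsp` rounds the stack pointer down to a 16-byte boundary by clearing its low four bits, and the PowerPC `clrrdi 1, 1, 4` in the earlier hunk clears the same four bits. A minimal C sketch of that arithmetic follows; the helper name `align_down` is made up for illustration and does not appear in start.zig.

```c
#include <stdint.h>

/* Round addr down to a multiple of alignment (assumed to be a power of two).
 * With alignment == 16 this applies the same mask as `andq $-16, %rsp`:
 * -16 in two's complement equals ~15, so the low four bits are cleared. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    return addr & ~(alignment - 1);
}
```

An already aligned value passes through unchanged, which is why the instruction is redundant whenever the ABI guarantees alignment on entry.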

jacobly0 · 2024-07-22T20:35:55Z

Is it really worth saving an instruction or two that probably fits into the function alignment and is only executed once, given the amount of work that would be required to verify that it is valid on all os/abi versions that we intend to support? For example the x86 document had a different stack alignment pre-1.0 which suggests that there did exist os versions that didn't follow that convention yet. I'm also fairly confident that there exist(ed) POSIX compatibility layers for Windows that did not have this stack alignment.

On the other hand, if this is really intended to expose such abi bugs, I would expect some sort of safety to be added to safe builds (related to #20654, and I thought there was an issue about verifying incoming arguments to extern functions such as pointer alignment, but I can't find it). This might require actually realigning the stack first to be able to report the issue later after initialization, but I believe it would increase the mergeability of this PR, since there's no guarantee that CI even verifies that these changes are non-breaking since it mostly tests foreign baseline targets which may not even have the instructions necessary to crash if this abi were broken.

alexrp · 2024-07-22T21:49:17Z

> Is it really worth saving an instruction or two

> On the other hand, if this is really intended to expose such abi bugs

I'm not going to lose sleep over a few extra instructions in the startup path. The latter is the only thing I care about; I would like the code to reflect the true ABI requirements unless we have good reason to go above and beyond (e.g. because of a buggy OS), and in that case it should be properly justified with a comment.

If the code contains redundant operations like stack realignment without a comment justifying why, it also becomes harder to maintain. It forces the reader to question all their knowledge of the ABI and/or their sanity (and in these particular cases, without adequate justification).

> given the amount of work that would be required to verify that it is valid on all os/abi versions that we intend to support?

I think the default assumption should be that people follow the standards/specs that they claim to support, at least until we have actual evidence to the contrary, in which case it can be handled on a case-by-case basis. This feels like an odd place to take the opposite view. We codify stuff like this all the time in Zig, e.g. in std.Target, in our many EWHATEVER => unreachable prongs, etc...

That said:

> I would expect some sort of safety to be added to safe builds

This is a great idea, and would also make it much easier to detect buggy OSs so we can add the appropriate workaround.

We could pass the original, kernel-supplied stack pointer value as a second argument to posixCallMainAndExit() and, in safety-checked builds, check std.mem.isAligned(sp, builtin.target.stackAlignment()) after basic initialization is done, or something along those lines. I'll see what I can do here.

> This might require actually realigning the stack first to be able to report the issue later after initialization

On x86 at least, stack misalignment results in a crash very early on anyway. So if we add the aforementioned check but don't realign the stack, it'll be noticed one way or another if the stack is misaligned. The explicit check after initialization would effectively just be a last resort.
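
A rough sketch of the check described above, written in C only to keep it self-contained. The names `stack_pointer_is_aligned` and `REQUIRED_STACK_ALIGNMENT` are hypothetical, and the value 16 is just the x86-64 requirement used as an example; in start.zig the equivalent would presumably be the `std.mem.isAligned` call mentioned earlier, with the target's own stack alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example requirement only: 16 bytes, as on x86-64 at process entry. */
#define REQUIRED_STACK_ALIGNMENT ((uintptr_t)16)

/* Returns true if the stack pointer value captured at entry, before any
 * realignment, satisfies the required alignment. Hypothetical helper. */
bool stack_pointer_is_aligned(uintptr_t initial_sp)
{
    return (initial_sp & (REQUIRED_STACK_ALIGNMENT - 1)) == 0;
}
```

The idea would be for the entry code to capture the original stack pointer, pass it along, and have safety-checked builds report a failure once basic initialization is done if this returns false.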

> since there's no guarantee that CI even verifies that these changes are non-breaking since it mostly tests foreign baseline targets which may not even have the instructions necessary to crash if this abi were broken

For x86 and x86-64, it's exercised for sure. For the others, we can at least have some confidence by way of the Linux kernel code:

> For example the x86 document had a different stack alignment pre-1.0

Right: https://www.sco.com/developers/devspecs/abi386-4.pdf

The requirement used to be 4-byte alignment for both the calling convention and process initialization, but both were raised later. I don't know when exactly this change happened, though I suspect it was around the time x86-64 was first introduced (since SSE was becoming the norm), so it would have been many, many moons ago.

(It's hard to imagine any OS that wouldn't at least give user space a 4-byte aligned stack pointer.)

I think the important insight here is that the stack alignment is indeed part of the ABI. That is, in #20690 terms, it would be part of the ABI component of the target quadruple, and have an implied default based on the OS component. So unless you're doing something silly like overriding the stack alignment while compiling for an OS with 4-byte alignment, things should just work. And if we really wanted to support such an esoteric override, then we could always add the necessary realignment code to start.zig at the point in time where we add that support.

alexrp · 2024-07-22T22:43:01Z

> This is a great idea, and would also make it much easier to detect buggy OSs so we can add the appropriate workaround.
>
> We could pass the original, kernel-supplied stack pointer value as a second argument to posixCallMainAndExit() and, in safety-checked builds, check std.mem.isAligned(sp, builtin.target.stackAlignment()) after basic initialization is done, or something along those lines. I'll see what I can do here.

I do have some other changes to make in start.zig as I'm doing port work, though. Are we conditioning merging of this PR on adding that safety check? If so, I can mark it as a draft until the other stuff is merged.

henke96 · 2024-07-23T18:11:20Z

It's also possible to enter start code from the dynamic linker, not just from the kernel.
Here's an example on Alpine, where I'm getting a stack pointer from musl that's only 8-byte aligned:

/ # cat test.s
.global _start
_start:
mov %rsp, %rdi
and $15, %rdi
call exit
/ # gcc -nostartfiles test.s
/ # ./a.out
/ # echo $?
0
/ # /lib/ld-musl-x86_64.so.1 ./a.out
/ # echo $?
8

Could probably hit this in practice if attempting something like https://github.com/andrewrk/zig-window/
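
To spell out what the test program above reports: `and $15, %rdi` keeps only the low four bits of `%rsp`, so the exit status is the stack pointer's offset from the previous 16-byte boundary. A C rendering of that computation, with the function name `misalignment` made up for illustration:

```c
#include <stdint.h>

/* Offset of sp from the previous 16-byte boundary; 0 means 16-byte aligned.
 * Mirrors `mov %rsp, %rdi` followed by `and $15, %rdi` in test.s. */
unsigned misalignment(uintptr_t sp)
{
    return (unsigned)(sp & 15);
}
```

A status of 0 therefore means the pointer was 16-byte aligned, while 8 means it sat 8 bytes past a 16-byte boundary, i.e. it was only 8-byte aligned.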

alexrp · 2024-07-23T19:06:20Z

> It's also possible to enter start code from the dynamic linker, not just from the kernel.
> Here's an example on Alpine, where I'm getting a stack pointer from musl that's only 8-byte aligned:

OK, I can't actually explain this. musl passes the original stack pointer, as given by the kernel, all the way through and loads it just before transferring control to the relocated program's _start. If the stack pointer given by the kernel is aligned (seems to be the case...?), why would it be misaligned by the time we get to _start in this example? And more importantly, wouldn't a program with musl startup code linked in fail to find stuff like argc, argv, envp, etc. on the stack as a result?!

Also, when invoked with the glibc dynamic linker, rsp is aligned as expected.

I can reproduce it; I just can't explain it. I wonder if there's a bug somewhere here. Will investigate a bit more.

alexrp · 2024-07-24T00:49:14Z

@henke96 I brought this up on the musl mailing list. The main takeaway is here: https://www.openwall.com/lists/musl/2024/07/23/10 (see also the later messages in the thread for more explanation of why the misalignment happens)

Basically, the issue you're seeing happens because you're expecting an ABI that you weren't promised. 🙂 The "process initialization" rules only formally apply to the case where the kernel transfers control to your program's entry point. In your case, it's kernel -> musl ldso -> your entry point, while musl is expecting kernel -> musl ldso -> musl crt1-compatible entry point. So, you're subject to whatever ABI musl has defined between its ldso and crt1. The same is true of glibc; it just so happens that it's also compatible with the System V ABI.

Unfortunately for my idea here, start.zig is also used in the case where we create a dynamically-linked executable, so we're subject to the whims of the dynamic linker. Thanks for helping clarify all this!

alexrp marked this pull request as ready for review July 22, 2024 19:01

alexrp marked this pull request as draft July 23, 2024 22:02

alexrp closed this Jul 24, 2024

alexrp deleted the start-stack-align branch July 24, 2024 04:44

alexrp mentioned this pull request Jul 24, 2024: start: Harden against program interpreters that don't adhere fully to the ABI #20777 (Merged)
