cortex-m-rt: ensure the stack is 8-byte aligned. #463
Good catch, thanks. Maybe we should push
Force-pushed from cf9adf4 to 5da2003.
done
464: cortex-m-rt: assert in linker script that stack_start is 8-byte aligned. r=adamgreig a=Dirbaio

If the user sets RAM length to something that's not a multiple of 8, the stack won't be 8-byte aligned. This'll trigger the same horrible symptoms as #463.

This PR adds an assert to the linker script that enforces alignment.

Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
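As a rough illustration of the condition that assert guards, here is a host-side Rust sketch. It is not the linker script itself. It assumes the stack starts at the very end of RAM, and the helper names and RAM sizes are hypothetical.

```rust
/// Returns true if `addr` is a multiple of 8.
fn is_8_byte_aligned(addr: u32) -> bool {
    addr % 8 == 0
}

/// Initial stack pointer when the stack starts at the very end of RAM
/// (an assumption made for this sketch).
fn stack_start(ram_origin: u32, ram_length: u32) -> u32 {
    ram_origin + ram_length
}

fn main() {
    let origin = 0x2000_0000; // hypothetical RAM origin
    // One length that is a multiple of 8, and one that is only a multiple of 4.
    for length in [0x5000u32, 0x4FFC] {
        let start = stack_start(origin, length);
        println!(
            "length={:#x} stack_start={:#010x} aligned={}",
            length,
            start,
            is_8_byte_aligned(start)
        );
    }
}
```

With an 8-byte-aligned RAM origin, a RAM length that is not a multiple of 8 is enough to leave the initial stack pointer misaligned, which is the case the assert rejects at link time.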
For historical context, we discussed this PR a bit on the chat, from here.
It's not the
The stack contains [3, 99], and r1 is loaded from either (SP) or (SP | 4). If SP is 8-byte aligned that's SP or SP+4, but if SP is only 4-byte aligned (so you happen to have a 1 in the 3rd bit), you end up with SP+4 returned either way, which is wrong.

The optimisation is correct assuming SP is 8-byte aligned, which it must be at program entry according to AAPCS32, so that's the problem in this particular case.

In general it seems like this or similar optimisations could have resulted in incorrect program behaviour since cortex-m-rt 0.7.1 was released with 8bf70f5 (added in rust-embedded/cortex-m-rt#337), so for a bit over a year and including the current 0.7.2. Versions 0.7.0 and prior didn't push LR to the stack so it never became misaligned.

Right now my question is whether the best fix is to push another register to the stack (which is how a function would normally push LR to the stack at function-entry if it will itself write to LR), as per the current PR, or to remove pushing LR to the stack entirely, which would save 8 bytes of stack space and a few instructions in flash and at startup, but means unwinders have to work out when to stop without seeing 0xFFFFFFFF in LR.

From a brief survey, I couldn't find any other startup code that pushes LR before
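To make the address arithmetic above concrete, here is a small host-side Rust sketch. It is not code from cortex-m-rt or from the reproduction case. The function names and stack pointer values are made up for illustration. It compares `SP | 4` with `SP + 4` for an 8-byte-aligned and a merely 4-byte-aligned stack pointer.

```rust
/// Address the optimised code computes for the second stack slot.
fn second_slot_orr(sp: u32) -> u32 {
    sp | 4
}

/// Address the second stack slot actually lives at.
fn second_slot_add(sp: u32) -> u32 {
    sp + 4
}

fn main() {
    // Hypothetical stack pointer values, chosen only for illustration.
    let aligned_sp: u32 = 0x2000_0FF8; // multiple of 8
    let misaligned_sp: u32 = 0x2000_0FFC; // multiple of 4 only

    for sp in [aligned_sp, misaligned_sp] {
        let orr = second_slot_orr(sp);
        let add = second_slot_add(sp);
        println!(
            "sp={:#010x} orr={:#010x} add={:#010x} equal={}",
            sp,
            orr,
            add,
            orr == add
        );
    }
}
```

When bit 2 of SP is already set, `SP | 4` is just SP again, so both "slots" resolve to the same address and the second word on the stack is never read.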
So, I wonder if actually we're making life worse for ourselves here and could simplify things. I think the main motivation for this change was from
465: Enforce 8-byte initial stack pointer alignment r=adamgreig a=adamgreig

After #463 we discovered that adding a second linker script via another compiler flag could be used to override `_stack_start` without triggering the assert in the main linker script. By masking the value, we force alignment even when the assert doesn't otherwise trigger.

Co-authored-by: Adam Greig <adam@adamgreig.com>
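The masking arithmetic can be sketched on the host in Rust as follows. This only illustrates rounding an address down to a multiple of 8. It does not reproduce the actual change, and the function name and addresses are hypothetical.

```rust
/// Rounds `addr` down to the nearest multiple of 8 by clearing the low three bits.
fn align_down_8(addr: u32) -> u32 {
    addr & !0b111
}

fn main() {
    // Hypothetical stack start values: already aligned, 4-byte aligned, unaligned.
    for addr in [0x2000_5000u32, 0x2000_4FFC, 0x2000_4FFF] {
        println!("{:#010x} -> {:#010x}", addr, align_down_8(addr));
    }
}
```

An already aligned value passes through unchanged, so masking costs nothing in the common case and at most 7 bytes of stack otherwise.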
It seems that GDB also uses the
Just want to confirm, I've used this PR branch to re-run with the reproduction case, and the patch fixes the defect as expected. link to my patch. Prior to this commit in the original repro repo, I was able to reproduce the failing assert.
Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words. This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment.
Co-authored-by: James Munns <james@onevariable.com>
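The effect described in the commit message above can be checked with a short host-side Rust sketch. It assumes a full-descending stack with 4-byte words and a hypothetical, 8-byte-aligned initial stack pointer. The helper name is made up for illustration.

```rust
const WORD_BYTES: u32 = 4;

/// Stack pointer after pushing `words` 32-bit registers on a descending stack.
fn sp_after_push(sp: u32, words: u32) -> u32 {
    sp - words * WORD_BYTES
}

fn main() {
    let sp: u32 = 0x2000_5000; // hypothetical, 8-byte aligned at entry
    for words in [1, 2] {
        let new_sp = sp_after_push(sp, words);
        println!(
            "pushed {} word(s): sp={:#010x} sp % 8 = {}",
            words,
            new_sp,
            new_sp % 8
        );
    }
}
```

Pushing an odd number of words leaves SP at 4 modulo 8, while pushing an even number keeps it at 0 modulo 8. That is why pushing a second register restores alignment.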
Force-pushed from 42bc1a2 to 6681bc7.
I'm convinced that this PR fixes the stack alignment issue and thus fixes the miscompilation problem, and it's important we get a fix released soon.

The fix in this PR (push r4 as well as lr to the stack before calling main) preserves the current unwinding functionality in

An alternative is to remove the stack frame entirely (don't set LR to FFFFFFFF twice, don't push 8 bytes to the stack, possibly don't specify the CFI directives). This simplifies the startup code and saves some flash and stack space, and brings us in-line with all the other startup code behaviour, but causes

I think ideally probe-run should be able to stop unwinding without seeing the FFFFFFFF value; it's not part of the ARM spec, it's not even the start value of LR on ARMv6M, and there are a bunch of scenarios where it won't be observed including anything not using cortex-m-rt's particular startup code.

Output of gdb and probe-run below:

GDB
probe-run
Next steps

I'd like to get a fixed release out ASAP, and I'd like to remove the stack frame being pushed from Reset sooner or later, because it doesn't seem like it should be there and I think the simpler the startup code the better. Given that, I think we have three options:
Ideally I'd like a second opinion from @rust-embedded/cortex-m team member(s) - any thoughts?
Thanks for summarizing that so well!
I like option 2 or 3. The stack frame should be removed at some point. I do not have opinions on doing it now or later - there's good reasons for both.
probe-rs
Basically it just handles all the situations pretty well (although it doesn't pick up the name 'Reset'). I think probe-run could potentially swap to using probe-rs for unwinding (it already uses it for debugger access) to resolve the issue too.
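For readers unfamiliar with the unwinding question in this thread, here is a toy Rust sketch of an unwinder's stop condition. It is not how GDB, probe-run, or probe-rs are implemented. The `Stop` enum, the `walk` function, the flash range, and the return addresses are all hypothetical. It only illustrates that a walk can end either on the 0xFFFFFFFF sentinel or on a fallback check when no sentinel is present.

```rust
use std::ops::Range;

/// Sentinel LR value discussed in this thread.
const LR_SENTINEL: u32 = 0xFFFF_FFFF;

/// Why the toy unwinder stopped walking frames.
#[derive(Debug, PartialEq)]
enum Stop {
    Sentinel,
    OutsideCode,
    DepthLimit,
    OutOfFrames,
}

/// Walks saved return addresses (innermost first) and reports how many
/// frames were accepted and why the walk ended.
fn walk(return_addrs: &[u32], code: Range<u32>, max_depth: usize) -> (usize, Stop) {
    for (depth, &lr) in return_addrs.iter().enumerate() {
        if depth >= max_depth {
            return (depth, Stop::DepthLimit);
        }
        if lr == LR_SENTINEL {
            return (depth, Stop::Sentinel);
        }
        // Clear the Thumb bit before checking the address against the code range.
        if !code.contains(&(lr & !1)) {
            return (depth, Stop::OutsideCode);
        }
    }
    (return_addrs.len(), Stop::OutOfFrames)
}

fn main() {
    let flash = 0x0800_0000..0x0801_0000; // hypothetical flash range

    // With a sentinel frame at the bottom of the stack.
    let with_sentinel = [0x0800_0123, 0x0800_0457, LR_SENTINEL];
    println!("{:?}", walk(&with_sentinel, flash.clone(), 32));

    // Without a sentinel: the range check ends the walk instead.
    let without_sentinel = [0x0800_0123, 0x0800_0457, 0x0000_0000];
    println!("{:?}", walk(&without_sentinel, flash, 32));
}
```

Both walks accept the same two frames. The only difference is which check terminates them, which is the design question raised above about relying on the sentinel.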
Closing in favour of #467, see #467 (review)
467: cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. r=adamgreig a=Dirbaio

This was causing incorrect execution of code optimized with the assumption the stack is 8-byte aligned.

Alternate version of #463

- Remove instead of fix the sentinel/fake frame.
- Remove code initializing LR, since it's now clobbered by the `bl main` anyway.
- ~~Remove the .cfi directives, since Reset now has no correct CFI info. I think this is the "correct" thing to do here.~~
- ~~Initialize the frame pointer in R7 (suggestion from `@jamesmunns`)~~

Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words.

I'm not sure on the `cfi` changes, so I'd appreciate some double-check.

This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment. For example:

compiled to code that does some weird address math with `orr` instead of `add`, which is only correct if the stack is 8-byte aligned:

which prints

With the fix in this PR, the result is correct: