Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cortex-m-rt: ensure the stack is 8-byte aligned. #463

Closed
wants to merge 2 commits into from

Conversation

Dirbaio
Copy link
Member

@Dirbaio Dirbaio commented Feb 11, 2023

Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words.

I'm not sure on the cfi changes, so I'd appreciate some double-check.


This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment, For example:

#[inline(never)]
pub fn foo(x: bool) -> (u32, [u32; 2]) {
    let array = [3, 99];
    let result = array[x as usize];
    (result, array)
}

hprintln!("false -> {:?}", foo(false));
hprintln!("true -> {:?}", foo(true));

compiled to code that does a strd on stack does some weird address math with orr instead of add which is only correct if the stack is 8-byte aligned:

                             undefined foo()
             undefined         r0:1           <RETURN>
             undefined4        Stack[-0x8]:4  local_8                         
        00000448 82 b0           sub                sp,#0x8
        0000044a 6a 46           mov                r2,sp
        0000044c 4f f0 63 0c     mov.w              r12,#0x63
        00000450 03 23           movs               r3,#0x3
        00000452 42 ea 81 01     orr.w              r1,r2,r1, lsl #0x2
        00000456 cd e9 00 3c     strd               r3,r12,[sp,#0x0]=>local_8
        0000045a 09 68           ldr                r1,[r1,#0x0]
        0000045c 80 e8 0a 10     stm                r0,{r1,r3,r12}
        00000460 02 b0           add                sp,#0x8
        00000462 70 47           bx                 lr

which prints

false -> (3, [3, 99])
true -> (3, [3, 99])

With the fix in this PR, the result is correct:

false -> (3, [3, 99])
true -> (99, [3, 99])

@Dirbaio Dirbaio requested a review from a team as a code owner February 11, 2023 01:15
@adamgreig
Copy link
Member

Good catch, thanks. Maybe we should push r4 instead which is already loaded with 0xFFFFFFFF (= LR) and doesn't change depending on features, just to keep the top of the stack a little bit less weird.

@Dirbaio
Copy link
Member Author

Dirbaio commented Feb 11, 2023

done

bors bot added a commit that referenced this pull request Feb 11, 2023
464: cortex-m-rt: assert in linker script that stack_start is 8-byte aligned. r=adamgreig a=Dirbaio

If the user sets RAM length to something that's not multiple of 8, the stack won't be 8-byte aligned. This'll trigger the same horrible symptoms as #463 .

This PR adds an assert to the linker script that enforces alignment.

Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
@adamgreig
Copy link
Member

For historical context, we discussed this PR a bit on the chat, from here.

This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment

It's not the strd that's troublesome (that's fine with 4-byte alignment on armv7m, and doesn't exist on armv6m), but here it's the clever trick the optimiser is doing:

test_function(x: bool) -> (u32, [u32; 2])
r1 initially contains x

mov r12, 99             # r12 = 99
mov r3, 3               # r3  = 3
mov r2, sp              # r2  = SP
orr r1, r2, r1 lsl 2    # r1  = SP | (x << 2)
str r12, [sp, 4]        # *(SP + 4) = 99
str r3, [sp]            # *(SP + 0) = 3
ldr r1, [r1]            # r1 = *r1 = *(SP | (x << 2))

[r1 is now returned as the first element of the tuple, and r3 and r12 as the second]

The stack contains [3, 99], and r1 is loaded from either (SP) or (SP | 4). If SP is 8-byte aligned that's SP or SP+4, but if SP is only 4-byte aligned (so you happen to have a 1 in the 3rd bit), you end up with SP+4 returned either way, which is wrong. The optimisation is correct assuming SP is 8-byte aligned, which it must be at program entry according to AAPCS32, so that's the problem in this particular case.

In general it seems like this or similar optimisations could have resulted in incorrect program behaviour since cortex-m-rt 0.7.1 was released with 8bf70f5 (added in rust-embedded/cortex-m-rt#337), so for a bit over a year and including the current 0.7.2. Versions 0.7.0 and prior didn't push LR to the stack so it never became misaligned.

Right now my question is whether the best fix is to push another register to the stack (which is how a function would normally push LR to the stack at function-entry if it will itself write to LR), as per the current PR, or to remove pushing LR to the stack entirely, which would save 8 bytes of stack space and a few instructions in flash and at startup, but means unwinders have to work out when to stop without seeing 0xFFFFFFFF in LR.

From a brief survey, I couldn't find any other startup code that pushes LR before bl main, or indeed even initialises LR on ARMv6M platforms where it's undefined at startup:

So, I wonder if actually we're making life worse for ourselves here and could simplify things. I think the main motivation for this change was from probe-run (knurling-rs/probe-run#277) which implements its own unwinder that looks for 0xFFFFFFFF in LR to stop unwinding; whether there's any other way to sensibly detect this, or what other debuggers like gdb do, we'll need to investigate.

bors bot added a commit that referenced this pull request Feb 12, 2023
465: Enforce 8-byte initial stack pointer alignment r=adamgreig a=adamgreig

After #463 we discovered that adding a second linker script via another compiler flag could be used to override `_stack_start` without triggering the assert in the main linker script. By masking the value, we force alignment even when the assert doesn't otherwise trigger.

Co-authored-by: Adam Greig <adam@adamgreig.com>
@datdenkikniet
Copy link

datdenkikniet commented Feb 13, 2023

It seems that GDB also uses the 0xFFFF_FFFF value as a base-case when attempting to unwind, and gives a slightly more sophisticated warning ("previous frame identical to this frame") if it can't find that value. See here and here, and in gdb source here. Fixing the alignment but keeping the push may make the most sense.

cortex-m-rt/src/lib.rs Outdated Show resolved Hide resolved
@jamesmunns
Copy link
Member

Just want to confirm, I've used this PR branch to re-run with the reproduction case, and the patch fixes the defect as expected.

link to my patch. Prior to this commit in the original repro repo, I was able to reproduce the failing assert.

Dirbaio and others added 2 commits February 14, 2023 00:31
Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so
we push 2 words.

This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment.
Co-authored-by: James Munns <james@onevariable.com>
@adamgreig
Copy link
Member

I'm convinced that this PR fixes the stack alignment issue and thus fixes the miscompilation problem, and it's important we get a fix released soon. The fix in this PR (push r4 as well as lr to the stack before calling main) preserves the current unwinding functionality in probe-run and gdb (I haven't tried any other debuggers).

An alternative is to remove the stack frame entirely (don't set LR to FFFFFFFF twice, don't push 8 bytes to the stack, possibly don't specify the CFI directives). This simplifies the startup code and saves some flash and stack space, and brings us in-line with all the other startup code behaviour, but causes probe-run to emit a warning or error when its unwinding fails to find an FFFFFFFF sentinel value (it still displays the full stack trace, so there's no loss of functionality). gdb is not affected by default (it detects the main function and stops unwinding past that; if you ask it to unwind further it stops after reset with a message about a duplicate stack frame).

I think ideally probe-run should be able to stop unwinding without seeing the FFFFFFFF value; it's not part of the ARM spec, it's not even the start value of LR on ARMv6M, and there are a bunch of scenarios where it won't be observed including anything not using cortex-m-rt's particular startup code.

Output of gdb and probe-run below:

GDB

  • With 0.7.2
  • With this PR
(gdb) bt
#0  0x08000244 in lib::inline::__delay (cyc=<optimised out>) at asm/inline.rs:62
#1  lib::__delay (cyc=<optimised out>) at asm/lib.rs:51
#2  0x0800022e in cortex_m::asm::delay () at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19
#3  f4::__cortex_m_rt_main () at src/main.rs:16
#4  0x080001e4 in f4::__cortex_m_rt_main_trampoline () at src/main.rs:7

(gdb) bt -past-main on
#0  0x08000244 in lib::inline::__delay (cyc=<optimised out>) at asm/inline.rs:62
#1  lib::__delay (cyc=<optimised out>) at asm/lib.rs:51
#2  0x0800022e in cortex_m::asm::delay () at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19
#3  f4::__cortex_m_rt_main () at src/main.rs:16
#4  0x080001e4 in f4::__cortex_m_rt_main_trampoline () at src/main.rs:7
#5  0x080001c4 in Reset ()
  • With the stack frame pushing removed
  • With the stack frame pushing and CFI directives removed
(gdb) bt
<same as before>
(gdb) bt -past-main on
<same as before until...>
#5  0x080001c2 in Reset ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

probe-run

  • With 0.7.2
  • With this PR
stack backtrace:
   0: lib::inline::__delay
        at ./asm/inline.rs:62:5
   1: __delay
        at ./asm/lib.rs:51:17
   2: cortex_m::asm::delay
        at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19:21
   3: f4::__cortex_m_rt_main
        at src/main.rs:18:13
   4: main
        at src/main.rs:7:1
   5: Reset
(HOST) INFO  device halted by user
  • With the stack frame pushing removed, but keeping CFI directives
stack backtrace:
<same as before until...>
   3: f4::__cortex_m_rt_main
        at src/main.rs:18:13
   4: main
        at src/main.rs:7:1
   5: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed
(HOST) INFO  device halted by user
  • With the stack frame pushing and CFI directives removed
stack backtrace:
<same as before until...>
   3: main
        at src/main.rs:7:1
   4: Reset
(HOST) ERROR error occurred during backtrace creation: debug information for address 0x80001c2 is missing. Likely fixes:
        1. compile the Rust code with `debug = 1` or higher. This is configured in the `profile.{release,bench}` sections of Cargo.toml (`profile.{dev,test}` default to `debug = 2`)
        2. use a recent version of the `cortex-m` crates (e.g. cortex-m 0.6.3 or newer). Check versions in Cargo.lock
        3. if linking to C code, compile the C code with the `-g` flag

Caused by:
    Do not have unwind info for the given address.
               the backtrace may be incomplete.
(HOST) INFO  device halted by user

Next steps

I'd like to get a fixed release out ASAP, and I'd like to remove the stack frame being pushed from Reset sooner or later, because it doesn't seem like it should be there and I think the simpler the startup code the better. Given that, I think we have three options:

  1. Release this PR now, and leave the startup like this. probe-run keeps working without changes.
  2. Release this PR now, and in a month, release a new version that removes the stack frame. probe-run has a month to fix unwinding before users start getting warnings.
  3. Release a version without stack frames now (cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. #467). probe-run users immediately see errors, but we go directly to the version we'd like to end up with.

Ideally I'd like a second opinion from @rust-embedded/cortex-m team member(s) - any thoughts?

@newAM
Copy link
Member

newAM commented Feb 13, 2023

Thanks for summarizing that so well!

Ideally I'd like a second opinion from https://github.com/orgs/rust-embedded/teams/cortex-m team member(s) - any thoughts?

I like option 2 or 3. The stack frame should be removed at some point. I do not have opinions on doing it now or later - there's good reasons for both.

@adamgreig
Copy link
Member

adamgreig commented Feb 14, 2023

probe-rs

  • 0.7.2
  • this PR (push {r4, lr}, .cfi_offset lr, 4)
  • remove LR push, remove CFI directives too
>> bt
Frame 0: __delay @ 0x08000244 inline
       ./asm/inline.rs:62:5
Frame 1: __delay @ 0x000000000800023e
       ./asm/lib.rs:51:17
Frame 2: delay @ 0x0800022e inline
       /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19:21
Frame 3: __cortex_m_rt_main @ 0x000000000800022e
       /home/adam/Projects/tmp/f4/src/main.rs:18:13
Frame 4: __cortex_m_rt_main_trampoline @ 0x080001e4
       /home/adam/Projects/tmp/f4/src/main.rs:7:1
Frame 5: <unknown function @ 0x080001c4> @ 0x080001c4
  • remove LR push entirely, but keep CFI directives
  • remove LR push, replace with mov r7, #0, keep CFI
<same until...>
Frame 4: __cortex_m_rt_main_trampoline @ 0x080001e0
       /home/adam/Projects/tmp/f4/src/main.rs:7:1
Frame 5: <unknown function @ 0x080001c2> @ 0x080001c2
Frame 6: <unknown function @ 0x080001c2> @ 0x080001c2

Basically it just handles all the situations pretty well (although it doesn't pick up the name 'Reset'). I think probe-run could potentially swap to using probe-rs for unwinding (it already uses it for debugger access) to resolve the issue too.

@adamgreig
Copy link
Member

Closing in favour of #467, see #467 (review)

@adamgreig adamgreig closed this Feb 14, 2023
bors bot added a commit that referenced this pull request Feb 14, 2023
467: cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. r=adamgreig a=Dirbaio

This was causing incorrect execution of code optimized with the assumption the stack is 8-byte aligned.

Alternate version of #463
- Remove instead of fix the sentinel/fake frame.
- Remove code initializing LR, since it's now clobbered by the `bl main` anyway.
- ~~Remove the .cfi directives, since Reset now has no correct CFI info. I think this is the "correct" thing to do here.~~
- ~~Initialize the frame pointer in R7 (suggestion from `@jamesmunns)~~`

Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants