cortex-m-rt: ensure the stack is 8-byte aligned. #463

Dirbaio · 2023-02-11T01:15:47Z

Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words.

I'm not sure on the cfi changes, so I'd appreciate some double-check.

This was breaking code ~~that used LDRD/STRD on stack, since~~ that needs 8-byte alignment, For example:

#[inline(never)]
pub fn foo(x: bool) -> (u32, [u32; 2]) {
    let array = [3, 99];
    let result = array[x as usize];
    (result, array)
}

hprintln!("false -> {:?}", foo(false));
hprintln!("true -> {:?}", foo(true));

compiled to code that ~~does a strd on stack~~ does some weird address math with orr instead of add which is only correct if the stack is 8-byte aligned:

                             undefined foo()
             undefined         r0:1           <RETURN>
             undefined4        Stack[-0x8]:4  local_8                         
        00000448 82 b0           sub                sp,#0x8
        0000044a 6a 46           mov                r2,sp
        0000044c 4f f0 63 0c     mov.w              r12,#0x63
        00000450 03 23           movs               r3,#0x3
        00000452 42 ea 81 01     orr.w              r1,r2,r1, lsl #0x2
        00000456 cd e9 00 3c     strd               r3,r12,[sp,#0x0]=>local_8
        0000045a 09 68           ldr                r1,[r1,#0x0]
        0000045c 80 e8 0a 10     stm                r0,{r1,r3,r12}
        00000460 02 b0           add                sp,#0x8
        00000462 70 47           bx                 lr

which prints

false -> (3, [3, 99])
true -> (3, [3, 99])

With the fix in this PR, the result is correct:

false -> (3, [3, 99])
true -> (99, [3, 99])

adamgreig · 2023-02-11T01:21:57Z

Good catch, thanks. Maybe we should push r4 instead which is already loaded with 0xFFFFFFFF (= LR) and doesn't change depending on features, just to keep the top of the stack a little bit less weird.

Dirbaio · 2023-02-11T01:30:06Z

done

464: cortex-m-rt: assert in linker script that stack_start is 8-byte aligned. r=adamgreig a=Dirbaio If the user sets RAM length to something that's not multiple of 8, the stack won't be 8-byte aligned. This'll trigger the same horrible symptoms as #463 . This PR adds an assert to the linker script that enforces alignment. Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>

adamgreig · 2023-02-11T05:06:31Z

For historical context, we discussed this PR a bit on the chat, from here.

This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment

It's not the strd that's troublesome (that's fine with 4-byte alignment on armv7m, and doesn't exist on armv6m), but here it's the clever trick the optimiser is doing:

test_function(x: bool) -> (u32, [u32; 2])
r1 initially contains x

mov r12, 99             # r12 = 99
mov r3, 3               # r3  = 3
mov r2, sp              # r2  = SP
orr r1, r2, r1 lsl 2    # r1  = SP | (x << 2)
str r12, [sp, 4]        # *(SP + 4) = 99
str r3, [sp]            # *(SP + 0) = 3
ldr r1, [r1]            # r1 = *r1 = *(SP | (x << 2))

[r1 is now returned as the first element of the tuple, and r3 and r12 as the second]

The stack contains [3, 99], and r1 is loaded from either (SP) or (SP | 4). If SP is 8-byte aligned that's SP or SP+4, but if SP is only 4-byte aligned (so you happen to have a 1 in the 3rd bit), you end up with SP+4 returned either way, which is wrong. The optimisation is correct assuming SP is 8-byte aligned, which it must be at program entry according to AAPCS32, so that's the problem in this particular case.

In general it seems like this or similar optimisations could have resulted in incorrect program behaviour since cortex-m-rt 0.7.1 was released with 8bf70f5 (added in rust-embedded/cortex-m-rt#337), so for a bit over a year and including the current 0.7.2. Versions 0.7.0 and prior didn't push LR to the stack so it never became misaligned.

Right now my question is whether the best fix is to push another register to the stack (which is how a function would normally push LR to the stack at function-entry if it will itself write to LR), as per the current PR, or to remove pushing LR to the stack entirely, which would save 8 bytes of stack space and a few instructions in flash and at startup, but means unwinders have to work out when to stop without seeing 0xFFFFFFFF in LR.

From a brief survey, I couldn't find any other startup code that pushes LR before bl main, or indeed even initialises LR on ARMv6M platforms where it's undefined at startup:

ChibiOS
TockOS
Zephyr
libopencm3 (possibly slightly complicated by being written in C so might actually push the CPU's startup LR to the stack)
ST's HAL
Arm's demo startup code

So, I wonder if actually we're making life worse for ourselves here and could simplify things. I think the main motivation for this change was from probe-run (knurling-rs/probe-run#277) which implements its own unwinder that looks for 0xFFFFFFFF in LR to stop unwinding; whether there's any other way to sensibly detect this, or what other debuggers like gdb do, we'll need to investigate.

465: Enforce 8-byte initial stack pointer alignment r=adamgreig a=adamgreig After #463 we discovered that adding a second linker script via another compiler flag could be used to override `_stack_start` without triggering the assert in the main linker script. By masking the value, we force alignment even when the assert doesn't otherwise trigger. Co-authored-by: Adam Greig <adam@adamgreig.com>

datdenkikniet · 2023-02-13T09:30:22Z

It seems that GDB also uses the 0xFFFF_FFFF value as a base-case when attempting to unwind, and gives a slightly more sophisticated warning ("previous frame identical to this frame") if it can't find that value. See here and here, and in gdb source here. Fixing the alignment but keeping the push may make the most sense.

cortex-m-rt/src/lib.rs

jamesmunns · 2023-02-13T10:27:06Z

Just want to confirm, I've used this PR branch to re-run with the reproduction case, and the patch fixes the defect as expected.

link to my patch. Prior to this commit in the original repro repo, I was able to reproduce the failing assert.

Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words. This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment.

Co-authored-by: James Munns <james@onevariable.com>

adamgreig · 2023-02-13T23:51:55Z

I'm convinced that this PR fixes the stack alignment issue and thus fixes the miscompilation problem, and it's important we get a fix released soon. The fix in this PR (push r4 as well as lr to the stack before calling main) preserves the current unwinding functionality in probe-run and gdb (I haven't tried any other debuggers).

An alternative is to remove the stack frame entirely (don't set LR to FFFFFFFF twice, don't push 8 bytes to the stack, possibly don't specify the CFI directives). This simplifies the startup code and saves some flash and stack space, and brings us in-line with all the other startup code behaviour, but causes probe-run to emit a warning or error when its unwinding fails to find an FFFFFFFF sentinel value (it still displays the full stack trace, so there's no loss of functionality). gdb is not affected by default (it detects the main function and stops unwinding past that; if you ask it to unwind further it stops after reset with a message about a duplicate stack frame).

I think ideally probe-run should be able to stop unwinding without seeing the FFFFFFFF value; it's not part of the ARM spec, it's not even the start value of LR on ARMv6M, and there are a bunch of scenarios where it won't be observed including anything not using cortex-m-rt's particular startup code.

Output of gdb and probe-run below:

GDB

With 0.7.2
With this PR

(gdb) bt
#0  0x08000244 in lib::inline::__delay (cyc=<optimised out>) at asm/inline.rs:62
#1  lib::__delay (cyc=<optimised out>) at asm/lib.rs:51
#2  0x0800022e in cortex_m::asm::delay () at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19
#3  f4::__cortex_m_rt_main () at src/main.rs:16
#4  0x080001e4 in f4::__cortex_m_rt_main_trampoline () at src/main.rs:7

(gdb) bt -past-main on
#0  0x08000244 in lib::inline::__delay (cyc=<optimised out>) at asm/inline.rs:62
#1  lib::__delay (cyc=<optimised out>) at asm/lib.rs:51
#2  0x0800022e in cortex_m::asm::delay () at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19
#3  f4::__cortex_m_rt_main () at src/main.rs:16
#4  0x080001e4 in f4::__cortex_m_rt_main_trampoline () at src/main.rs:7
#5  0x080001c4 in Reset ()

With the stack frame pushing removed
With the stack frame pushing and CFI directives removed

(gdb) bt
<same as before>
(gdb) bt -past-main on
<same as before until...>
#5  0x080001c2 in Reset ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

probe-run

With 0.7.2
With this PR

stack backtrace:
   0: lib::inline::__delay
        at ./asm/inline.rs:62:5
   1: __delay
        at ./asm/lib.rs:51:17
   2: cortex_m::asm::delay
        at /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19:21
   3: f4::__cortex_m_rt_main
        at src/main.rs:18:13
   4: main
        at src/main.rs:7:1
   5: Reset
(HOST) INFO  device halted by user

With the stack frame pushing removed, but keeping CFI directives

stack backtrace:
<same as before until...>
   3: f4::__cortex_m_rt_main
        at src/main.rs:18:13
   4: main
        at src/main.rs:7:1
   5: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed
(HOST) INFO  device halted by user

With the stack frame pushing and CFI directives removed

stack backtrace:
<same as before until...>
   3: main
        at src/main.rs:7:1
   4: Reset
(HOST) ERROR error occurred during backtrace creation: debug information for address 0x80001c2 is missing. Likely fixes:
        1. compile the Rust code with `debug = 1` or higher. This is configured in the `profile.{release,bench}` sections of Cargo.toml (`profile.{dev,test}` default to `debug = 2`)
        2. use a recent version of the `cortex-m` crates (e.g. cortex-m 0.6.3 or newer). Check versions in Cargo.lock
        3. if linking to C code, compile the C code with the `-g` flag

Caused by:
    Do not have unwind info for the given address.
               the backtrace may be incomplete.
(HOST) INFO  device halted by user

Next steps

I'd like to get a fixed release out ASAP, and I'd like to remove the stack frame being pushed from Reset sooner or later, because it doesn't seem like it should be there and I think the simpler the startup code the better. Given that, I think we have three options:

Release this PR now, and leave the startup like this. probe-run keeps working without changes.
Release this PR now, and in a month, release a new version that removes the stack frame. probe-run has a month to fix unwinding before users start getting warnings.
Release a version without stack frames now (cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. #467). probe-run users immediately see errors, but we go directly to the version we'd like to end up with.

Ideally I'd like a second opinion from @rust-embedded/cortex-m team member(s) - any thoughts?

newAM · 2023-02-13T23:56:45Z

Thanks for summarizing that so well!

Ideally I'd like a second opinion from https://github.com/orgs/rust-embedded/teams/cortex-m team member(s) - any thoughts?

I like option 2 or 3. The stack frame should be removed at some point. I do not have opinions on doing it now or later - there's good reasons for both.

adamgreig · 2023-02-14T01:33:32Z

probe-rs

0.7.2
this PR (push {r4, lr}, .cfi_offset lr, 4)
remove LR push, remove CFI directives too

>> bt
Frame 0: __delay @ 0x08000244 inline
       ./asm/inline.rs:62:5
Frame 1: __delay @ 0x000000000800023e
       ./asm/lib.rs:51:17
Frame 2: delay @ 0x0800022e inline
       /home/adam/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.7/src/call_asm.rs:19:21
Frame 3: __cortex_m_rt_main @ 0x000000000800022e
       /home/adam/Projects/tmp/f4/src/main.rs:18:13
Frame 4: __cortex_m_rt_main_trampoline @ 0x080001e4
       /home/adam/Projects/tmp/f4/src/main.rs:7:1
Frame 5: <unknown function @ 0x080001c4> @ 0x080001c4

remove LR push entirely, but keep CFI directives
remove LR push, replace with mov r7, #0, keep CFI

<same until...>
Frame 4: __cortex_m_rt_main_trampoline @ 0x080001e0
       /home/adam/Projects/tmp/f4/src/main.rs:7:1
Frame 5: <unknown function @ 0x080001c2> @ 0x080001c2
Frame 6: <unknown function @ 0x080001c2> @ 0x080001c2

Basically it just handles all the situations pretty well (although it doesn't pick up the name 'Reset'). I think probe-run could potentially swap to using probe-rs for unwinding (it already uses it for debugger access) to resolve the issue too.

adamgreig · 2023-02-14T01:48:08Z

Closing in favour of #467, see #467 (review)

467: cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. r=adamgreig a=Dirbaio This was causing incorrect execution of code optimized with the assumption the stack is 8-byte aligned. Alternate version of #463 - Remove instead of fix the sentinel/fake frame. - Remove code initializing LR, since it's now clobbered by the `bl main` anyway. - ~~Remove the .cfi directives, since Reset now has no correct CFI info. I think this is the "correct" thing to do here.~~ - ~~Initialize the frame pointer in R7 (suggestion from `@jamesmunns)~~` Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>

Dirbaio requested a review from a team as a code owner February 11, 2023 01:15

Dirbaio force-pushed the rt-align-stack branch from cf9adf4 to 5da2003 Compare February 11, 2023 01:29

Dirbaio mentioned this pull request Feb 11, 2023

cortex-m-rt: assert in linker script that stack_start is 8-byte aligned. #464

Merged

adamgreig mentioned this pull request Feb 11, 2023

Enforce 8-byte initial stack pointer alignment #465

Merged

jamesmunns reviewed Feb 13, 2023

View reviewed changes

cortex-m-rt/src/lib.rs Outdated Show resolved Hide resolved

jamesmunns mentioned this pull request Feb 13, 2023

Consider removing the need for a "sentinel stack frame" while unwinding knurling-rs/probe-run#382

Closed

Dirbaio and others added 2 commits February 14, 2023 00:31

cortex-m-rt: ensure the stack is 8-byte aligned.

2bb8bc3

Stack must be 8-byte aligned on ARM. Pushing 1 word makes it not aligned, so we push 2 words. This was breaking code that used LDRD/STRD on stack, since that needs 8-byte alignment.

Update cortex-m-rt/src/lib.rs

6681bc7

Co-authored-by: James Munns <james@onevariable.com>

Dirbaio force-pushed the rt-align-stack branch from 42bc1a2 to 6681bc7 Compare February 13, 2023 23:31

Dirbaio mentioned this pull request Feb 13, 2023

cortex-m-rt: Remove LR push, to ensure the stack is 8-byte aligned. #467

Merged

adamgreig mentioned this pull request Feb 14, 2023

cortex-m-rt: initialize the frame pointer chain. #468

Closed

adamgreig closed this Feb 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cortex-m-rt: ensure the stack is 8-byte aligned. #463

cortex-m-rt: ensure the stack is 8-byte aligned. #463

Dirbaio commented Feb 11, 2023 •

edited

Loading

adamgreig commented Feb 11, 2023

Dirbaio commented Feb 11, 2023

adamgreig commented Feb 11, 2023

datdenkikniet commented Feb 13, 2023 •

edited

Loading

jamesmunns commented Feb 13, 2023

adamgreig commented Feb 13, 2023

newAM commented Feb 13, 2023 •

edited

Loading

adamgreig commented Feb 14, 2023 •

edited

Loading

adamgreig commented Feb 14, 2023

cortex-m-rt: ensure the stack is 8-byte aligned. #463

cortex-m-rt: ensure the stack is 8-byte aligned. #463

Conversation

Dirbaio commented Feb 11, 2023 • edited Loading

adamgreig commented Feb 11, 2023

Dirbaio commented Feb 11, 2023

adamgreig commented Feb 11, 2023

datdenkikniet commented Feb 13, 2023 • edited Loading

jamesmunns commented Feb 13, 2023

adamgreig commented Feb 13, 2023

GDB

probe-run

Next steps

newAM commented Feb 13, 2023 • edited Loading

adamgreig commented Feb 14, 2023 • edited Loading

probe-rs

adamgreig commented Feb 14, 2023

Dirbaio commented Feb 11, 2023 •

edited

Loading

datdenkikniet commented Feb 13, 2023 •

edited

Loading

newAM commented Feb 13, 2023 •

edited

Loading

adamgreig commented Feb 14, 2023 •

edited

Loading