
False positive "call stack was corrupted" warning on 0.3 #277

Closed
jonas-schievink opened this issue Nov 15, 2021 · 4 comments · Fixed by rust-embedded/cortex-m-rt#337
Labels
status: needs PR · topic: unwinding · type: bug

Comments

@jonas-schievink
Contributor

Describe the bug
I'm seeing a `(HOST) WARN call stack was corrupted; unwinding could not be completed` warning, even though unwinding seems to have worked fine and nothing happened that could have corrupted the stack.

To Reproduce
Steps to reproduce the behavior:

Expected and observed behavior

Actual output:

    Finished test [unoptimized + debuginfo] target(s) in 0.01s
     Running tests/integration.rs (/home/jonas/dev/bxcan/target/thumbv7m-none-eabi/debug/deps/integration-7c7caa16bcb9d813)
(HOST) INFO  flashing program (16.20 KiB)
(HOST) INFO  success!
────────────────────────────────────────────────────────────────────────────────
(1/11) running `split_filters`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:18
(2/11) running `basic_roundtrip`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:43
(3/11) running `no_filters_no_frames`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:58
(4/11) running `filter_mask32_std`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:68
(5/11) running `filter_mask32_ext`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:112
(6/11) running `filter_mask16`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:153
(7/11) running `filter_list32_std`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:193
(8/11) running `filter_list32_ext`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:226
(9/11) running `dequeue_lower_priority_frame`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:264
(10/11) running `enable_non_blocking`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:317
(11/11) running `ext_roundtrip`...
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:330
all tests passed!
└─ integration::tests::__defmt_test_entry @ tests/integration.rs:4
────────────────────────────────────────────────────────────────────────────────
stack backtrace:
   0: _ZN3lib6inline6__bkpt17h78c2a3232bb722b9E
        at ./asm/inline.rs:13:5
   1: __bkpt
        at ./asm/lib.rs:49:17
   2: _ZN10defmt_test6export4exit17h45e5ec6423f16273E
        at /home/jonas/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/defmt-test-0.3.0/src/export.rs:8:9
   3: main
        at tests/integration.rs:4:1
   4: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed
(HOST) INFO  device halted without error
     Running tests/interrupts.rs (/home/jonas/dev/bxcan/target/thumbv7m-none-eabi/debug/deps/interrupts-fc17b8c8f4ece88b)
(HOST) INFO  flashing program (19.87 KiB)
(HOST) INFO  success!
────────────────────────────────────────────────────────────────────────────────
(1/6) running `tx_interrupt`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:37
(2/6) running `rx_interrupt_message_pending`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:71
(3/6) running `rx_interrupt_fifo_full`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:105
(4/6) running `rx_interrupt_fifo_overrun`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:145
(5/6) running `sce_interrupt_sleep`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:195
(6/6) running `sce_interrupt_wakeup`...
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:219
all tests passed!
└─ interrupts::tests::__defmt_test_entry @ tests/interrupts.rs:4
────────────────────────────────────────────────────────────────────────────────
stack backtrace:
   0: _ZN3lib6inline6__bkpt17h78c2a3232bb722b9E
        at ./asm/inline.rs:13:5
   1: __bkpt
        at ./asm/lib.rs:49:17
   2: _ZN10defmt_test6export4exit17h45e5ec6423f16273E
        at /home/jonas/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/defmt-test-0.3.0/src/export.rs:8:9
   3: main
        at tests/interrupts.rs:4:1
   4: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed
(HOST) INFO  device halted without error

Expected output: the same, but without the warning.

jonas-schievink added the type: bug, status: needs PR and topic: unwinding labels on Nov 15, 2021
@newAM
Member

newAM commented Nov 15, 2021

I am also seeing this in the stm32wlxx-hal testsuite: https://github.com/stm32-rs/stm32wlxx-hal/runs/4208369937?check_suite_focus=true#step:6:33

I saw this infrequently with defmt 0.2 (difficult to reproduce, so I didn't file a bug report), but now it occurs every time with 0.3.

@jamesmunns

I'm also hitting this on every run, along with #201.

james@archx1c6g-ovar ➜  emb-playground git:(main) ✗ DEFMT_LOG=info cargo rrb forth
   Compiling emb-playground v0.1.0 (/home/james/anachro/forth/emb-playground)
    Finished release [optimized + debuginfo] target(s) in 1.45s
     Running `probe-run --chip nRF52840_xxAA target/thumbv7em-none-eabihf/release/forth`
(HOST) INFO  flashing program (17.36 KiB)
(HOST) INFO  success!
────────────────────────────────────────────────────────────────────────────────
INFO  Hello!
└─ forth::__cortex_m_rt_main @ src/bin/forth.rs:11
INFO  lol: 10781266
└─ forth::__cortex_m_rt_main @ src/bin/forth.rs:30
────────────────────────────────────────────────────────────────────────────────
stack backtrace:
   0: _ZN3lib6inline6__bkpt17h78c2a3232bb722b9E
        at ./asm/inline.rs:13:5
   1: __bkpt
        at ./asm/lib.rs:49:17
   2: _ZN14emb_playground4exit17h99c2b316dcb70511E
        at src/lib.rs:18:5
   3: _ZN5forth18__cortex_m_rt_main17h8a5fd1dcc9faec7bE
        at src/bin/forth.rs:32:5
   4: main
        at src/bin/forth.rs:9:1
   5: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed
(HOST) INFO  device halted without error

Running this application with `cargo rrb forth`.

bors bot added a commit to rust-embedded/cortex-m-rt that referenced this issue Nov 23, 2021
337: Fix unwinding through `Reset` r=thejpster a=jonas-schievink

Unwinders may detect the end of the program by seeing `0xFFFFFFFF` in `lr`, which is why code to ensure that it has that value was added in #293. However, the `bl main` overwrites that value with the current program counter. This PR saves the old `lr` value on the stack, and adds debuginfo entries to allow an external unwinder to restore the value.

This fixes knurling-rs/probe-run#277

Co-authored-by: Jonas Schievink <jonasschievink@gmail.com>
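
For illustration, here is a minimal, hypothetical sketch of the idea behind that fix (not the actual cortex-m-rt code): the reset handler pushes `lr` before `bl main` clobbers it, and DWARF CFI directives record where the saved value lives so an external unwinder can recover the `0xFFFFFFFF` end-of-stack marker. It assumes a Cortex-M (`thumbv7m`-class) target and a `main` symbol provided elsewhere, e.g. by `#[entry]`.

```rust
// Hypothetical sketch only; the real fix is in rust-embedded/cortex-m-rt#337.
// Assumes an ARM Cortex-M target and an external `main` symbol.
core::arch::global_asm!(
    ".cfi_sections .debug_frame",   // emit unwind info into .debug_frame
    ".section .text.ResetSketch, \"ax\"",
    ".global ResetSketch",
    ".type ResetSketch, %function",
    ".thumb_func",
    "ResetSketch:",
    ".cfi_startproc",
    // `lr` is expected to hold 0xFFFFFFFF here (the end-of-stack marker).
    // Save it before `bl main` overwrites `lr` with a return address.
    "push {{r4, lr}}",
    ".cfi_def_cfa_offset 8",
    ".cfi_offset 14, -4",           // saved `lr` (r14) lives at CFA-4
    "bl main",
    "udf #0",                       // `main` must not return
    ".cfi_endproc",
);
```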
@jonathanpallant
Contributor

A fix for this issue has gone into cortex-m-rt, but you won't see it until either a) there's a new release (likely 0.7.1), or b) you add a patch to Cargo.toml to use the git version.
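
For option b), a minimal `Cargo.toml` patch could look like the snippet below; the repository URL is an assumption and should point at wherever the fix landed. Remove the patch again once the fixed release is published.

```toml
# Hypothetical [patch] entry to try the unreleased cortex-m-rt fix.
[patch.crates-io]
cortex-m-rt = { git = "https://github.com/rust-embedded/cortex-m-rt" }
```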

@jonathanpallant
Contributor

https://crates.io/crates/cortex-m-rt/0.7.1 is now out.

newAM added a commit to stm32-rs/stm32wlxx-hal that referenced this issue Nov 25, 2021
This resolves the corrupted stack backtrace in the testsuite.

See knurling-rs/probe-run#277
newAM added a commit to stm32-rs/stm32wlxx-hal that referenced this issue Nov 25, 2021
This resolves the corrupted stack backtrace in the testsuite.

See knurling-rs/probe-run#277
bors bot added a commit that referenced this issue Nov 25, 2021
287: unwind: skip FDEs with initial address of 0 r=jonathanpallant a=jonas-schievink

The [`panic` example in the app-template](https://github.com/knurling-rs/app-template/blob/09cb051ef0e6863e805a3fc65d509147b031f411/src/bin/panic.rs) can cause a broken backtrace like this:

```
stack backtrace:
   0: 0x00001450 @ HardFaultTrampoline
      <exception entry>
   1: 0x000008d4 @ lib::inline::__udf
        at ./asm/inline.rs:172:5
   2: 0x000008d4 @ __udf
        at ./asm/lib.rs:49:17
   3: 0x000001ba @ cortex_m::asm::udf
        at [...]/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/cortex-m-0.7.3/src/asm.rs:43:5
   4: 0x000001c4 @ _defmt_panic
        at [...]/src/lib.rs:13:5
   5: 0x000001b0 @ defmt::export::panic
        at [...]/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/defmt-0.3.0/src/export/mod.rs:125:14
   6: 0x000001b0 @ panic::__cortex_m_rt_main
        at [...]/src/bin/panic.rs:10:5
   7: 0x00000168 @ main
        at [...]/src/bin/panic.rs:6:1
   8: 0x00000144 @ Reset
   9: 0x00000100 @ Reset
  10: 0x52205244 @ __sheap
```

This appeared to have a different cause from #277, since the fix for that didn't fix the backtrace here.

After a lot of investigation, it turned out that the debuginfo contains multiple frame description entries (FDEs) that claim to apply to some addresses, and we happened to pick the wrong one while unwinding from `Reset` in frame 8, causing us to restore registers incorrectly and then attempt to jump to a nonsense address.

This seems to be limited to FDEs describing presumably dead code, whose base address gets reset to 0 when they're included in the final ELF, so this PR fixes it by ignoring any FDEs that claim to apply to address 0.

Co-authored-by: Jonas Schievink <jonasschievink@gmail.com>
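
As a rough illustration of the approach described above (not probe-run's actual implementation), an unwinder built on the `gimli` DWARF library could filter out such FDEs while searching `.debug_frame` for the entry covering a program counter; the function below is a sketch under that assumption.

```rust
use gimli::{BaseAddresses, CieOrFde, DebugFrame, EndianSlice, LittleEndian, UnwindSection};

type Reader<'a> = EndianSlice<'a, LittleEndian>;

/// Find the FDE covering `pc`, ignoring entries whose initial address is 0;
/// those typically describe code discarded by the linker and could otherwise
/// shadow the FDE we actually want.
fn fde_for_pc<'a>(
    debug_frame: &DebugFrame<Reader<'a>>,
    bases: &BaseAddresses,
    pc: u64,
) -> Option<gimli::FrameDescriptionEntry<Reader<'a>>> {
    let mut entries = debug_frame.entries(bases);
    while let Ok(Some(entry)) = entries.next() {
        let CieOrFde::Fde(partial) = entry else { continue };
        let Ok(fde) = partial.parse(DebugFrame::cie_from_offset) else { continue };
        if fde.initial_address() == 0 {
            continue; // skip FDEs claiming to start at address 0
        }
        if fde.contains(pc) {
            return Some(fde);
        }
    }
    None
}
```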
adamgreig pushed a commit to rust-embedded/cortex-m that referenced this issue Jan 12, 2022
337: Fix unwinding through `Reset` r=thejpster a=jonas-schievink
