Skip to content
This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

Consider removing the need for a "sentinel stack frame" while unwinding #382

Closed
jamesmunns opened this issue Feb 13, 2023 · 5 comments · Fixed by #383
Closed

Consider removing the need for a "sentinel stack frame" while unwinding #382

jamesmunns opened this issue Feb 13, 2023 · 5 comments · Fixed by #383

Comments

@jamesmunns
Copy link

jamesmunns commented Feb 13, 2023

At the moment, it is necessary for probe-run to "run into" a sentinel frame where the link register is recorded as 0xFFFF_FFFF while unwinding (on panic).

rust-embedded/cortex-m-rt#337 was opened to include this in c-m-rt, but it turns out, this caused a pretty serious regression, that we just recently caught.

We're fixing this for cortex-m-rt v0.7.3 (in rust-embedded/cortex-m#463) by pushing another word to the stack, but this means that all targets have 8 bytes of "dead" stack usage, as well as some startup code (and confusing asm) necessary to insert the necessary "full stack frame".

During testing, we noticed that GDB was actually totally happy to not have the sentinel frame at all - only probe-run was upset (it throws a warning that the stack is corrupted).

We're currently looking at whether it would be possible to totally remove the sentinel frame in c-m-rt. Ideally, this would involve probe-run being able to handle the case where it hits the end of the stack without hitting the sentinel frame.

No hard decision has been made to remove the termination frame (and cfi/cfa hints) yet, but there seems to be some desire to. We decided to keep it (and add a second push) for now, to allow us to ship 0.7.3 ASAP, and not break (or cause problems for) probe-run.


(suggested impl ahead, feel free to ignore): I think it should be possible to notice when hitting the stack top symbol, and if it was hit "gracefully", e.g. there is SOME frame right at the top of the stack with no excess data, it can be treated as the terminal frame, and not throw a "stack corrupted" warning.

I'm happy to provide a patched version of cortex-m-rt that would reproduce the currently discussed behavior for testing.

@jamesmunns jamesmunns changed the title Consider removing the need for a "sentinel stack frame" Consider removing the need for a "sentinel stack frame" while unwinding Feb 13, 2023
@adamgreig
Copy link
Sponsor

I have to apologise for making a liar of @jamesmunns here: in the end I decided to go directly to the startup code we want to end up with, which does not push a stack frame before branching to the main function. This will be in cortex-m-rt 0.7.3 which we'll encourage users to update to quickly to avoid the miscompilation issue (advisory); at that point they'll get a warning from probe-run when it attempts to unwind past Reset, like this:

stack backtrace:
   0: lib::inline::__delay
        at ./asm/inline.rs:62:5
   1: __delay
        at ./asm/lib.rs:51:17
   2: f4::__cortex_m_rt_main
        at src/main.rs:14:9
   3: main
        at src/main.rs:7:1
   4: Reset
(HOST) WARN  call stack was corrupted; unwinding could not be completed

It seems like it might be an option for probe-run to swap to using probe-rs's unwinding support instead of implementing its own unwinder; the unwinder in probe-rs didn't exist when probe-run was started but now seems quite usable, though I've not investigated how easy swapping to it would be. It might also be that the warning can just be turned off as it seems like it's easy to hit in a number of situations besides this one (such as if the backtrace is from a leaf function where LR is used as a general purpose register). In any event it doesn't seem like this change has any impact beyond generating the warning.

@Urhengulas
Copy link
Member

Thank you for notifying us 💚


The check if we run into the sentinel frame happens here:

if lr == registers::LR_END {
break;
}

If I understand correctly, we need to replace this with a new "we hit the stack top symbol"-check.

@jamesmunns
Copy link
Author

That's my understanding! At the bottom of that function you advance the stack pointer. After advancing, if sp == _stack_start, you can also gracefully exit. If sp > _stack_start, you probably want to bail with a corrupted stack error.

I'm not sure how you would get the stack_start symbol - but it will definitely exist in the debuginfo, as it is a symbol PROVIDEd by the linker (here: https://github.com/rust-embedded/cortex-m/blob/8e4b18741d77a57eb0ae4d9a011611d1428a478d/cortex-m-rt/link.x.in#L63).

@Urhengulas
Copy link
Member

In rust-embedded/cortex-m#463 (comment) @adamgreig said:

gdb is not affected by default (it detects the main function and stops unwinding past that [...])

Maybe we can do sth similar as well? We already have the start of the main fn, we would just need the size and then would know if the last address of unwinding is in the main fn.

probe-run/src/elf.rs

Lines 49 to 51 in 6f3a3a4

pub fn main_fn_address(&self) -> u32 {
self.symbols.main_fn_address
}

@Urhengulas
Copy link
Member

@jamesmunns said:

That's my understanding! At the bottom of that function you advance the stack pointer. After advancing, if sp == _stack_start, you can also gracefully exit. If sp > _stack_start, you probably want to bail with a corrupted stack error.

I'm not sure how you would get the stack_start symbol - but it will definitely exist in the debuginfo, as it is a symbol PROVIDEd by the linker (here: https://github.com/rust-embedded/cortex-m/blob/8e4b18741d77a57eb0ae4d9a011611d1428a478d/cortex-m-rt/link.x.in#L63).

The _stack_start could be easily extracted while extracting the other symbols from the elf. In the example I just tried it looked like this:

[src/elf.rs:193] symbol = Symbol {
    name: "_stack_start",
    address: 537131968,
    size: 0,
    kind: Label,
    section: Absolute,
    scope: Dynamic,
    weak: false,
    flags: Elf {
        st_info: 16,
        st_other: 0,
    },
}

The symbols are extracted here:

probe-run/src/elf.rs

Lines 173 to 194 in 6f3a3a4

fn extract_symbols(elf: &ObjectFile) -> anyhow::Result<Symbols> {
let mut rtt_buffer_address = None;
let mut program_uses_heap = false;
let mut main_fn_address = None;
for symbol in elf.symbols() {
let name = match symbol.name() {
Ok(name) => name,
Err(_) => continue,
};
let address = symbol.address().try_into().expect("expected 32-bit ELF");
match name {
"main" => main_fn_address = Some(cortexm::clear_thumb_bit(address)),
"_SEGGER_RTT" => rtt_buffer_address = Some(address),
"__rust_alloc" | "__rg_alloc" | "__rdl_alloc" | "malloc" if !program_uses_heap => {
log::debug!("symbol `{}` indicates heap is in use", name);
program_uses_heap = true;
}
_ => {}
}
}

@bors bors bot closed this as completed in 8063d9d Mar 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants