core: Treat stack overflows as an unrecoverable error #18448
I agree. Note that the heuristic may detect a stack overflow a tad too early: it cannot tell apart a stack that was exhausted up to the last byte but not overflowed from one that did overflow. But IMO wasting up to 4 bytes of stack space is very much worth it.
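The "too early" case can be illustrated with a minimal host-side sketch. It assumes, as a simplification, a downward-growing stack whose lowest word holds a known guard value (here, the word's own address), and a check that only asks whether that word is still intact. The helpers stack_paint_guard(), stack_guard_intact() and stack_use_words() are hypothetical, not RIOT API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the guard value (the word's own address)
 * in the lowest word of the stack. */
static void stack_paint_guard(uintptr_t *stack_start)
{
    *stack_start = (uintptr_t)stack_start;
}

/* Hypothetical helper: the guard is intact if the lowest word still
 * holds its own address. */
static bool stack_guard_intact(const uintptr_t *stack_start)
{
    return *stack_start == (uintptr_t)stack_start;
}

/* Simulate a thread that pushes `used` words onto a downward-growing
 * stack of `words` words, writing from the highest address downwards. */
static void stack_use_words(uintptr_t *stack_start, size_t words, size_t used)
{
    assert(used <= words); /* the simulation itself must stay in bounds */
    for (size_t i = 0; i < used; i++) {
        stack_start[words - 1 - i] = (uintptr_t)0xdeadbeefu;
    }
}
```

For an 8-word stack, using 7 words leaves the guard intact, while using all 8 overwrites it and is reported exactly like a real overflow. With a single 32-bit guard word, that false positive costs at most 4 bytes of usable stack.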
Also note that with the changed behavior we should get rid of THREAD_CREATE_STACKTEST and instead always create the stack overflow detection pattern in the stack. (That is, define THREAD_CREATE_STACKTEST to 0 for compatibility and deprecate that #define, but drop all in-tree users.) Otherwise, a thread created without a stack test would result in an immediate core panic.
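A minimal sketch of the proposed deprecation, assuming thread creation flags are OR-ed bit masks: once the pattern is always written, the compatibility define can expand to 0, so callers that still pass the flag keep compiling and behave exactly like callers that omit it. SKETCH_THREAD_CREATE_SLEEPING and sketch_prepare_stack() are hypothetical placeholders, not RIOT's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder for some other, unrelated creation flag. */
#define SKETCH_THREAD_CREATE_SLEEPING (1u << 0)

/* Deprecated: the stack test pattern is now always created.
 * Kept only so that code still passing the flag compiles;
 * OR-ing in 0 leaves the flag word unchanged. */
#define THREAD_CREATE_STACKTEST (0u)

/* Compile-time check that the deprecated flag is a no-op. */
static_assert((SKETCH_THREAD_CREATE_SLEEPING | THREAD_CREATE_STACKTEST)
              == SKETCH_THREAD_CREATE_SLEEPING,
              "deprecated flag must not change the flags");

/* Hypothetical stand-in for the stack preparation step of thread
 * creation: the guard is written unconditionally, whatever `flags` is. */
static void sketch_prepare_stack(uintptr_t *stack_start, unsigned flags)
{
    (void)flags; /* no flag disables the guard any more */
    *stack_start = (uintptr_t)stack_start;
}
```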
Do you have time to do this? Otherwise, I could create a PR that removes THREAD_CREATE_STACKTEST and deprecates the backward-compatibility define. This PR could then be rebased on top of it.
It would IMO also be nice to use the new PANIC_STACK_OVERFLOW for the MPU stack protector. I think that when mpu_noexec_ram is not used, one could just use it in the MemManage handler without any additional effort to narrow down the cause, right?
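The suggestion can be sketched as follows, assuming that when mpu_noexec_ram is not used the MPU stack guard is the only region that can raise a MemManage fault, so the fault can be attributed to a stack overflow without inspecting fault status registers. The enum, the panic stub and the handler name are hypothetical host-side stand-ins; RIOT's real handler and core_panic() differ (for instance, core_panic() does not return).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RIOT's panic codes. */
typedef enum {
    SKETCH_PANIC_GENERAL_ERROR,
    SKETCH_PANIC_MEM_MANAGE,
    SKETCH_PANIC_STACK_OVERFLOW,
} sketch_panic_t;

/* Host-side stub that records the crash code instead of halting. */
static sketch_panic_t last_panic;

static void sketch_core_panic(sketch_panic_t crash_code, const char *message)
{
    assert(message != NULL);
    last_panic = crash_code;
}

/* Hypothetical MemManage handler body; `noexec_ram_used` models whether
 * the mpu_noexec_ram module is compiled in. */
static void sketch_mem_manage_handler(bool noexec_ram_used)
{
    if (!noexec_ram_used) {
        /* Assumed: only the MPU stack guard can fault here. */
        sketch_core_panic(SKETCH_PANIC_STACK_OVERFLOW, "STACK OVERFLOW");
    }
    else {
        /* Ambiguous: stack guard hit or execution from RAM. */
        sketch_core_panic(SKETCH_PANIC_MEM_MANAGE, "MEM MANAGE");
    }
}
```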
Isn't … See also lines 237 to 247 and lines 174 to 189 in 25a5269.
Yes. The point is, with this change RIOT will no longer work at all without …
I think enabling the stack overflow detection heuristic by default is a good idea. However, maybe it makes sense to retain …
I don't have the time to implement this at the moment, so feel free to move forward with this :)
Yes, this is a good idea! Not sure how best to implement this, though; currently the … RIOT/cpu/cortexm_common/vectors_cortexm.c lines 469 to 472 in 8cf20a2
I am not much of an ARM person myself, so I am not sure how one would determine the cause for the invocation of the …
I just rebased this PR and resolved the merge conflict. Apart from enabling the stack overflow check by default (which is likely a separate issue), do we at least agree that it makes sense to trigger a panic on stack overflow? And if so, what changes are necessary to get this merged?
OK, it looks like my fear that without … Lines 237 to 252 in 3876f38
(In the …
ACK.
Contribution description
Presently, RIOT just emits a warning when a stack overflow is encountered, but still resumes execution. In my view, execution should be aborted, as a stack overflow detected via the heuristic provided by the scheduler is an unrecoverable error.
I ran into this while performing automated tests of a RIOT application in which a stack overflow occurred, but I only noticed it after inspecting the application output more closely.
Similar to SSP failures, I added a crash_code for stack overflows and also bumped the log level from warning to error.
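A minimal sketch of the behavioral change, assuming the scheduler compares a guard word at the lowest stack address against its expected value: instead of logging and continuing, it now reports the failure through a panic with a dedicated crash code, as is already done for SSP failures. Only the idea of a PANIC_STACK_OVERFLOW crash code comes from this PR; the types, the stub and the function names are hypothetical host-side stand-ins. The stub uses longjmp() to model a panic that never returns to its caller.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SSP and new stack overflow crash codes. */
typedef enum {
    SKETCH_PANIC_SSP,
    SKETCH_PANIC_STACK_OVERFLOW,
} sketch_panic_t;

/* Hypothetical minimal thread descriptor: only what the check needs. */
typedef struct {
    int pid;
    uintptr_t *stack_start; /* lowest address; holds the guard word */
} sketch_thread_t;

/* Recovery point; a test harness must call setjmp(panic_env) first. */
static jmp_buf panic_env;
static sketch_panic_t last_panic;

/* Host-side stand-in for core_panic(): never returns to its caller,
 * but jumps back to the harness instead of halting the system. */
static _Noreturn void sketch_core_panic(sketch_panic_t crash_code,
                                        const char *message)
{
    assert(message != NULL);
    last_panic = crash_code;
    longjmp(panic_env, 1);
}

/* Scheduler-side check: an overwritten guard word is now treated as
 * unrecoverable instead of only being logged as a warning. */
static void sketch_sched_check_stack(const sketch_thread_t *thread)
{
    if (*thread->stack_start != (uintptr_t)thread->stack_start) {
        sketch_core_panic(SKETCH_PANIC_STACK_OVERFLOW, "STACK OVERFLOW");
    }
}
```

With an intact guard the check returns normally; with a corrupted guard, the code after the check is never reached, which is the property the testing procedure asks for.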
Testing procedure
I don't think there is a test case for the SCHED_TEST_STACK heuristic, but the testing procedure would basically boil down to ensuring that execution does not continue after a stack overflow is detected by the scheduler.
Issues/PRs references
None.