Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/cortexm_common: additional information on hardfault #3333

Merged
merged 1 commit into from
Aug 19, 2015

Conversation

daniel-k
Copy link
Member

@daniel-k daniel-k commented Jul 7, 2015

I was so annoyed by debugging hardfaults on my platform (cortex m0+) that I added some helper for much nicer debugging. My problem always was that when the cpu goes to hardfault, I couldn't figure out where the error happened because the backtace in GDB didn't work.

This PR outputs additional information before calling core_panic(...). The most useful is the value of the program counter when the exception was raised. I don't know if you want to merge this, but I wanted to share at least.

When debugging with GDB use like this:

# pc: 0x35e2
breakpoint *0x35e2
monitor reset halt
continue

Now GDB resets the cpu and stops at the instruction that caused the hardfault in the first place.

Concerning the broken backtrace in case someone is interested (and for documentary purposes for myself):

The Cortex M* architecture has 2 running modes, Thread mode and Handler mode. Both have their own stack (registers psp = Process sp and msp = Main sp). Depending on the mode the cpu is in, the given stackpointer is aliased as sp.
RIOT runs the reset handler and every interrupt in Handler mode (maybe also parts of the scheduling, not sure), whereas each thread runs in Thread mode. That's why backtrace from inside a hardfault doesn't work.

Additionally, given that ISRs run in Handler mode, this is why doing crazy stuff inside a ISR leads to problems, because the main stack is not very large.

@OlegHahm OlegHahm added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Jul 7, 2015
(
"movs r0, #4 \n" /* r0 = 0x4 */
"mov r1, lr \n" /* r1 = lr */
"tst r1, r0 \n" /* if(lr & 0x4) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tst lr, #4 should be enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for Cortex M0(+), because tst requires 2 Lo-Registers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I guess the ite eq instruction is missing as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

Am 8. Juli 2015 11:10:40 MESZ, schrieb Joakim Gebart notifications@github.com:

  • volatile unsigned int r2;
  • volatile unsigned int r3;
  • volatile unsigned int r12;
  • volatile unsigned int lr; /* Link register. */
  • volatile unsigned int pc; /* Program counter. */
  • volatile unsigned int psr;/* Program status register. */
  • /* Stackpointer of exception stackframe */
  • uint32_t* sp;
  • /* Get stackpointer where exception stackframe lies and store in
    sp */
  • __ASM volatile
  • (
  •    "movs r0, #4                         \n" /\* r0 = 0x4  
    
    */
  •    "mov r1, lr                          \n" /\* r1 = lr  
    
    */
  •    "tst r1, r0                          \n" /\* if(lr & 0x4)  
    
    */

I see, I guess the ite eq instruction is missing as well?


Reply to this email directly or view it on GitHub:
https://github.com/RIOT-OS/RIOT/pull/3333/files#r34128870

@jnohlgard
Copy link
Member

regarding the trampoline function, see https://github.com/eistec/contiki/blob/mulle-master/cpu/arm/k60/interrupt-vector-k60.c#L295 and https://github.com/eistec/contiki/blob/mulle-master/cpu/arm/k60/fault-handlers.c#L28 for a small example.

The hardfault handler example in the links above is taken slightly modified from the excellent book The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors

@jnohlgard
Copy link
Member

To get this even more robust it might be wise to check the value of the stack pointer before calling the C handler, and if it is not valid (e.g. outside of RAM), set it to the reset value (&_estack) and print another message that the stack pointer was corrupt.

@daniel-k
Copy link
Member Author

daniel-k commented Jul 8, 2015

To get this even more robust it might be wise to check the value of the stack pointer before calling the C handler, and if it is not valid (e.g. outside of RAM), set it to the reset value (&_estack) and print another message that the stack pointer was corrupt.

I like that!

@daniel-k
Copy link
Member Author

daniel-k commented Jul 8, 2015

I needed to introduce new variables in the common linker script to get the RAM bounds. There was no other way to get this information, wasn't it?
Additionally I just made hard_fault_default the naked trampoline and introduced hard_fault_handler. Any objections?

@jnohlgard
Copy link
Member

@daniel-k I think it is probably necessary to modify the linker scripts unless you are going to rely on assumptions about the ordering of sections etc. Will you update this PR with those changes?

@OlegHahm OlegHahm assigned jnohlgard and unassigned kaspar030 Jul 13, 2015
@daniel-k
Copy link
Member Author

@gebart

Will you update this PR with those changes?

What do you mean? I already modified the linker script etc. see daniel-k@3563455

@jnohlgard
Copy link
Member

Sorry, I accidentally commented inside the commit instead of in the change list.

@jnohlgard
Copy link
Member

This is where the unification of our Cortex-M platform code pays off!

@daniel-k
Copy link
Member Author

This is where the unification of our Cortex-M platform code pays off!

Absolutely! :-)

@jnohlgard
Copy link
Member

@daniel-k
Could you add the following: if msp is invalid, update msp in the asm trampoline to point at _estack or something, so that the C hardfault handler does not cause a lockup when calling puts.

@jnohlgard
Copy link
Member

psp shouldn't matter since in ISR context we will be using msp for stacking

@daniel-k
Copy link
Member Author

@gebart
There you go. Actually it already failed when calling the hard_fault_handler() C-function because this already involves stack pushing.

@daniel-k
Copy link
Member Author

Regarding the clobber registers: now that there are inputs to the asm block, they get loaded to r2-r4 by the compiler before executing the inline assembly. I guess when not specifying them, the compiler might load the variables to r0-r2. But didn't test that.

@jnohlgard
Copy link
Member

I like this solution.

@jnohlgard
Copy link
Member

ACK

@jnohlgard
Copy link
Member

please squash

@OlegHahm OlegHahm added the CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable label Jul 15, 2015
@jnohlgard
Copy link
Member

I opened daniel-k#1 for kinetis_common support. Tested on k60.

puts("\nContext before hardfault:");

/* TODO: printf in ISR context might be a bad idea */
printf("r0: 0x%x\nr2: 0x%x\nr3: 0x%x\nr3: 0x%x\n", r0, r1, r2, r3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r3 twice and r1 missing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split the string into multiple lines like:

printf("r0:  0x%x\n"
    "r1:  0x%x\n"
    "r2:  0x%x\n"
    "r3:  0x%x\n",
    r0, r1, r2, r3);

@jnohlgard
Copy link
Member

it would be useful to have all other registers saved to the stack before printing so that we can get a complete image of the register state
something like push r4..r11 inside the asm handler and then printing them from the C handler.

@jnohlgard
Copy link
Member

... but let's make the improvements as follow-up PRs. ACK as soon as the printf format is fixed and the kinetis_common ldscript is updated.

@daniel-k
Copy link
Member Author

@gebart
There you go again. Please ACK, then I'll squash and rebase.

@jnohlgard
Copy link
Member

ACK
rebase, squash. Merge when Travis is green (ignore coap failures if sf.net is still down)

@haukepetersen
Copy link
Contributor

nack. I do like the idea of this PR, but I am strongly opposing to compile this in per default. I would actually opt for putting this code in #if ENABLE_DEBUG ... #endif conditions and only compile it in if explicitly activated. As an alternative I could also live with making this dependable on DEVELHELP...

@daniel-k
Copy link
Member Author

daniel-k commented Aug 4, 2015

@haukepetersen
Is the concern about code size? I'd agree with you that this doesn't need to be present in production code. But I don't really like putting it inside ENABLE_DEBUG. This won't spam any output during normal operation, so there's no point in suppressing it. When someone (new) is developing with RIOT and encounters hardfaults, he might not even get the idea to enable debug somewhere deep inside the common cpu code.

@jnohlgard
Copy link
Member

@haukepetersen I can concede to enable this on DEVELHELP enabled builds only. I would not like to have this disabled by default inside #if ENABLE_DEBUG

@jnohlgard
Copy link
Member

I have a follow up to this PR in the pipe on my side which will introduce some additional help to further investigate the source of the fault when running with a debugger attached.

@kaspar030
Copy link
Contributor

I guess we really need to rediscuss DEVELHELP at the next dev meeting. I'd also not want to have this compiled in by default...

@haukepetersen
Copy link
Contributor

DEVELHELP is fine for me (and IMHO this is exactly the stuff that I would expect to be activated when DEVELHELP is active)

@jnohlgard
Copy link
Member

ACK, please squash.

@jnohlgard
Copy link
Member

wait with the merge until after the dev meeting

@daniel-k daniel-k force-pushed the cortexm_hardfault_information branch from caeb7aa to 32aac8d Compare August 7, 2015 15:01
@daniel-k
Copy link
Member Author

daniel-k commented Aug 7, 2015

Squashed

@miri64
Copy link
Member

miri64 commented Aug 19, 2015

So, what is the status on this? Can it be merged?

@daniel-k
Copy link
Member Author

I don't know about the outcomes of the dev meeting. As far as there are no objections about wrapping it with DEVELHELP I guess this can be merged now. I'm busy until next week, but if you want to merge it already, go for it :)

@miri64
Copy link
Member

miri64 commented Aug 19, 2015

I wasn't in the dev meeting either, but I was proposed @kushalsingh007 to use this PR for debugging hard faults on arduino-due. When I looked it up, I saw that this PR was ACK'd but postponed, so I was just wondering. @gebart do you know anything?

@jnohlgard
Copy link
Member

http://riot.pad.spline.de/17?

General definition for the usage of DEVELHELP:
non-invasive, additional information during general development and testing

I missed the greater part of the meeting because of scheduling conflicts, but from what I gathered at the end of the session was that this would be OK inside DEVELHELP. @kaspar030, @OlegHahm or @haukepetersen may have more information.

@jnohlgard
Copy link
Member

@daniel-k you should add your name to the top of the file with @author blablah

@daniel-k daniel-k force-pushed the cortexm_hardfault_information branch from 32aac8d to dabaf82 Compare August 19, 2015 12:24
@daniel-k daniel-k force-pushed the cortexm_hardfault_information branch from dabaf82 to 7a86344 Compare August 19, 2015 12:25
@daniel-k
Copy link
Member Author

@gebart Done. And also rebased. Should we merge directly? Where's Travis and Strider gone?

@OlegHahm
Copy link
Member

Yes, @gebart is right. Inside DEVELHELP it's fine. Go ahead!

@daniel-k
Copy link
Member Author

What about the failed xtimer tests with strider? Known issue and safe to merge?

miri64 added a commit that referenced this pull request Aug 19, 2015
cpu/cortexm_common: additional information on hardfault
@miri64 miri64 merged commit 3d06531 into RIOT-OS:master Aug 19, 2015
@miri64
Copy link
Member

miri64 commented Aug 19, 2015

Strider is just there to test Strider, as long as it isn't working properly we can safely ignore it :-).

@OlegHahm
Copy link
Member

And xtimer test is not even merged, so you were probably looking at the wrong build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants