-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFE_ES_ProcessCoreException() is not interrupt-safe #76
Comments
Imported from trac issue 45. Created by jphickey on 2015-05-07T11:48:29, last modified: 2019-04-24T13:18:47 |
Trac comment by jphickey on 2015-06-15 17:44:06: Another issue with CFE_ES_ProcessCoreException(): The API is somewhat flawed in that the first parameter is a uint32 for which the PSP uses the "raw" (unabstracted) underlying OS task identifier. Depending on the OS, the actual OS task identifier may not be an integer and may not fit into 32 bits. (Cygwin is one example where the {{{pthread_t}}} type is actually a structure and is not a simple integer). It would be better to move the logic to figure out which OSAL task is associated with that host task ID //before// calling {{{CFE_ES_ProcessCoreException()}}}. This way, the first parameter could be an OSAL task ID which would make the code more straightforward as well (removing the loop to call {{{OS_TaskGetInfo()}}} etc). |
Trac comment by jphickey on 2018-05-14 11:19:12: This should be considered for the next CFE release. It will likely need a coordinated change between PSP and CFE. |
Trac comment by sseeger on 2018-06-23 11:07:50: This seems to be used in vxWorks implementations. There is a CFE_PSP_ExceptionHook for rtems, but I don't see where it's registered. Possibly built-in to RTEMS source (in-house modification) or something? After analyzing the vxWorks use case, and what I know of vxWorks, the answer to this problem seems to be to have the handler wake up a task created by PSP to handle the exception for us. vxWorks guarantees the task that caused the exception will be suspended, so this helper task can handle it. If we go this route, then we should be able to leave CFE_ES_ProcessCoreException() unmodified. Let's discuss this at the next CCB to see if there is consensus with this idea, or if there are other suggestions. |
Trac comment by jhageman on 2019-03-04 09:20:25: CCB - Needs review for path forward. |
Trac comment by jhageman on 2019-04-10 13:31:36: CCB 4/10/19
|
Trac comment by jphickey on 2019-04-10 14:25:19: Brainstorming about this one a bit, on most systems the "exceptions" that occur lie in one of three categories:
Case (1): Arithmetic errors are almost always just bad input and (or bad code that didn't check its input) and generally have low likelihood of affecting the other tasks running in the system. For this, restarting only the specific task that caused the problem might be a suitable response, and there could be a reasonably high level of confidence that the remainder of the system will continue to work properly. Case (2): A bus error means that the memory itself is bad, or the external memory-mapped device on the bus is faulted, or that the CPU is trying to access a place outside its valid physical memory map. All of which suggest that the system is seriously ill and probably nothing less than a full system restart (including external devices) will restore it to normal operation. The only possibility here is that if the PSP can positively identify the cause of the bus error and fix the actual root cause. But without this, the external bus could be left in an indeterminate state and all future accesses might fail too. Case (3): This only applies to systems with an MMU, and means the software accessed something outside its virtual address space. CFE ES/OSAL does //not// currently run apps in separate memory spaces, all tasks share the same virtual address space. If you are lucky enough to catch this on the first bad access (like a NULL pointer dereference) then task-specific recovery might be possible. But more likely, this happens with e.g. a My opinion: |
Trac comment by jhageman on 2019-04-24 13:15:26: Also related to discussion on #310, mission specific handling should go in PSP. |
Trac comment by jhageman on 2019-04-24 13:18:47: Latest actions for this ticket:
|
Whatever path is chosen, there is no requirement since it's not an application API. Recommend moving out of cfe_es_api.c. |
If this is done along with #411, then can the new ES background task log what is necessary and do either the App Restart or PSP based reset? |
I should likely write a functional requirement (that it logs the information and takes the platform defined response) or whatever it ends up doing so it gets build verified, just not a "upon request" type of requirement. |
The CFE_ES_ProcessCoreException() architecture needs to be re-examined.
This is the equivalent of an interrupt handler and likely triggered as the result of a hardware interrupt. However the implementation calls other CFE_ES functions, some of which take the global data mutex (CFE_ES_LockSharedData), do console output (CFE_ES_WriteToSyslog and other functions which ultimately do printf), or call OS_TaskDelay.
All of these operations are unsafe to do in an interrupt context on most platforms.
This may not be an issue if the end-result of the exception is a processor reset, which is probably the only safe thing to do. Although the option to restart the task does exist, the system may be too far gone after this.
The text was updated successfully, but these errors were encountered: