Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #411, rework exception handling in CFE #653

Merged
merged 2 commits into from
May 8, 2020

Conversation

jphickey
Copy link
Contributor

Describe the contribution
Move exception handling to a PSP function. In this approach the CFE only logs data after the event as a background job.

Replaces the CFE_ES_ProcessCoreException with a simple notification that causes the ES background job to run, which in turn polls the PSP for logged exceptions and writes entries to the ES ER log.

Both the PSP execption scan and the ER log file dump are converted to background jobs.

Fixes #411
Fixes #76

Testing performed
Validate CFE exception handling behavior on POSIX and VxWorks (not implemented on RTEMS).
Add divide by zero "bug" to sample_app for testing.
Confirm exception is logged correctly
Confirm ER dump to file works

Expected behavior changes
No change from CMD/TLM perspective
API CFE_ES_ProcessCoreException is removed, replaced with async event.

System(s) tested on
Ubuntu 20.04 LTS 64-bit
VxWorks 6.9 on MCP750

Additional context
Needs corresponding changes in PSP

Contributor Info - All information REQUIRED for consideration of pull request
Joseph Hickey, Vantage Systems, Inc.

Move exception handling to a PSP function.  In this approach
the CFE only logs data after the event as a background job.

Replaces the CFE_ES_ProcessCoreException with a simple notification
that causes the ES background job to run, which in turn polls the
PSP for logged exceptions and writes entries to the ES ER log.

Both the PSP execption scan and the ER log file dump are converted
to background jobs.
@jphickey jphickey force-pushed the fix-411-exceptions branch 2 times, most recently from d4f62ed to 4eb937b Compare May 6, 2020 11:48
Correct items flagged as warnings in documenation build, and
remove now-unused definition in sample platform config
@jphickey
Copy link
Contributor Author

jphickey commented May 6, 2020

Should be ready for merge now.

@jphickey jphickey marked this pull request as ready for review May 6, 2020 11:54
@jphickey
Copy link
Contributor Author

jphickey commented May 6, 2020

This was reviewed in a special topic on 2020-04-30, but may want to mention this in a CCB again.

@jphickey jphickey added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label May 6, 2020
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
My only concern is making sure we have a way to preserve exception context information and get it to the ground. Can probably address this in the PSP.

@astrogeco
Copy link
Contributor

CCB-20200506 - APPROVED. Also covered in special topic on 4/30

@astrogeco astrogeco added CCB-20200506 and removed CCB:Ready Ready for discussion at the Configuration Control Board (CCB) labels May 6, 2020
@skliper skliper added this to the 6.8.0 milestone May 7, 2020
@dmills3
Copy link

dmills3 commented May 7, 2020

Sorry for the delay in posting my comments.

  1. I have a question about how CFE handles exception. Should the system reset if a task not managed by CFE receives an exception?
    The function CFE_ES_RunExceptionScan will call CFS_PSP_Restart in cfe_es_erlog.c and restart the system when running on VxWorks.
  2. The TblHandle is used before validating on line 681 in the file cfe_tbl_api.c. It is validated on line 696 in the function CFE_TBL_Load.

@jphickey
Copy link
Contributor Author

jphickey commented May 7, 2020

  1. I have a question about how CFE handles exception. Should the system reset if a task not managed by CFE receives an exception?
    The function CFE_ES_RunExceptionScan will call CFS_PSP_Restart in cfe_es_erlog.c and restart the system when running on VxWorks.

In the current framework, yes, I think this is reasonable. The "mcp750-vxworks" PSP is primarily a test/reference platform and it is likely/expected that CFE is the only service running within the system. It is also consistent with the way MCP750 has worked before.

As the exception hook is implemented in the board-specific code this could do something else on other platforms, such as ignoring the exception if it was generated from a task outside CFE. But even still, assuming whatever actually did generate the exception was a necessary service, it might be warranted to restart the processor if it fails anyway.

  1. The TblHandle is used before validating on line 681 in the file cfe_tbl_api.c. It is validated on line 696 in the function CFE_TBL_Load.

This PR does not change anything in cfe_tbl_api.c ... what you might have seen earlier was likely due to a baseline mismatch. This is corrected since the rebase - so if you look at the diff now it is easier to read.

@skliper
Copy link
Contributor

skliper commented May 7, 2020

  1. The TblHandle is used before validating on line 681 in the file cfe_tbl_api.c. It is validated on line 696 in the function CFE_TBL_Load.

I wrote this up as a separate issue, see #687

@dmills3
Copy link

dmills3 commented May 7, 2020

The exception hook does not manage the CFE task. Maybe CFE could call a board specific function to do something else after CFE has determined it not a CFE task.

@dmills3
Copy link

dmills3 commented May 7, 2020

Another thought the CFE exception handler could return and let the board specific code decide to restart or not.

@jphickey
Copy link
Contributor Author

jphickey commented May 7, 2020

If the exception is not relevant to CFE then the hook can just drop it and not log it, or it can do something else entirely. Any exceptions which ARE written to the internal buffer are assumed to be related to CFE and will trigger a restart if no specific action is configured.

This is entirely up to the PSP as to how to disposition. The current implementation just assumes that everything is related to CFE because that is expected to be the only thing running.

@astrogeco astrogeco changed the base branch from master to integration-candidate May 8, 2020 18:16
@astrogeco astrogeco added IC-20200429 CCB:Approved Indicates code review and approval by community CCB labels May 8, 2020
@astrogeco astrogeco merged commit 733eb94 into nasa:integration-candidate May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CCB:Approved Indicates code review and approval by community CCB
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception and Reset Log possible race conditions CFE_ES_ProcessCoreException() is not interrupt-safe
4 participants