
Eliminating recursive locks in SB could allow for using more efficient resource #948

Closed
skliper opened this issue Oct 14, 2020 · 8 comments · Fixed by #1092

skliper (Contributor) commented Oct 14, 2020

Describe the request
Recursive locks are possible in the following code (and may exist in other locations as well):

```c
CFE_EVS_SendEventWithAppID(CFE_SB_SUBSCRIPTION_REMOVED_EID, CFE_EVS_EventType_DEBUG, CFE_SB.AppId,
                           "Subscription Removed:Msg 0x%x on pipe %d,app %s",
                           (unsigned int)CFE_SB_MsgIdToValue(MsgId),
                           (int)PipeId, CFE_SB_GetAppTskName(TskId, FullName));

CFE_EVS_SendEventWithAppID(CFE_SB_UNSUB_NO_SUBS_EID, CFE_EVS_EventType_INFORMATION, CFE_SB.AppId,
                           "Unsubscribe Err:No subs for Msg 0x%x on %s,app %s",
                           (unsigned int)CFE_SB_MsgIdToValue(MsgId),
                           PipeName, CFE_SB_GetAppTskName(TskId, FullName));

CFE_EVS_SendEvent(CFE_SB_FULL_SUB_PKT_EID, CFE_EVS_EventType_DEBUG,
                  "Full Sub Pkt %d Sent,Entries=%d,Stat=0x%x\n", (int)SegNum, (int)EntryNum, (unsigned int)Stat);

CFE_EVS_SendEvent(CFE_SB_PART_SUB_PKT_EID, CFE_EVS_EventType_DEBUG,
                  "Partial Sub Pkt %d Sent,Entries=%d,Stat=0x%x", (int)SegNum, (int)EntryNum, (unsigned int)Stat);
```

Related: the locking in the SendPrevSubs command handling doesn't look like it really helps, since it has to unlock to send the message (the same issue as the commands that record route/map info to a file). The typical use case is to enable subscription reporting and then send all previous subscriptions, so it may make sense to refactor (and possibly throttle).

To Reproduce
Clear filters on the debug messages and trigger (I stopped SAMPLE_APP to cause the pipe deletion), or just subscribe and unsubscribe twice to trigger CFE_SB_UNSUB_NO_SUBS_EID.

Expected behavior
Avoiding recursive locks could allow using a more efficient resource on platforms where one is supported.

Code snips
See above.

System observed on:
From code analysis, tested on Ubuntu 18.04.

Additional context
From analysis during #928 and #947

Reporter Info
Jacob Hageman - NASA/GSFC

@skliper skliper added the bug label Oct 14, 2020
@skliper skliper added this to the 7.0.0 milestone Oct 14, 2020
skliper (Contributor, Author) commented Oct 14, 2020

Note any fix will conflict with #947, so hold off for now.

skliper (Contributor, Author) commented Oct 14, 2020

Tested on Linux: enabled all event types, then stopped SAMPLE_APP, which reported:

```
1980-012-14:16:00.50098 CFE_ES_DeleteApp: Delete Application SAMPLE_APP Initiated
EVS Port1 66/1/CFE_ES 7: Stop Application SAMPLE_APP Initiated.
1980-012-14:16:03.00037 Application SAMPLE_APP called CFE_ES_ExitApp
EVS Port1 66/1/CFE_SB 48: Subscription Removed:Msg 0x1883 on pipe 5,app CFE_ES.ES_BackgroundTask
EVS Port1 66/1/CFE_SB 48: Subscription Removed:Msg 0x1882 on pipe 5,app CFE_ES.ES_BackgroundTask
EVS Port1 66/1/CFE_SB 47: Pipe Deleted:id 5,owner SAMPLE_APP
EVS Port1 66/1/CFE_ES 8: Stop Application SAMPLE_APP Completed.
```

Sent a few no-ops and they all worked, so the double lock didn't seem to hang anything on this system.

jphickey (Contributor) commented

I was mistaken; I had thought we were using fast mutexes, but we are in fact using recursive mutexes on POSIX:

https://github.com/nasa/osal/blob/3e087e2727e29791f81a7628346426eb7e3bc44a/src/os/posix/src/os-impl-mutex.c#L95-L98

So this is probably why it works.

Using RECURSIVE is sort of like a cheat/easy way out - using a slower resource all the time because one doesn't want to expend the effort to avoid double lock. Ideally we should fix the code so it works with normal (non-recursive) mutexes.

skliper (Contributor, Author) commented Oct 15, 2020

At least from the documentation I've come across, all three OSAL implementations in the framework support nested locks. I don't see it mentioned in the API that an implementation MUST support nested locks, though. There is a test to check it:

https://github.com/nasa/osal/blob/3e087e2727e29791f81a7628346426eb7e3bc44a/src/tests/mutex-test/mutex-test.c#L246-L259

Given that, I wouldn't think we'd want to change the behavior of this API. As you mention, we could eliminate all the recursive locks and add a new API to improve performance... but at this point it's not a priority.

@skliper skliper added enhancement and removed bug labels Oct 15, 2020
@skliper skliper changed the title Possible double locks in SB Eliminating recursive locks in SB could allow for using more efficient resource Oct 15, 2020
jphickey (Contributor) commented

I agree WRT not changing the OSAL behavior at this time, but we definitely should work toward removing the requirement/dependency on recursive mutexes in CFE. It is just bad design to have tasks locking the same resources more than once.

I wrote issue nasa/osal#623 for some things we can easily do in OSAL to facilitate this. My recommendation is to start with a debug message in the event that a task is double-locking (this can be done at the shared layer).

skliper (Contributor, Author) commented Jan 19, 2021

@jphickey Did #1092 fix these?

jphickey (Contributor) commented

> @jphickey Did #1092 fix these?

Yes, it does - at least for all the cases listed in the description. The general pattern implemented now defers sending events (including debug events) until the function is finishing, after unlocking, so there is no nested/double lock. However, I did not explicitly test for nested locking as part of #1092.

skliper (Contributor, Author) commented Mar 30, 2021

Closed by #1073

@skliper skliper closed this as completed Mar 30, 2021