-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OS_TaskDelete is not synchronous on POSIX #642
Comments
|
This was referenced Dec 7, 2020
jphickey
added a commit
to jphickey/osal
that referenced
this issue
Dec 9, 2020
In the POSIX implementation, OS_TaskDelete was implemented in a deferred manner - the API call was a request, and the actual deletion occured sometime thereafter. This is a problem if the task is running code within a dynamically loaded module, and the intent is to delete the task so the module can be unloaded. In this case the app needs to be certain that the task has actually been deleted before unloading can be done safely. To do this requires use of pthread_join() on POSIX which confirms that the task has exited. However, this is a (potentially) blocking call, so to do this requires rework of the EXCLUSIVE lock mode such that the OSAL lock is _not_ held during the join operation.
jphickey
added a commit
to jphickey/osal
that referenced
this issue
Dec 10, 2020
In the POSIX implementation, OS_TaskDelete was implemented in a deferred manner - the API call was a request, and the actual deletion occured sometime thereafter. This is a problem if the task is running code within a dynamically loaded module, and the intent is to delete the task so the module can be unloaded. In this case the app needs to be certain that the task has actually been deleted before unloading can be done safely. To do this requires use of pthread_join() on POSIX which confirms that the task has exited. However, this is a (potentially) blocking call, so to do this requires rework of the EXCLUSIVE lock mode such that the OSAL lock is _not_ held during the join operation.
jphickey
added a commit
to jphickey/osal
that referenced
this issue
Dec 11, 2020
In the POSIX implementation, OS_TaskDelete was implemented in a deferred manner - the API call was a request, and the actual deletion occured sometime thereafter. This is a problem if the task is running code within a dynamically loaded module, and the intent is to delete the task so the module can be unloaded. In this case the app needs to be certain that the task has actually been deleted before unloading can be done safely. To do this requires use of pthread_join() on POSIX which confirms that the task has exited. However, this is a (potentially) blocking call, so to do this requires rework of the EXCLUSIVE lock mode such that the OSAL lock is _not_ held during the join operation.
jphickey
added a commit
to jphickey/osal
that referenced
this issue
Dec 22, 2020
In the POSIX implementation, OS_TaskDelete was implemented in a deferred manner - the API call was a request, and the actual deletion occured sometime thereafter. This is a problem if the task is running code within a dynamically loaded module, and the intent is to delete the task so the module can be unloaded. In this case the app needs to be certain that the task has actually been deleted before unloading can be done safely. To do this requires use of pthread_join() on POSIX which confirms that the task has exited. However, this is a (potentially) blocking call, so to do this requires rework of the EXCLUSIVE lock mode such that the OSAL lock is _not_ held during the join operation.
jphickey
added a commit
that referenced
this issue
Jan 11, 2021
Fix #642, 645, 701, 702, 703 - OSAL global table management
jphickey
pushed a commit
to jphickey/osal
that referenced
this issue
Aug 10, 2022
Fix nasa#641, string operations on GCC9
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
The current POSIX implementation of
OS_TaskDelete()
usespthread_cancel()
. This is a cancellation request but the target thread continues to run until it hits a cancellation point. As a result, when the OS_TaskDelete() function returns, it is likely that the target thread is actually still running for a short period of time.But if the intent is to actually unload the module, as is done in the CFE "restart" and "reload" commands, it is critical to ensure that all uses/references to code within the to-be-unloaded module have actually been released. So it is important to make sure that the cancellation request has been executed and the task is actually deleted before proceeding to the
OS_ModuleUnload()
call.Currently with the POSIX implementation, there is no way to guarantee this, so this becomes a race condition during restart/reload operations on this platform.
To Reproduce
Not directly reproducible in current code - this race condition is currently masked by the fact that modules are loaded with
RTLD_GLOBAL
, so if the runtime loader still sees the module being referenced using its internal refcount, it doesn't actually unload it whendlclose()
is called. So in this case the task finishes up normally and there is no apparent problem.But when this is fixed - such as when using
RTLD_LOCAL
as suggested in #641 - then this race condition becomes a problem, and as a result a CFE reload or restart command sometimes triggers a segmentation fault in the event thatOS_ModuleUnload()
gets called before the task has fully been deleted.Expected behavior
OS_TaskDelete()
should ensure that the task has been fully removed, not just pending removal, before it returns to the caller.Suggestion to achieve this:
pthread_join()
to wait for the cancelled task to actually exit.System observed on:
Ubuntu 20.04
Requester Info
Joseph Hickey, Vantage Systems, Inc.
The text was updated successfully, but these errors were encountered: