-
Notifications
You must be signed in to change notification settings - Fork 431
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Crash in callback group pointer vector iterator
1. Mutually exclusive callback group hangs The root cause for this issue is due to a combination between the multithreaded executor and the mutually exclusive callback group used for all the ROS topics. When the executor collects all the references to the subscriptions, timers and services, it skips the mutually exclusive callback_groups which are currently locked (ie: being processed by another thread). This cause the resulting waitset to only contain the guard pointers. If there is no activity on those guards, the thread will wait for work forever in the get_next_executable and block all other threads. The resolution is to simply add a timeout for the multithreaded get_next_executable call ensuring the calling thread will eventually return. 2. Memory leak in callback group weak reference vectors There is leak in the callback group weak reference vectors that occurs when a weak reference becomes invalid (subscription is dropped, service deleted, etc). The now obsolete weak pointer reference is never deleted from the callback group pointer vector causing the leak. The resolution of this problem is implemented by scanning and deleting expired weak pointer at the time of insertion of a new weak pointer into the callback group vectors. This approach is the lowest computational cost to purging obsolete weak pointers. 3. Crash in iterator for callback group pointer vectors This problem exists because a reference to the callback group pointer vector is provided as a return value to facilitate loop iterator. This is a significant crash root cause with a multithreaded executor where a thread is able to add new subscription to the callback group. The crash is caused by a concurrent modification of the weak pointer vector while another thread is iterating over that same vector resulting in a crash. Testing: These changes where implemented and tested using a test application which creates / publish / deletes thousands of topics (up to 100,000) from a separate standalone thread while the ROS2 layer is receiving data on the topics deleted. The muiltithreaded was setup to contain 10 separate executor threads on a single mutually exclusive callback group containing thousands of topics. issue: #813 Signed-off-by: Guillaume Autran <gautran@clearpath.ai>
- Loading branch information
1 parent
9be3e08
commit 96daf7a
Showing
6 changed files
with
141 additions
and
123 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.