Fix data race on mTestIndex. #7494
Merged: bzbarsky-apple merged 1 commit into project-chip:master from bzbarsky-apple:fix-test-index-race on Jun 9, 2021
We could end up sending a message and getting a response to it before we ever incremented mTestIndex (if our call into NextTest() was on a thread other than the message thread). If that happened, we would end up running some subtest twice, and then later, whenever we did increment mTestIndex, we would end up skipping some subtest. Fixes project-chip#7493
pullapprove bot requested review from andy31415, chrisdecenzo, Damian-Nordic, hawk248, jepenven-silabs, msandstedt and woody-apple on June 9, 2021 17:23
msandstedt approved these changes on Jun 9, 2021
woody-apple approved these changes on Jun 9, 2021
andy31415 approved these changes on Jun 9, 2021
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request on Jun 10, 2021
This reverts commit ee0e403.
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request on Jun 10, 2021

Problem:
This PR fixes up the various thread races detected by tsan in chip-tool.

Change:
- Following the pattern of 'external synchronization', sprinkled LockChipStack() and UnlockChipStack() calls around key call sites that called into the stack from the various command logic in chip-tool.
- Removed usleep and global instance hacks.
- Reverts changes in project-chip#7494.
- Restructured Command::Run so that the bulk of the stack initialization and shutdown is managed in Commands::Run() before Run() is called, with an ExecutionContext object pointer stashed inside the Command for convenient access. This reduces the chances of people writing new commands getting stack initialization wrong.
- Instead of sometimes using chip::Controller::DeviceController and sometimes DeviceCommissioner, just used the latter in all commands, since that is the superset class anyway.
- Added a new thread-safe 'StopEventLoopTask' that application logic needs to call before DeviceController::Shutdown() can be called with external synchronization.
- Pivots PlatformMgr::Shutdown() to not handle stopping the event queue, but only focus on cleaning up the stack objects.
- Fixed up TestMdns as well along the way.

Testing:
- Enabled tsan using the 'is_tsan' build arg and used that to catch more than 10 races, with not a single false positive.
- Ran through all the chip-tool command groups (pairing, IM, discover, testcluster, payload, etc.) 10x each to ensure no regressions in functionality as well as clean shutdown with tsan.
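The 'external synchronization' pattern named in the commit message above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: SketchStack, LockStack(), UnlockStack(), StackLockGuard and SendFromManyThreads() are hypothetical stand-ins built on std::mutex, not the real LockChipStack()/UnlockChipStack() implementation or the code from that commit. The idea shown is only that the stack itself is not thread-safe, so every call site on an application thread takes the lock before calling in.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the stack. The real stack is guarded by
// LockChipStack()/UnlockChipStack(); this sketch only imitates that shape.
class SketchStack
{
public:
    void LockStack() { mLock.lock(); }
    void UnlockStack() { mLock.unlock(); }

    // Not thread-safe on its own: callers must hold the stack lock.
    void SendCommand() { ++mCommandsSent; }
    int CommandsSent() const { return mCommandsSent; }

private:
    std::mutex mLock;
    int mCommandsSent = 0;
};

// RAII helper so every lock is paired with an unlock, even on early return.
class StackLockGuard
{
public:
    explicit StackLockGuard(SketchStack & stack) : mStack(stack) { mStack.LockStack(); }
    ~StackLockGuard() { mStack.UnlockStack(); }
    StackLockGuard(const StackLockGuard &) = delete;
    StackLockGuard & operator=(const StackLockGuard &) = delete;

private:
    SketchStack & mStack;
};

// Calls into the stack from several application threads, wrapping each call
// in the lock, and returns the number of commands the stack recorded.
int SendFromManyThreads(int threadCount, int callsPerThread)
{
    assert(threadCount >= 0 && callsPerThread >= 0);
    SketchStack stack;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
    {
        threads.emplace_back([&stack, callsPerThread] {
            for (int i = 0; i < callsPerThread; ++i)
            {
                StackLockGuard guard(stack); // lock held for this call only
                stack.SendCommand();
            }
        });
    }
    for (auto & thread : threads)
    {
        thread.join();
    }
    return stack.CommandsSent(); // safe to read: all writers have been joined
}
```

An RAII guard is a design choice of this sketch rather than something the commit message describes; the message only says that lock and unlock calls were placed around key call sites.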
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request on Jun 11, 2021
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request on Jun 11, 2021
mspang pushed a commit that referenced this pull request on Jun 14, 2021

* Fix thread races in chip-tool

Problem:
This PR fixes up the various thread races detected by tsan in chip-tool.

Change:
- Following the pattern of 'external synchronization', sprinkled LockChipStack() and UnlockChipStack() calls around key call sites that called into the stack from the various command logic in chip-tool.
- Removed usleep and global instance hacks.
- Reverts changes in #7494.
- Restructured Command::Run so that the bulk of the stack initialization and shutdown is managed in Commands::Run() before Run() is called, with an ExecutionContext object pointer stashed inside the Command for convenient access. This reduces the chances of people writing new commands getting stack initialization wrong.
- Instead of sometimes using chip::Controller::DeviceController and sometimes DeviceCommissioner, just used the latter in all commands, since that is the superset class anyway.
- Added a new thread-safe 'StopEventLoopTask' that application logic needs to call before DeviceController::Shutdown() can be called with external synchronization.
- Pivots PlatformMgr::Shutdown() to not handle stopping the event queue, but only focus on cleaning up the stack objects.
- Fixed up TestMdns as well along the way.

Testing:
- Enabled tsan using the 'is_tsan' build arg and used that to catch more than 10 races, with not a single false positive.
- Ran through all the chip-tool command groups (pairing, IM, discover, testcluster, payload, etc.) 10x each to ensure no regressions in functionality as well as clean shutdown with tsan.

* Restyler fixes

* Forgot a file..
mkardous-silabs pushed a commit to mkardous-silabs/connectedhomeip that referenced this pull request on Jun 14, 2021
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request on Sep 23, 2021
We could end up sending a message and getting a response to it before we ever incremented mTestIndex (if our call into NextTest() was on a thread other than the message thread). If that happened, we would end up running some subtest twice, and then later, whenever we did increment mTestIndex, we would end up skipping some subtest. Fixes project-chip#7493
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request on Sep 23, 2021
We could end up sending a message and getting a response to it before
we ever incremented mTestIndex (if our call into NextTest() was on a
thread other than the message thread). If that happened, we would end
up running some subtest twice, and then later, whenever we did
increment mTestIndex, we would end up skipping some subtest.
Fixes #7493
Problem
Random test failures due to the data race.
Change overview
Eliminate the data race.
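One plausible way to eliminate a race of this shape is to advance the index before the message is sent, so that a response handled on another thread can never observe a stale value. The sketch below illustrates that ordering only; it is not taken from the PR's diff. SketchTestRunner, Send() and RunAllSubtests() are hypothetical names, and the "message thread" is simulated with a std::thread that calls NextTest() as soon as the message is sent.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a generated test command; not the chip-tool class.
class SketchTestRunner
{
public:
    explicit SketchTestRunner(std::size_t subtestCount) : mRunCounts(subtestCount, 0) {}

    // Starts the next subtest, if any remain.
    void NextTest()
    {
        if (mTestIndex >= mRunCounts.size())
        {
            return;
        }
        // Advance the index BEFORE sending. The racy ordering would be
        // Send(mTestIndex); mTestIndex++; which lets a fast response call
        // NextTest() while mTestIndex still names the subtest just sent.
        const std::size_t current = mTestIndex++;
        Send(current);
        // mTestIndex is deliberately not touched after Send().
    }

    const std::vector<int> & RunCounts() const { return mRunCounts; }

private:
    // Simulates sending a message whose response arrives immediately on a
    // different ("message") thread, which then drives the next subtest.
    void Send(std::size_t subtest)
    {
        assert(subtest < mRunCounts.size());
        ++mRunCounts[subtest];
        std::thread messageThread([this] { NextTest(); });
        messageThread.join(); // keeps the sketch deterministic and leak-free
    }

    std::size_t mTestIndex = 0;
    std::vector<int> mRunCounts;
};

// Runs `subtestCount` subtests and reports how often each one ran.
std::vector<int> RunAllSubtests(std::size_t subtestCount)
{
    SketchTestRunner runner(subtestCount);
    runner.NextTest();
    return runner.RunCounts();
}
```

With this ordering every subtest runs exactly once, because each thread's write to the index happens before the next thread is created and no thread touches the index after sending.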
Testing
Manually ran scripts/tests/test_suites.sh, but realistically, without some sort of race simulator or instrumenting the code to sleep before the increment on the main thread only, it's hard to figure out how to test this reasonably.
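As a rough idea of what such a race simulator could look like, the sketch below replaces the sleep with a one-shot hook that runs between the send and the increment, so the losing interleaving is forced deterministically on a single thread. This is an assumption-laden model, not chip-tool code: RacyRunnerModel, SetBeforeIncrementHook() and SimulateLostRace() are hypothetical names, and the model deliberately keeps the racy ordering (send first, increment afterwards) so that the symptom can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the racy ordering; not the real chip-tool code.
class RacyRunnerModel
{
public:
    explicit RacyRunnerModel(std::size_t subtestCount) : mRunCounts(subtestCount, 0) {}

    // One-shot hook invoked between "send" and the index increment. It plays
    // the role of the sleep: whatever it does happens inside that window.
    void SetBeforeIncrementHook(std::function<void()> hook) { mHook = std::move(hook); }

    bool Done() const { return mTestIndex >= mRunCounts.size(); }

    void NextTest()
    {
        if (Done())
        {
            return;
        }
        assert(mTestIndex < mRunCounts.size());
        ++mRunCounts[mTestIndex]; // "send" the message for the current subtest
        if (mHook)
        {
            std::function<void()> hook = std::move(mHook);
            mHook = nullptr; // make sure the hook fires only once
            hook();          // the response is handled before the increment
        }
        ++mTestIndex; // racy ordering: increment only after sending
    }

    const std::vector<int> & RunCounts() const { return mRunCounts; }

private:
    std::size_t mTestIndex = 0;
    std::vector<int> mRunCounts;
    std::function<void()> mHook;
};

// Forces the response to the first message to be handled before the first
// increment, then lets the remaining subtests proceed normally.
std::vector<int> SimulateLostRace(std::size_t subtestCount)
{
    RacyRunnerModel runner(subtestCount);
    runner.SetBeforeIncrementHook([&runner] { runner.NextTest(); });
    runner.NextTest();
    while (!runner.Done())
    {
        runner.NextTest(); // each call models the next response arriving
    }
    return runner.RunCounts();
}
```

For three subtests this model reports run counts of 2, 0 and 1: the first subtest runs twice and the second is skipped, which matches the symptom described above.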