Fix data race on mTestIndex. #7494

Merged

bzbarsky-apple
Contributor

We could end up sending a message and getting a response to it before
we ever incremented mTestIndex (if our call into NextTest() was on a
thread other than the message thread). If that happened, we would end
up running some subtest twice, and then later, whenever we
incremented mTestIndex, we would end up skipping some subtest.
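
To make the ordering problem concrete, here is a minimal single-threaded sketch that replays the bad interleaving deterministically. Everything in it is an assumption for illustration: `SubtestRunner`, `Send`, `SimulateEarlyResponse`, `DeliverPendingResponses` and `RunSubtests` are hypothetical names, not the chip-tool code, and "increment before sending" is just one possible way to close the window, not necessarily what this PR's diff does. The "early response" is modeled by calling NextTest() again from inside the send, which stands in for the message thread handling the response before the calling thread gets to its increment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated test command; not the real chip-tool class.
class SubtestRunner
{
public:
    SubtestRunner(std::size_t count, bool incrementFirst) : mCount(count), mIncrementFirst(incrementFirst) {}

    // Called once to start the run, then once per response.
    void NextTest()
    {
        if (mTestIndex >= mCount)
        {
            return;
        }
        if (mIncrementFirst)
        {
            // Advance the index before anything can respond.
            std::size_t current = mTestIndex++;
            Send(current);
        }
        else
        {
            // Racy ordering: a response may be handled before the increment.
            Send(mTestIndex);
            mTestIndex++;
        }
    }

    // Make the next send get its response before Send() returns.
    void SimulateEarlyResponse() { mEarlyResponse = true; }

    // Handle responses that arrived "normally", after NextTest() returned.
    void DeliverPendingResponses()
    {
        while (mPending > 0)
        {
            --mPending;
            NextTest();
        }
    }

    const std::vector<std::size_t> & Log() const { return mLog; }

private:
    void Send(std::size_t subtest)
    {
        mLog.push_back(subtest); // record which subtest actually ran
        if (mEarlyResponse)
        {
            mEarlyResponse = false;
            NextTest(); // response handled before the caller's increment
        }
        else
        {
            ++mPending;
        }
    }

    std::size_t mCount;
    bool mIncrementFirst;
    std::size_t mTestIndex = 0;
    std::size_t mPending   = 0;
    bool mEarlyResponse    = false;
    std::vector<std::size_t> mLog;
};

// Runs `count` subtests with one early response and returns the order they ran in.
std::vector<std::size_t> RunSubtests(std::size_t count, bool incrementFirst)
{
    SubtestRunner runner(count, incrementFirst);
    runner.SimulateEarlyResponse();
    runner.NextTest();
    runner.DeliverPendingResponses();
    return runner.Log();
}
```

Under these assumptions, `RunSubtests(3, false)` records subtests 0, 0, 2 (one subtest run twice, one skipped), while `RunSubtests(3, true)` records 0, 1, 2.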

Fixes #7493

Problem

Random test failures due to the data race.

Change overview

Eliminate the data race.

Testing

Manually ran scripts/tests/test_suites.sh. Realistically, without some sort of race simulator, or instrumenting the code to sleep before the increment on the main thread only, it's hard to figure out how to test this reasonably.

@bzbarsky-apple bzbarsky-apple merged commit ee0e403 into project-chip:master Jun 9, 2021
@bzbarsky-apple bzbarsky-apple deleted the fix-test-index-race branch June 9, 2021 19:54
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request Jun 10, 2021
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request Jun 10, 2021
Problem:

tsan detects various thread races in chip-tool. This PR makes the following changes to fix them:

Change:

- Following the pattern of 'external synchronization', sprinkled LockChipStack() and UnlockChipStack() calls around key call sites that called into the stack from the various command logic in chip-tool
- Removed usleep and global instance hacks.
- Reverts changes in project-chip#7494
- Re-structured Command::Run so that the bulk of the stack initialization and shutdown is managed in Commands::Run() before Run() is called, and an ExecutionContext object pointer is stashed inside the Command for convenient access. This reduces the chances of people writing new commands getting stack initialization wrong.
- Instead of sometimes using chip::Controller::DeviceController and sometimes DeviceCommissioner, just used the latter in all commands since that is the super-set class anyways.
- Added a new thread-safe 'StopEventLoopTask' that application logic needs to call before DeviceController::Shutdown() can be called with external synchronization.
- Pivots PlatformMgr::Shutdown() to not handle stopping the event queue,
  but only focus on cleaning up the stack objects.
- Fixed up TestMdns as well along the way.

Testing:

- Enabled tsan using the 'is_tsan' build arg and used that to catch well
  over 10 races, with not a single false positive.
- Ran through all the chip-tool command groups (pairing, IM, discover,
  testcluster, payload, etc) 10x each to ensure no regressions in
  functionality as well as ensuring clean shutdown with tsan.
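
The 'external synchronization' pattern from the change list above can be sketched roughly as follows. This is a hedged illustration, not the chip-tool implementation: `FakeStack`, `DoStackWork`, `StackLockGuard` and `RunCommandsConcurrently` are hypothetical names, and the `LockChipStack()` / `UnlockChipStack()` members here are local stand-ins backed by a plain `std::mutex` rather than the real stack calls. The idea is only that every call into the stack from a non-stack thread is bracketed by the lock, here via a small RAII guard so the unlock cannot be forgotten.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the stack; the real code calls into the CHIP stack instead.
class FakeStack
{
public:
    void LockChipStack() { mMutex.lock(); }
    void UnlockChipStack() { mMutex.unlock(); }

    // Not thread-safe on its own: callers must hold the stack lock.
    void DoStackWork() { ++mCalls; }

    int Calls() const { return mCalls; }

private:
    std::mutex mMutex;
    int mCalls = 0;
};

// RAII helper (an assumption, not from the PR): locks on construction, unlocks on destruction.
class StackLockGuard
{
public:
    explicit StackLockGuard(FakeStack & stack) : mStack(stack) { mStack.LockChipStack(); }
    ~StackLockGuard() { mStack.UnlockChipStack(); }

    StackLockGuard(const StackLockGuard &) = delete;
    StackLockGuard & operator=(const StackLockGuard &) = delete;

private:
    FakeStack & mStack;
};

// Several "command" threads call into the stack, each call under the lock.
int RunCommandsConcurrently(int threads, int callsPerThread)
{
    FakeStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&stack, callsPerThread] {
            for (int i = 0; i < callsPerThread; ++i)
            {
                StackLockGuard guard(stack);
                stack.DoStackWork();
            }
        });
    }
    for (auto & worker : workers)
    {
        worker.join();
    }
    // All workers have joined, so reading the counter here is safe.
    return stack.Calls();
}
```

With the guard in place, the unsynchronized counter inside the stand-in stack is only ever touched by one thread at a time, which is the property a tsan run would check for.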
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request Jun 11, 2021
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this pull request Jun 11, 2021
mspang pushed a commit that referenced this pull request Jun 14, 2021
* Fix thread races in chip-tool

* Restyler fixes

* Forgot a file..
mkardous-silabs pushed a commit to mkardous-silabs/connectedhomeip that referenced this pull request Jun 14, 2021
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
Successfully merging this pull request may close these issues.

Handling of mTestIndex in chip-tool tests is racy