Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect test termination condition in TestCommand subclasses. #7428

Merged
merged 1 commit into from
Jun 8, 2021

Conversation

bzbarsky-apple
Copy link
Contributor

The logic in NextTest was doing the following:

  1. Start the test at mTestIndex.
  2. Increment mTestIndex.
  3. If mTestCount == mTestIndex (i.e. no more tests to run), start shutdown (by
    calling SetCommandExitStatus, which unblocks the shutdown sequence)..

This failed when running the last test: we would start the test,
increment mTestIndex, then hit that mTestCount condition and
immediately start shutdown, before waiting for the last test to
complete.

The fix is to test for the "no more tests to run" condition on entry
to NextTest. This way the last test runs compeletely, and when it's
done and calls NextTest we go ahead and shut down.

Problem

Last test in a TestCommand was not being allowed to actually complete.

Change overview

Wait for it to complete.

Testing

Unfortunately, nothing in our CI right now catches the "test did not actually complete" situation. I was considering adding some more machinery around this, but wasn't quite sure how best to do that.

I did test that removing some of our raciness workarounds causes scripts/tests/test_suites.sh to crash ~10% of the time without this fix even if I ensure that shutdown is somewhat synchronized with message processing, because of the following sequence of events:

  1. Controller starts last test.
  2. Controller sends message.
  3. Controller waits for the message thread lock, then releases it (long story)
  4. Controller starts shutdown on controller's thread.
  5. While inside shutdown, the reply to the last test message arrives and processing of that reply crashes.

With this fix those crashes disappeared.

Any suggestions on better tests here would be much appreciated, of course, but it's a bit hard to test bugs in the test harness... ;)

The logic in NextTest was doing the following:

1. Start the test at mTestIndex.
2. Increment mTestIndex.
3. If mTestCount == mTestIndex (i.e. no more tests to run), start shutdown (by
   calling SetCommandExitStatus, which unblocks the shutdown sequence)..

This failed when running the last test: we would start the test,
increment mTestIndex, then hit that mTestCount condition and
immediately start shutdown, before waiting for the last test to
complete.

The fix is to test for the "no more tests to run" condition on entry
to NextTest.  This way the last test runs compeletely, and when it's
done and calls NextTest we go ahead and shut down.
@woody-apple
Copy link
Contributor

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this pull request Jun 7, 2021
@woody-apple
Copy link
Contributor

@jepenven-silabs ?

@andy31415 andy31415 merged commit adb2c83 into project-chip:master Jun 8, 2021
@bzbarsky-apple bzbarsky-apple deleted the fix-test-cluster branch June 8, 2021 18:12
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this pull request Jun 8, 2021
bzbarsky-apple added a commit that referenced this pull request Jun 9, 2021
…7434)

This change depends on
#7428 and
#7430 which fix
the problems properly.

Fixes #7408
Fixes #7409
Fixes #7410
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
…roject-chip#7428)

The logic in NextTest was doing the following:

1. Start the test at mTestIndex.
2. Increment mTestIndex.
3. If mTestCount == mTestIndex (i.e. no more tests to run), start shutdown (by
   calling SetCommandExitStatus, which unblocks the shutdown sequence)..

This failed when running the last test: we would start the test,
increment mTestIndex, then hit that mTestCount condition and
immediately start shutdown, before waiting for the last test to
complete.

The fix is to test for the "no more tests to run" condition on entry
to NextTest.  This way the last test runs compeletely, and when it's
done and calls NextTest we go ahead and shut down.
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants