Once the fail_fast or runs limit is hit, any tests still running will not log or count their results. #36

sfc-gh-satherton · 2021-08-01T11:36:29Z

It seems that once a bundle is in an ended state, tests that are still running and later end in failure do not record their failures or log events to the database. Tests still running that later end in success might also not be recorded correctly, but I'm not sure.

This leads to final job states where

started > ended
started > pass + fail
ended != pass + fail

While over-run (starting more than max-runs tests) has been significantly reduced, resolving #1, a few extra tests are still sometimes launched. This is problematic because a test that is going to end with a Timeout failure takes the longest to run and will not complete until long after the first max-runs tests have completed successfully.

To give a concrete example: with a limit of 10000 runs, if the 500th run is going to run forever and end with Timeout, and the over-run is just 1 test, then the job state will reach started=10001 pass=10000 ended=10000 and the job will be stopped before the failing test completes, after which the failing test will not be recorded. Running the same correctness package with a larger run limit, such as 100000, would expose the failure, because the bundle would still be active when the timeout failure occurs and so the failure would be recorded.
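
The example above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: the `Bundle` class, `record`, `simulate`, and the `passes_before_timeout` parameter are hypothetical names made up for illustration, and the real job state lives in the database rather than in a Python object. The only behavior modeled is "results arriving after the bundle is stopped are dropped".

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    """Hypothetical stand-in for the job counters discussed in this issue."""
    max_runs: int
    started: int = 0
    ended: int = 0
    passed: int = 0
    failed: int = 0
    stopped: bool = False

    def record(self, ok: bool) -> bool:
        """Record one result. Returns False if the bundle is already stopped,
        in which case the result is silently dropped (the reported bug)."""
        if self.stopped:
            return False
        self.ended += 1
        if ok:
            self.passed += 1
        else:
            self.failed += 1
        if self.ended >= self.max_runs:
            self.stopped = True  # run limit reached: the job is stopped
        return True


def simulate(max_runs: int, overrun: int, passes_before_timeout: int):
    """One run ends with Timeout; every other started run passes.

    passes_before_timeout is how many passing runs finish before the
    timeout result arrives (an assumed stand-in for how long it hangs).
    """
    bundle = Bundle(max_runs=max_runs, started=max_runs + overrun)
    total_passing = bundle.started - 1
    first = min(passes_before_timeout, total_passing)
    for _ in range(first):
        bundle.record(True)
    timeout_recorded = bundle.record(False)
    for _ in range(total_passing - first):
        bundle.record(True)
    return bundle, timeout_recorded


# Limit of 10000: the job stops before the timeout result arrives.
small, small_recorded = simulate(10000, overrun=1, passes_before_timeout=10000)
# Limit of 100000: the bundle is still active, so the failure is counted.
large, large_recorded = simulate(100000, overrun=1, passes_before_timeout=10000)
print(small.started, small.passed, small.failed, small.ended, small_recorded)
print(large.started, large.passed, large.failed, large.ended, large_recorded)
```

In this model the smaller limit ends with started > ended and no recorded failure, while the larger limit records the Timeout failure because the bundle is still active when it arrives.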


sfc-gh-kmakino · 2021-09-28T01:10:21Z

I'm not sure this happens strictly when there are timed-out tests.
I think this is what's happening:
try_starting_test can return True to multiple agents if they ask concurrently, which causes started to overshoot. (For example, if started=9999 and 2 agents call try_starting_test, both can start and started becomes 10001.)
Then, when one of them finishes and ended reaches max_runs, it stops the ensemble. As a result, when the other agent finishes its test, it won't find the ensemble and won't record the result.
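
A minimal sketch of that overshoot, assuming the check and the increment are two separate steps. The `Ensemble` class, `can_start`, `mark_started`, and `try_starting_test_atomic` are hypothetical in-memory stand-ins; the real `try_starting_test` works against the database, and this is not necessarily how the fix was implemented. The lock here only illustrates the general idea of making the check and the increment one atomic step.

```python
import threading


class Ensemble:
    """Hypothetical in-memory stand-in for the ensemble counters."""

    def __init__(self, max_runs: int, started: int = 0):
        self.max_runs = max_runs
        self.started = started
        self._lock = threading.Lock()

    # Racy variant: the check and the increment are separate steps.
    def can_start(self) -> bool:
        return self.started < self.max_runs

    def mark_started(self) -> None:
        self.started += 1

    # Atomic variant: check and increment happen under one lock.
    def try_starting_test_atomic(self) -> bool:
        with self._lock:
            if self.started >= self.max_runs:
                return False
            self.started += 1
            return True


# Replay the interleaving described above: both agents check before
# either one increments, so both are allowed to start.
racy = Ensemble(max_runs=10000, started=9999)
agent_a_ok = racy.can_start()
agent_b_ok = racy.can_start()
if agent_a_ok:
    racy.mark_started()
if agent_b_ok:
    racy.mark_started()
print(agent_a_ok, agent_b_ok, racy.started)

# With the atomic variant only one of the two agents may start.
safe = Ensemble(max_runs=10000, started=9999)
safe_results = [safe.try_starting_test_atomic() for _ in range(2)]
print(safe_results, safe.started)

# The same holds with real threads hammering the atomic variant.
threaded = Ensemble(max_runs=50)


def agent() -> None:
    for _ in range(100):
        threaded.try_starting_test_atomic()


workers = [threading.Thread(target=agent) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(threaded.started)
```

Capping started at max_runs removes the extra agent that would otherwise finish after the ensemble has been stopped.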

sfc-gh-satherton changed the title ~~Once the fail_fast limit is hit, any tests still running will not log or count their results.~~ Once the fail_fast or runs limit is hit, any tests still running will not log or count their results. Sep 22, 2021

sfc-gh-kmakino mentioned this issue Sep 28, 2021

Avoid overshooting #49

Merged
