Once the fail_fast or runs limit is hit, any tests still running will not log or count their results. #36

Open
sfc-gh-satherton opened this issue Aug 1, 2021 · 1 comment


sfc-gh-satherton commented Aug 1, 2021

It seems that once a bundle is in an ended state, tests that are still running and later end in failure will, at the very least, not record their failures or log events to the database. I think still-running tests that later end in success might also not be recorded correctly, but I'm not sure.

This leads to final job states where

started > ended
started > pass + fail
ended != pass + fail
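
The three conditions above can be checked mechanically. The following is only a minimal sketch: `JobCounters` and `inconsistencies` are hypothetical names made up for illustration, not part of the project's code, and the counters are assumed to be plain integers read from the final job state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobCounters:
    """Hypothetical snapshot of a job's final counters."""
    started: int
    ended: int
    passed: int
    failed: int


def inconsistencies(c: JobCounters) -> List[str]:
    """Return which of the three symptoms from the issue apply to `c`."""
    problems = []
    if c.started > c.ended:
        problems.append("started > ended")
    if c.started > c.passed + c.failed:
        problems.append("started > pass + fail")
    if c.ended != c.passed + c.failed:
        problems.append("ended != pass + fail")
    return problems
```

For example, `inconsistencies(JobCounters(10001, 10000, 10000, 0))` reports the first two symptoms, while a job whose counters all agree returns an empty list.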

While over-run (starting more than max-runs tests) has been significantly reduced, resolving #1, a few extra tests are still sometimes launched. This is problematic because a test that is going to end with a Timeout failure takes the longest to run and will not complete until long after the first max-runs tests have completed successfully.

To give a concrete example: with a limit of 10000 runs, if the 500th run is going to run until it times out and end with Timeout, and the over-run is just 1 test, then the job state will reach started=10001 pass=10000 ended=10000 and the job will be stopped before the failing test completes, after which the failing test will not be recorded. Running the same correctness package with a larger run limit, such as 100000, would expose the failure because the bundle would still be active when the timeout failure occurs, so it would be recorded.
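
The scenario can be reproduced at a smaller scale with a toy model. This is a sketch under simplifying assumptions, not the real scheduler: `run_job` is a hypothetical helper, fast tests are assumed to finish one per time unit, the single slow test fails at a fixed `timeout`, and any result that arrives after `ended` reaches `max_runs` is silently dropped.

```python
def run_job(max_runs, overrun, slow_index, timeout=50):
    """Toy model: results arriving after the job is stopped are dropped."""
    total_started = max_runs + overrun
    results = []  # (finish_time, passed)
    for i in range(total_started):
        if i == slow_index:
            results.append((timeout, False))  # ends with a Timeout failure
        else:
            results.append((i + 1, True))     # fast test that passes
    results.sort()  # process results in completion order

    counters = {"started": total_started, "ended": 0,
                "pass": 0, "fail": 0, "dropped": 0}
    active = True
    for _finish_time, passed in results:
        if not active:
            counters["dropped"] += 1  # job already stopped: not recorded
            continue
        counters["ended"] += 1
        counters["pass" if passed else "fail"] += 1
        if counters["ended"] >= max_runs:
            active = False  # run limit reached, job is stopped
    return counters


small = run_job(max_runs=10, overrun=1, slow_index=5)
large = run_job(max_runs=100, overrun=1, slow_index=5)
```

In this model, `small` ends with `fail` at 0 because the timeout lands after the job has stopped, while `large` records the failure because the job is still active when the timeout occurs.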

sfc-gh-satherton changed the title from "Once the fail_fast limit is hit, any tests still running will not log or count their results." to "Once the fail_fast or runs limit is hit, any tests still running will not log or count their results." on Sep 22, 2021
@sfc-gh-kmakino
Contributor

I'm not sure this happens strictly when there are timed-out tests.
I think this is what's happening:
try_starting_test can return True to multiple agents if they ask concurrently. This causes started to overshoot. (In this case, if started=9999 and 2 agents call try_starting_test, they can both start, and started becomes 10001.)
Then, when one of them finishes and ended reaches max_runs, it stops the ensemble. As a result, when the other agent finishes its test, it won't find the ensemble and won't record the result.
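
The overshoot described above is a check-then-act race, and it can be shown deterministically with an in-memory stand-in. This is only a sketch: the `Ensemble` class and its methods are hypothetical and do not reflect how the real `try_starting_test` is implemented or how it talks to the database. The racy path is split into a read step and a commit step so the interleaving can be replayed by hand; the atomic variant is one possible mitigation, using a lock as a stand-in for whatever atomicity the real storage layer offers.

```python
import threading


class Ensemble:
    """Hypothetical in-memory model of the started counter."""

    def __init__(self, max_runs, started=0):
        self.max_runs = max_runs
        self.started = started
        self._lock = threading.Lock()

    def read_started(self):
        # Step 1 of the racy path: an agent observes the counter.
        return self.started

    def commit_start(self, observed):
        # Step 2 of the racy path: the decision uses a possibly stale value.
        if observed < self.max_runs:
            self.started += 1
            return True
        return False

    def try_starting_test_atomic(self):
        # Check and increment under one lock, so no stale read is possible.
        with self._lock:
            if self.started < self.max_runs:
                self.started += 1
                return True
            return False


# Replay the interleaving from the comment: both agents read before either commits.
racy = Ensemble(max_runs=10000, started=9999)
seen_a = racy.read_started()
seen_b = racy.read_started()
racy_results = (racy.commit_start(seen_a), racy.commit_start(seen_b))

safe = Ensemble(max_runs=10000, started=9999)
safe_results = (safe.try_starting_test_atomic(), safe.try_starting_test_atomic())
```

In the racy replay both agents are allowed to start and `racy.started` ends at 10001; with the atomic variant only the first call succeeds and `safe.started` stays at 10000. Closing the race would not by itself address results dropped after a fail_fast stop.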
