Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--rerun-fails do not execute remaining tests after rerun test passes #278

Closed
viralkpanchal opened this issue Sep 6, 2022 · 7 comments
Closed
Labels
bug Something isn't working enhancement New feature or request

Comments

@viralkpanchal
Copy link

Hi,

Thank you for your contribution to the gotestsum package. It is extremely useful.

We started using the --rerun-fails flag to rerun potentially unstable test suites to prevent false failures in our CI/CD pipeline.

However, we identified 2 issues with this:

  1. Not all tests are getting executed when the previously failing test passes. For example, if the test suite has 30 tests, and the 5th test fails, then gotestsum retries the 5th test, and if it passes, then test execution stops there.
  2. --format short-verbose does not take effect for the failed tests. We get the standard-verbose output for the failed test case. As a result console log shows that the test suite has failed, but in reality, the test was passed in the retry.

Below is our usage scenario. Each service has n number of tests (Service-1 has 10 tests, Service-2 has 30 tests, and Service-3 has 4 tests).

Here is an example of gotestsum usage in Makefile BEFORE --rerun-fails

GO_TEST = gotestsum --format short-verbose -- -count=1 -failfast

test_all: service1 service2 service3

service1:
   $(GO_TEST) ./services/service1/scenario
service2:
   $(GO_TEST) ./services/service2/scenario
service3:
   $(GO_TEST) ./services/service3/scenario

Here is an example of gotestsum usage in Makefile AFTER --rerun-fails

define go_test
   gotestsum --rerun-fails --packages="$(1)" --format short-verbose -- -count=1 -failfast
endef

test_all: service1 service2 service3

service1:
   $(call go_test,./services/service1/scenario)
service2:
   $(call go_test,./services/service2/scenario)
service3:
   $(call go_test,./services/service3/scenario)

We want to achieve the following outcome with respect to our usecase above:

  1. If any test case in service-2 fails after rerun, whole test execution should stop and service-3 tests should not be executed. (i.e. -failfast should take effect). It was working before we started using --rerun-fails.
  2. If the 5th test case in service-2 passes, after rerun from failed 1st time (i.e. failed first time, but passes second time), then the rest of the 25 tests should continue execution. Currently this is a problem with the use of --rerun-fails
  3. Log output should not be printed for the test case that passed in the re-run.

Appreciate your help 🙏

@dnephin dnephin added the bug Something isn't working label Sep 6, 2022
@dnephin
Copy link
Member

dnephin commented Sep 6, 2022

Thank you for the bug report!

If any test case in service-2 fails after rerun, whole test execution should stop and service-3 tests should not be executed. (i.e. -failfast should take effect). It was working before we started using --rerun-fails.

This one might be difficult to fix. Because of some limitations of the -run flag (which I think may have been fixed now), I had to re-run each test individually. This means that -failfast being passed to go does not work anymore. Each test is being run with a separate go test call.

You might be able to use the gotestsum flag --max-fails, although there's a good chance that setting that to 1 is incomatible with --rerun-fails. I haven't tried it myself.

If we can confirm that upstream problems with -run were fixed, then it may be possible to re-run all the tests for a package at once, which will get --failfast working again.

If the 5th test case in service-2 passes, after rerun from failed 1st time (i.e. failed first time, but passes second time), then the rest of the 25 tests should continue execution. Currently this is a problem with the use of --rerun-fails

This is the expected behaviour of --rerun-fails. It's designed to skip rerunning all the passed tests, and only re-run the failed ones. I guess that should be documented more clearly. The reason it works this way is that in very large projects it is too expensive to re-run everything again, and if there are lots of flakes you might never get a fully clean run.

If you want to re-run everything I think maybe --rerun-fails won't work for you. It should be pretty easy to re-run all tests with a short bash script. You could either call gotestsum from the script, or have gotestsum run the script with --raw-command. Both should work I think.

It's possible that behaviour could be added to gotestsum as well. It seems like a less interesting feature though because it's pretty easy to accomplish with a short script (where as the current behaviour is much more involved and would be difficult to script).

Log output should not be printed for the test case that passed in the re-run.

This one I don't understand yet. From what I can tell it should use the same formatter for everything. Are you sure it's the formatter, and not the summary?

I would expect to see the failed test output in the summary at the end of the run (even if the command returns success because the re-runs passed). I would not want to hide this by default, because I think it's important to see that output.

If the output is indeed coming from the summary at the end, then I think maybe the change to not using --rerun-fails, and instead using a loop may solve this problem as well. It would let you hide the output from previous runs when the last one passes.

@dnephin dnephin added the enhancement New feature or request label Sep 6, 2022
@dnephin
Copy link
Member

dnephin commented Sep 6, 2022

It sounds like most likely the --rerun-fails flag doesn't work for your use case, something like this might:

for run in {0..3}; do
    if gotestsum  --format short-verbose --jsonfile=last-run.log --hide-summary=all  ... -- -failfast ./...; then
        exit 0
    fi
done

# print the output for the last run
gotestsum --raw-command --format short-verbose -- cat last-run.log
rm -f last-run.log
exit 1

@viralkpanchal
Copy link
Author

This is the expected behaviour of --rerun-fails. It's designed to skip rerunning all the passed tests, and only re-run the failed ones. I guess that should be documented more clearly. The reason it works this way is that in very large projects it is too expensive to re-run everything again, and if there are lots of flakes you might never get a fully clean run.

Perhaps I didn't explain properly, I meant to say that remaining 25 unexecuted tests do not run at all after 5th test passes upon rerun.
For example, there are tests from Test1, Test2, Test3, Test4, Test5, Test6.... Test30.
When test execution starts, I see logout as per below:

Starting Service2TestSuite Execution...
PASS Test1
PASS Test2
PASS Test3
PASS Test4
FAIL Test5

Starting Service2TestSuite Execution...
PASS Test5

Service2TestSuite PASSED..

Test 6 to Test 30 do not even execute at all.

@viralkpanchal
Copy link
Author

It sounds like most likely the --rerun-fails flag doesn't work for your use case, something like this might:

for run in {0..3}; do
    if gotestsum  --format short-verbose --jsonfile=last-run.log --hide-summary=all  ... -- -failfast ./...; then
        exit 0
    fi
done

# print the output for the last run
gotestsum --raw-command --format short-verbose -- cat last-run.log
rm -f last-run.log
exit 1

I will try this today and let you know.

@dnephin
Copy link
Member

dnephin commented Sep 6, 2022

Test 6 to Test 30 do not even execute at all.

Ahhh, I understand now. That's because of -failfast. That's a good point that I did not realize. -failfast is incompatible with --rerun-fails because --rerun-fails requires all the tests to run.

I think gotestsum should refuse to rerun if -failfast is used with --rerun-fails.

@dnephin
Copy link
Member

dnephin commented Oct 2, 2022

#280 adds that error so that gotestsum refuses to run when -failfast is used with --rerun-fails.

Anything else we should address for this issue? Did the suggestion in the comment above work out for your use case?

@dnephin
Copy link
Member

dnephin commented Nov 26, 2022

I think we've fixed the bug here, so I'm going to close this issue. If there's something I've missed please do leave a comment and I can re-open it. Thanks!

@dnephin dnephin closed this as completed Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants