Mark testset as being flaky #38213
Conversation
Do you have a procedure for logging and monitoring these kinds of failures over several builds/PRs/etc. to get an overview of the failure rate? For example: write a file in a specific format/location that is picked up by CI (similar to the results of coverage analysis), push success or failure to Prometheus when run in CI, or similar. This could help discover whether flaky tests are getting better (which could be a sign that they were fixed by accident) or worse.
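A minimal sketch of the file-based idea, assuming results are appended as JSON lines to a path that a later CI step collects (analogous to coverage artifacts). The `flaky_record` helper, file path, and record format are all hypothetical, not anything in `Test` or our CI today:

```julia
using Test

# Hypothetical helper: append one JSON object per line so a later CI step can
# collect the files and aggregate pass rates across builds. Kept dependency-free.
function flaky_record(name::AbstractString, passed::Bool; path = "flaky_results.jsonl")
    open(path, "a") do io
        println(io, "{\"testset\":\"", name, "\",\"passed\":", passed,
                ",\"timestamp\":", round(Int, time()), "}")
    end
end

# Usage: wrap a flaky testset and log whether it passed. A top-level @testset
# that finishes with failures throws a TestSetException, which we catch here.
passed = try
    @testset "Distributed race conditions" begin
        @test 1 + 1 == 2   # placeholder for the actual flaky tests
    end
    true
catch
    false
end
flaky_record("Distributed race conditions", passed)
```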
No, we don't; we would need the test-set reporting to be understood by our CI system, which right now is buildbot. Deeper integration seems possible with Buildkite, which we are looking at moving to. Reporting tests to something like Prometheus might be interesting, especially if we also collected test duration.
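For the Prometheus idea, here is a sketch of what per-testset reporting with duration could look like, assuming a Pushgateway reachable from CI and HTTP.jl as the client; the gateway URL, metric names, and `push_testset_metric` helper are invented for illustration, not an existing setup:

```julia
using HTTP

# Hypothetical helper: push one gauge for pass/fail and one for duration,
# written in the Prometheus text exposition format, to a Pushgateway.
function push_testset_metric(testset::AbstractString, passed::Bool, seconds::Real;
                             gateway = "http://pushgateway.example:9091")
    body = """
           julia_testset_passed{testset="$testset"} $(passed ? 1 : 0)
           julia_testset_duration_seconds{testset="$testset"} $seconds
           """
    HTTP.put("$gateway/metrics/job/julia_tests", [], body)
end

# Usage: report a result together with how long the testset took (seconds).
push_testset_metric("Distributed race conditions", true, 12.3)
```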
Buildkite has a built-in "reliability" metric that is basically a pass/fail ratio of entire jobs, but to get per-testset resolution we'd need to write a custom plugin. That seems doable, though.
We shouldn't run tests that have unknown failure modes. If they introduce memory or other corruption, we could still be tracking down mystery failures.
This is semi-isolated: while the race conditions it tests have definitely been seen to corrupt memory, the affected processes are newly minted `addprocs` workers, so they probably won't bring down the master node.
Do we trust the error handling in Distributed sufficiently that arbitrary memory corruption in the worker node will not cause strange failures on the master node?
I don't trust anyone |
I guess this PR isn't needed anymore. Should the discussion about monitoring test-set failures be documented elsewhere?
I think our current idea is to report them under the …
This PR was opened as an alternative to #38211.