
Mark testset as being flaky #38213

Conversation

jonas-schulze
Contributor

As an alternative to #38211.
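
For illustration, a minimal sketch of one way a test file could be marked flaky and retried, using a hypothetical `FLAKY_TESTS` set and retry wrapper; this is not the mechanism used by this PR or by `test/runtests.jl`.

```julia
using Test

# Hypothetical set of test files known to fail intermittently.
const FLAKY_TESTS = Set(["distributed"])

# Run `body`, allowing an extra attempt if `name` is marked flaky, and throw
# the last error only after the final attempt fails.
function run_maybe_flaky(body::Function, name::AbstractString; retries::Int = 1)
    attempts = name in FLAKY_TESTS ? retries + 1 : 1
    local err
    for i in 1:attempts
        try
            return body()
        catch e
            err = e
            @warn "Test $name failed (attempt $i of $attempts)" exception = e
        end
    end
    throw(err)
end

# Usage (do-block syntax):
# run_maybe_flaky("distributed") do
#     include("distributed.jl")
# end
```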

@vchuravy vchuravy requested a review from vtjnash October 28, 2020 19:28
@jonas-schulze
Contributor Author

jonas-schulze commented Oct 29, 2020

Do you have a procedure for logging and monitoring these kinds of failures across several builds/PRs/... to get an overview of the failure rate? E.g. write a file in a specific format/location that is picked up by CI (similar to the results of coverage analysis), push success or failure to Prometheus when run in CI, or similar. This could be helpful for discovering whether flaky tests get better (which could be a sign of them having been accidentally fixed) or worse.
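
As a rough sketch of the file-based idea (not an existing mechanism in the test suite), a custom `Test.AbstractTestSet` could count results and append them to a log file that CI picks up; the file name and format below are invented for illustration.

```julia
using Test
using Test: AbstractTestSet, DefaultTestSet, Result, Pass, record, finish

# A wrapper testset that counts pass/fail results and appends a summary line
# to a log file.  Intended for top-level use only; nested reporting would need
# more care.
struct ReportingTestSet <: AbstractTestSet
    inner::DefaultTestSet
    counts::Dict{Symbol,Int}
end
ReportingTestSet(desc::AbstractString) =
    ReportingTestSet(DefaultTestSet(desc), Dict(:pass => 0, :fail => 0))

function Test.record(ts::ReportingTestSet, res::Result)
    # Every non-Pass result (Fail, Error, Broken) counts as a failure here.
    ts.counts[res isa Pass ? :pass : :fail] += 1
    record(ts.inner, res)
end
Test.record(ts::ReportingTestSet, child::AbstractTestSet) = record(ts.inner, child)

function Test.finish(ts::ReportingTestSet)
    open("testset-report.log", "a") do io  # hypothetical location/format
        println(io, ts.inner.description, ",", ts.counts[:pass], ",", ts.counts[:fail])
    end
    finish(ts.inner)
end

# Usage:
# @testset ReportingTestSet "flaky candidate" begin
#     @test 1 + 1 == 2
# end
```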

@vchuravy
Member

> Do you have a procedure for logging and monitoring these kinds of failures across several builds/PRs/... to get an overview of the failure rate? E.g. write a file in a specific format/location that is picked up by CI, push success or failure to Prometheus when run in CI, or similar. This could be helpful for discovering whether flaky tests get better (which could be a sign of them having been accidentally fixed) or worse.

No, we don't; we would need the testset reporting to be understood by our CI system, which right now is Buildbot. Deeper integration seems possible with Buildkite, which we are looking at moving to. Reporting tests to something like Prometheus might be interesting, especially if we also get test duration.

cc: @staticfloat @christopher-dG
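
If Prometheus reporting were wired up, a sketch could look like the following. It assumes the HTTP.jl package and a Pushgateway at a made-up address; none of this exists in the current CI setup.

```julia
using HTTP

# Push a per-testset outcome and duration to a Prometheus Pushgateway using
# its plain-text exposition format.  The gateway URL and metric names are
# placeholders for this example.
function push_testset_metric(testset::String, passed::Bool, seconds::Float64;
                             gateway::String = "http://pushgateway.example:9091")
    body = """
    julia_testset_passed{testset="$testset"} $(passed ? 1 : 0)
    julia_testset_duration_seconds{testset="$testset"} $seconds
    """
    HTTP.post("$gateway/metrics/job/julia_ci",
              ["Content-Type" => "text/plain"], body)
end

# push_testset_metric("distributed", true, 412.3)
```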

@christopher-dG
Member

Buildkite has a "reliability" metric by default, which is basically a pass/fail ratio of entire jobs, but to get per-testset resolution we'd need to write a custom plugin. Seems doable, though.
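
Short of a full plugin, the `buildkite-agent annotate` command available inside Buildkite jobs could surface flaky-testset failures in the build UI. A hedged sketch (the context name and message are arbitrary):

```julia
# Annotate the current Buildkite build when a flaky testset fails.  Only works
# inside a Buildkite job, where the `buildkite-agent` binary is on PATH.
function annotate_flaky_failure(testset::String)
    msg = "Flaky testset `$testset` failed on this build."
    run(`buildkite-agent annotate $msg --style warning --context flaky-$testset`)
end
```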

@Keno
Member

Keno commented Oct 29, 2020

We shouldn't run tests that have unknown failure modes. If they introduce memory or other corruption, we could still be tracking down mystery failures.

@vtjnash
Member

vtjnash commented Oct 30, 2020

This is semi-isolated in that, while the race conditions it tests have definitely been seen to corrupt memory, they occur on newly minted `addprocs` workers and so probably won't bring down the master node.
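
For readers unfamiliar with the isolation being described, a minimal sketch of running a test body on a freshly added worker, so that a crash or corruption there does not take down the master process; this is illustrative, not the Distributed testset's actual machinery.

```julia
using Distributed, Test

# Run `body` on a freshly spawned worker and always remove the worker
# afterwards, even if the remote call throws.
function test_on_fresh_worker(body::Function)
    pid = only(addprocs(1))
    try
        return remotecall_fetch(body, pid)
    finally
        rmprocs(pid; waitfor = 30)
    end
end

# @test test_on_fresh_worker(() -> 1 + 1) == 2
```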

@Keno
Member

Keno commented Oct 30, 2020

Do we trust the error handling in Distributed sufficiently that arbitrary memory corruption in the worker node will not cause strange failures on the master node?

@vtjnash vtjnash closed this Oct 30, 2020
@vtjnash vtjnash reopened this Oct 30, 2020
@vtjnash
Member

vtjnash commented Oct 30, 2020

I don't trust anyone

@jonas-schulze
Contributor Author

I guess this PR isn't needed anymore. Shall the discussion about monitoring test set failures be documented elsewhere?

@musm
Contributor

musm commented Dec 11, 2020

> I guess this PR isn't needed anymore. Shall the discussion about monitoring test set failures be documented elsewhere?

I think our current idea is to report them under the CI tag (if you don't find an issue corresponding to the failure you see in CI, please open one up).

@musm musm closed this Dec 11, 2020