Mark testset as being flaky #38213
Conversation
Do you have a procedure for logging and monitoring these kinds of failures over several builds/PRs/etc. to get an overview of the failure rate? For example: write a file in a specific format/location that is picked up by CI (similar to the results of coverage analysis), push success or failure to Prometheus when run in CI, or similar. This could help discover whether flaky tests are getting better (which could be a sign that they were fixed by accident) or worse.
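A minimal sketch of the file-based idea, assuming results are appended as JSON lines to a path that a later CI step collects (analogous to coverage artifacts). The `flaky_record` helper, file path, and record format are all hypothetical, not anything in `Test` or our CI today:

```julia
using Test

# Hypothetical helper: append one JSON object per line so a later CI step can
# collect the files and aggregate pass rates across builds. Kept dependency-free.
function flaky_record(name::AbstractString, passed::Bool; path = "flaky_results.jsonl")
    open(path, "a") do io
        println(io, "{\"testset\":\"", name, "\",\"passed\":", passed,
                ",\"timestamp\":", round(Int, time()), "}")
    end
end

# Usage: wrap a flaky testset and log whether it passed. A top-level @testset
# that finishes with failures throws a TestSetException, which we catch here.
passed = try
    @testset "Distributed race conditions" begin
        @test 1 + 1 == 2   # placeholder for the actual flaky tests
    end
    true
catch
    false
end
flaky_record("Distributed race conditions", passed)
```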
No, we don't; we would need the test-set reporting to be understood by our CI system, which right now is buildbot. Deeper integration seems possible with Buildkite, which we are looking at moving to. Reporting tests to something like Prometheus might be interesting, especially if we also collected test duration.
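For the Prometheus idea, here is a sketch of what per-testset reporting with duration could look like, assuming a Pushgateway reachable from CI and HTTP.jl as the client; the gateway URL, metric names, and `push_testset_metric` helper are invented for illustration, not an existing setup:

```julia
using HTTP

# Hypothetical helper: push one gauge for pass/fail and one for duration,
# written in the Prometheus text exposition format, to a Pushgateway.
function push_testset_metric(testset::AbstractString, passed::Bool, seconds::Real;
                             gateway = "http://pushgateway.example:9091")
    body = """
           julia_testset_passed{testset="$testset"} $(passed ? 1 : 0)
           julia_testset_duration_seconds{testset="$testset"} $seconds
           """
    HTTP.put("$gateway/metrics/job/julia_tests", [], body)
end

# Usage: report a result together with how long the testset took (seconds).
push_testset_metric("Distributed race conditions", true, 12.3)
```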
Buildkite has a built-in "reliability" metric that is basically a pass/fail ratio of entire jobs, but to get per-testset resolution we'd need to write a custom plugin. That seems doable, though.
We shouldn't run tests that have unknown failure modes. If they introduce memory or other corruption, we could still be tracking down mystery failures.
This is semi-isolated: while the race conditions it tests have definitely been seen to corrupt memory, the affected processes are newly minted `addprocs` workers, so they probably won't bring down the master node.
Do we trust the error handling in Distributed sufficiently that arbitrary memory corruption in the worker node will not cause strange failures on the master node?
I don't trust anyone |
I guess this PR isn't needed anymore. Should the discussion about monitoring test-set failures be documented elsewhere?
I think our current idea is to report them under the …
This PR was opened as an alternative to #38211.