Merge rerun results when --flaky-test-attempts is used #456
Will it also fix |
I went through the discussion in Slack. I have an idea of how it could work with the XML report output:
I have used a similar flaky-test approach with another UI testing framework, quite successfully. |
Yep! I think the idea is that if the test passes at least once, then it's considered passed. |
Current situation: right now, Flank will aggregate all the results, producing a JUnitReport.xml with 8 total tests and 4 failures. Instead of 8 results, we'd really like 2 results (one for each unique test executed). Proposed solution:
An alternative to (2) could be to dump the flaky info to a separate file inside each shard, but to me it seems like keeping it in one location would be desirable. A couple of questions I haven't looked into yet:
|
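A minimal sketch of the merge described above, assuming a simple in-memory model of attempt results; `TestAttempt`, `MergedResult`, and `mergeAttempts` are hypothetical names for illustration, not Flank's actual API:

```kotlin
// Hypothetical model: one entry per executed attempt, parsed from the
// per-run JUnit XML files.
data class TestAttempt(val className: String, val testName: String, val passed: Boolean)

data class MergedResult(val className: String, val testName: String, val passed: Boolean, val attempts: Int)

// Collapse attempt-level results (e.g. 8 entries) into one merged result
// per unique test (e.g. 2 entries). A test counts as passed if any
// attempt passed.
fun mergeAttempts(attempts: List<TestAttempt>): List<MergedResult> =
    attempts
        .groupBy { it.className to it.testName }
        .map { (test, runs) ->
            MergedResult(
                className = test.first,
                testName = test.second,
                passed = runs.any { it.passed },
                attempts = runs.size
            )
        }
```

Under this rule, a test that failed some attempts but passed at least one is reported as passed, matching the idea discussed above.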
Sounds like a great idea. But I'd suggest respecting the JUnit XSD specification, meaning I think the (2) approach would be more suitable, even though (1) would be more user-friendly. Maybe we could also dump the |
I think we'd want to repeat them normally. It does seem like a weird use case.
I think modifying the report to include the data we need makes sense. The iOS JUnit report from FTL is already custom, so following the JUnit XSD exactly isn't possible. |
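For illustration only, reusing the `MergedResult` sketch above: one way to put that data in the report while staying close to the JUnit shape would be an extra attribute on merged test cases. The `flaky` attribute below is an assumption, not part of the stock JUnit XSD or a confirmed Flank format:

```kotlin
// Sketch: render a merged result as a JUnit <testcase> element.
// The flaky="true" attribute is an assumption, not standard JUnit XSD.
fun toTestcaseXml(r: MergedResult): String {
    val flaky = r.attempts > 1 && r.passed
    val attrs = StringBuilder("classname=\"${r.className}\" name=\"${r.testName}\"")
    if (flaky) attrs.append(" flaky=\"true\"")
    return if (r.passed) "<testcase $attrs/>"
    else "<testcase $attrs><failure/></testcase>"
}
```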
I thought the whole idea of repeatTests was to measure the flakiness of a test. It would seem redundant to use it in conjunction with flakyTestAttempts, since you'd want the raw data anyway. Or am I missing another use case? |
A couple more small questions:
|
That's my use case for it; the idea is simply repeating tests, though. If I wanted to check for something else, like performance measurement where I'd average the metrics, we might want both repeat and flaky test attempts.
We should probably leave them, since the folder structure is intended to mirror what's on GCS. We're not deleting them on GCS, so deleting them locally would be surprising, I think.
Yeah, if we're generating the file by merging XML, then naming it appropriately makes sense. |
How will it consider the last rerun, then? |
I think we'll download the results from all runs. |
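A hedged sketch of that download step using the Google Cloud Storage Java client; the bucket, prefix, and file-name filter are assumptions, since the exact GCS layout of FTL rerun results isn't specified here:

```kotlin
import com.google.cloud.storage.Storage
import com.google.cloud.storage.StorageOptions

// List and download every result XML under the matrix's result prefix,
// across all rerun directories, so the merge step sees the full history
// of each test. The bucket/prefix and the .xml filter are assumptions.
fun downloadAllRunResults(bucket: String, prefix: String): List<ByteArray> {
    val storage: Storage = StorageOptions.getDefaultInstance().service
    return storage.list(bucket, Storage.BlobListOption.prefix(prefix))
        .iterateAll()
        .filter { it.name.endsWith(".xml") }
        .map { it.getContent() }
}
```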
flakyTestAttempts runs tests multiple times. Currently, any failure will be reported as a failure in Flank, even if the test passes on retry.
Fetch the test rerun results and consider the full history of a test to determine whether it passed or not.