
Automating Flaky Test Detection #3575

Open
siddhantdange opened this issue Oct 24, 2023 · 15 comments


@siddhantdange

siddhantdange commented Oct 24, 2023

Flaky tests unnecessarily waste developer and CI time, impact product stability, and reduce confidence in tests. It seems that the https://github.com/nodejs/node repository suffers from flaky tests, as shown by the context below.

Context:


BuildPulse is a platform for detecting flaky tests, reporting on their impact, mitigating that impact, and ultimately fixing flaky tests. We rely on open source ourselves, so in an effort to give back, we offer BuildPulse to open-source projects for free (up to a certain volume of test results per day).

I'd like to propose using BuildPulse to replace the current flaky test workflow, which seems to run through https://github.com/nodejs/reliability. We can replace this repository by making it easy to search for known flaky tests and to build context (links to flakiness occurrences, error logs, commits, etc.).

We can also enhance your current workflow through ncu-ci integrations, GitHub Issues integrations, pull request comments, reports, notifications, badges, and more.

I'm willing to do the integration myself, which largely consists of 2 steps:

  1. Configuring the test framework to export test results to JUnit XML
  2. Uploading results to BuildPulse using our buildpulse-action (for GitHub Actions) or test-reporter (Jenkins)
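For step 1, the format BuildPulse ingests is just standard JUnit XML. As an illustration only (the test names and suite layout below are made up, not the real Node.js suites), a report in that shape can be produced with nothing but the Python standard library:

```python
import xml.etree.ElementTree as ET

def to_junit(results):
    """results: list of (name, classname, seconds, failure_message_or_None).
    Returns a <testsuite> document following the common JUnit XML schema."""
    suite = ET.Element("testsuite", name="node-tests", tests=str(len(results)))
    failures = 0
    for name, classname, seconds, failure in results:
        case = ET.SubElement(suite, "testcase", name=name,
                             classname=classname, time=f"{seconds:.3f}")
        if failure is not None:
            failures += 1
            ET.SubElement(case, "failure", message=failure)
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")

# Illustrative results: one pass, one failure
xml = to_junit([("test-http-basic", "parallel", 0.12, None),
                ("test-net-timeout", "parallel", 1.50, "timed out")])
```

A reporter like the one in step 2 would then pick up this file from the build workspace and ship it off.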

If the integration is helpful, BuildPulse would appreciate a testimonial down the road :)

Happy to answer any questions and interested in what next steps might look like!

@benjamingr
Member

We track flaky tests in https://github.com/nodejs/reliability/

@tniessen
Member

FWIW, I believe that detecting flaky tests hasn't been a big issue. "setting them as flaky to make our CI green again", as the referenced tweet put it, is the easy part. Finding people who are willing to fix them, on the other hand, has been rather difficult.

Also, this was merged only recently: #3056

@anonrig
Member

anonrig commented Oct 25, 2023

We track flaky tests in https://github.com/nodejs/reliability/

IMHO, GitHub issues are not a good way of tracking issues that persist across different days. For example, I'd be super interested in learning how long it takes for a flaky test to appear, or to get fixed, once it is introduced into the codebase. I don't know if the proposed service supports that or not, but I believe there is no harm in trying a new solution. Maybe a new product might create the dopamine effect we need to encourage fixing these issues.

@mcollina
Member

I would not replace the current infrastructure with a 3rd party system straight away, but I'm not opposed to adding it in a non-destructive way, evaluating it, and seeing how it performs.

I'm not sure having that data available would give us anything more useful than our current system, but I'd be happy to be proven wrong.

Something important to consider is that sending data to the 3rd party must be disabled during security releases while CI is locked down.

@joyeecheung
Member

joyeecheung commented Oct 26, 2023

  1. Configuring the test framework to export test results to JUnit XML

I think for Jenkins this does not need additional work; at the end of every test run we already run tap2junit -i test.tap -o out/junit/test.xml.
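For anyone unfamiliar with that step, here is a toy sketch of what such a TAP-to-JUnit conversion does. It handles only bare "ok" / "not ok" lines (the real tap2junit covers far more of the TAP format), and the sample input is illustrative:

```python
import re
import xml.etree.ElementTree as ET

# Matches "ok 1 - name" and "not ok 2 - name"; group 1 captures the "not "
TAP_LINE = re.compile(r"^(not )?ok\s+\d+\s+-?\s*(.*)$")

def tap_to_junit(tap_text):
    suite = ET.Element("testsuite", name="tap")
    tests = failures = 0
    for line in tap_text.splitlines():
        m = TAP_LINE.match(line.strip())
        if not m:
            continue  # skip the version line, the plan, diagnostics, etc.
        tests += 1
        case = ET.SubElement(suite, "testcase", name=m.group(2))
        if m.group(1):  # "not ok" -> record a failure element
            failures += 1
            ET.SubElement(case, "failure")
    suite.set("tests", str(tests))
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")

sample = """TAP version 13
1..2
ok 1 - parallel/test-http-basic
not ok 2 - parallel/test-net-timeout
"""
junit = tap_to_junit(sample)
```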

Upload result to BuildPulse using our buildpulse-action (for GitHub Actions) or test-reporter (Jenkins)

I think that can be done in parallel to the reliability report generation, which is done as a daily job scraping data from Jenkins. I agree with @tniessen that the difficult part is actually fixing the flakes. I think we have more than enough open issues about flaky tests that contain whatever data a service like BuildPulse can collect, but from what I've observed that doesn't help much in terms of actually getting them fixed. Some of the flaky test issues in the core repo are years old and still reproduce from time to time, whether or not the tests are marked flaky, and we never know if a flake is caused by a real bug or by platform issues, because we just don't have enough volunteers to actually look into them.

I think when marking tests as flaky, we need to actually spend some brain energy on figuring out why the test is flaky, why an immediate solution is difficult to come up with, and whether the impact is small enough to justify putting on this band-aid for now. We also need to check that it is not actually a regression (e.g. nodejs/node#49852); IMO the best response to a potential regression is to revert ASAP and only reland when the flake is addressed. That part probably can't be accomplished by automation.

@nodejs nodejs deleted a comment from jessbennett Oct 26, 2023
@joyeecheung
Member

joyeecheung commented Oct 27, 2023

Also, I think what would be really useful is having automation that bisects regressions for newly appeared flakes (ideally on the affected platforms only) to find the commit that introduced a flake. (Maybe old ones too; they can be bisected just fine.) Even better if this can somehow associate flakes with infra updates, in case a flake is caused by one. This would be very helpful in addressing flakes like nodejs/node#49852 (for that one it took some human intelligence to find the suspicious commit, and even then we did not verify it via bisect and just got a lucky guess). I worked with similar automation in V8 before and found it amazing. I am not sure if BuildPulse supports that though. If it can only collect information based on past test failures but cannot do actual bisects, that's still not much of an improvement over what we already have.
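One wrinkle such a bisect has to handle is that a flaky test can still pass on the offending commit, so each probe has to run the test repeatedly before trusting a "good" verdict. A rough sketch of the idea (run_test and the simulated commit history are stand-ins, not real CI hooks):

```python
import random

def is_flaky_at(commit, run_test, attempts=30):
    """Probe one commit: flag it flaky if any of `attempts` runs fails.
    More attempts lower the chance of wrongly calling a flaky commit good."""
    return any(not run_test(commit) for _ in range(attempts))

def bisect_flake(commits, run_test):
    """Binary-search for the first commit where the test turns flaky.
    Assumes the flakiness, once introduced, persists in later commits,
    and that the last commit is known flaky while the first is known good."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_flaky_at(commits[mid], run_test):
            hi = mid          # flake already present here
        else:
            lo = mid + 1      # still good; flake introduced later
    return commits[lo]

# Simulated history: commits 0-5 are solid, commit 6 introduces a 40% flake.
random.seed(1)
def fake_test(commit):
    return True if commit < 6 else random.random() > 0.4

first_bad = bisect_flake(list(range(10)), fake_test)
```

With 30 attempts per probe and a 40% failure rate, the chance of a probe missing the flake is about 0.6^30, i.e. negligible, which is what makes the binary search trustworthy.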

@siddhantdange
Author

siddhantdange commented Oct 27, 2023

Hi everyone - I've read through the comments so far and would like to address some concerns.

I agree that detection on its own is not sufficient to solve the underlying problem - your GitHub issues already track a subset of known flaky tests. So what additional data can we provide?

We can show, among other things (these directly address the concerns identified above):

  • the disruptiveness of flaky tests
  • the time consumed by flaky tests
  • when a flaky test was first and last observed (with links to commits + builds)
  • the error messages from when the test flaked
  • charts & reporting around the history of these metrics to track trends

These metrics, along with alerting and PR comments, make sure flaky tests are tracked, prioritized correctly, and fixed.

BuildPulse considers a test flaky when we see both a passing and a failing result for that test with no code changes in between. We determine whether the code has changed by comparing git tree SHAs. Because of this continuous monitoring, we can identify flaky tests as they come up.
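That rule can be sketched in a few lines. This is an illustration of the idea rather than BuildPulse's actual implementation, and the data layout is made up:

```python
from collections import defaultdict

def find_flaky(results):
    """results: iterable of (test_name, tree_sha, passed). A test is flagged
    flaky when the same git tree SHA produced both a pass and a failure,
    i.e. the outcome changed while the code did not."""
    outcomes = defaultdict(set)
    for test, sha, passed in results:
        outcomes[(test, sha)].add(passed)
    return sorted({test for (test, sha), seen in outcomes.items()
                   if len(seen) == 2})  # saw both True and False

runs = [
    ("test-net-timeout", "abc123", True),
    ("test-net-timeout", "abc123", False),  # same tree, both outcomes: flaky
    ("test-http-basic",  "abc123", True),
    ("test-http-basic",  "def456", False),  # code changed: not flaky evidence
]
flaky = find_flaky(runs)
```

Comparing tree SHAs rather than commit SHAs means reruns, rebases, and merge commits that leave the source identical still count as "no code change".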

@mhdawson
Member

I would not replace the current infrastructure with a 3rd party system straight away, but I'm not opposed to add it in a non-destructive way, evaluate it, and see how it reacts.

+1 from me on that approach.

I think the main question is whether we have somebody from the build team, or somebody with Jenkins access, who will volunteer to help with the effort, as my guess is that we'll need to update the build jobs to send the data.

@joyeecheung
Member

So far I think no one objects to trying it out in addition to what we already have. So we just need volunteers to integrate it with Jenkins and/or GitHub Actions and see how it goes.

@shnooshnoo

Hey all, I'd like to improve the Windows builds of the Node CI and fix flaky tests where possible, and I'm really curious how BuildPulse can help here. So I was just wondering, what's the current status of this?

@shnooshnoo

@siddhantdange what would we need to have some kind of MVP of this? Anything I can help you with here?

@siddhantdange
Author

siddhantdange commented Jan 23, 2024

@shnooshnoo 3 steps!

  1. Sign up to generate keys
  2. Configure the tests to export a JUnit XML file
  3. Use our GitHub Action or binary to send results to BuildPulse using the keys from Step 1

Additionally, our onboarding docs can be found here.

Happy to help here as well, if you can point me to the correct workflow!

@StefanStojanovic
Contributor

FYI @nodejs/build I will meet with @siddhantdange on Tuesday, February 6 at 16:00 CET. Anyone who wants is welcome to join. I'll post a meeting summary here afterward.

@StefanStojanovic
Contributor

A quick summary after the meeting:

  • @siddhantdange did a quick demo for me, showcasing the tools. It looks good and could potentially be useful to us.
  • Since I'd already tried the tool, we went over some questions I had and cleared them up (e.g. how to keep additional data, like platform, machine, etc., with the results).
  • Since I'm using my local fork for my PoC, I'm missing some things such as the settings panel, which is only available for organization-owned repos. After I set up everything, we can easily switch to nodejs/node and see the additional settings.
  • The plan, for now, is to set up part of the Windows machines to send results to BuildPulse and let it run for some time so we can see what the dashboards will look like. This will require changing node-test-binary-windows-js-suites and node-test-binary-windows-native-suites, but only to add code that checks whether the machine is configured to send results and sends them if it is. After making a PoC we will reevaluate. The way I see it, if this moves forward, we'd probably want to upload results from the jenkins-workspace machines, as that's where all of the results accumulate in the end, and we'd not have to change all of the sub-jobs generating the junit.xml files.
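The guard described above (skip the upload when the machine isn't configured, and never let an upload failure break the test job) could look roughly like this. The BUILDPULSE_UPLOAD_ENABLED variable and upload-tool command are placeholder names for illustration, not the actual job configuration:

```python
import os
import subprocess

def maybe_upload_results(report_path="out/junit/test.xml"):
    """Best-effort upload of a JUnit report: silently skipped on machines
    that are not opted in, and failures never propagate to the test job."""
    if not os.environ.get("BUILDPULSE_UPLOAD_ENABLED"):
        return "skipped"          # machine not configured to send results
    if not os.path.exists(report_path):
        return "skipped"          # nothing to upload this run
    try:
        subprocess.run(["upload-tool", report_path], check=True, timeout=300)
        return "uploaded"
    except (OSError, subprocess.SubprocessError):
        return "failed"           # swallow the error; the job continues

status = maybe_upload_results()
```

Returning a status instead of raising keeps the CI step strictly non-destructive, which matches the "evaluate it alongside the existing setup" approach discussed earlier in the thread.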

StefanStojanovic added a commit to JaneaSystems/build that referenced this issue Mar 18, 2024
This commit adds initial BuildPulse support by optionally uploading
test results there. If the machine is not configured to connect to
BuildPulse, or if any other condition is not met, the data will not be
sent. If the upload fails, the test job will just continue.

For now, this is a PoC and only Windows machines can upload results.
If we decide to move forward with it, other platforms will be added.

Refs: nodejs#3575
StefanStojanovic added a commit to JaneaSystems/build that referenced this issue Mar 20, 2024
@StefanStojanovic
Contributor

@nodejs/build a PR enabling BuildPulse integration on part of Windows machines in PRs and daily builds is open here.

StefanStojanovic added a commit that referenced this issue Mar 27, 2024