MacOS 11 Test Runners are sometimes very slow to allocate localhost ephemeral ports #6798

Closed
2 of 10 tasks
philipaconrad opened this issue Dec 17, 2022 · 2 comments

Comments

@philipaconrad

philipaconrad commented Dec 17, 2022

Description

MacOS test runners intermittently fail to allocate local ephemeral ports (binding on localhost, port 0) in a reasonable amount of time (<10 seconds).
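
For readers unfamiliar with the pattern, here is a minimal Go sketch of what "binding on localhost, port 0" means and how the allocation time could be measured. It is an illustration only, not code from the Open Policy Agent test suite, and the function name `allocateEphemeralPort` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// allocateEphemeralPort (hypothetical name) asks the OS for a free TCP port
// on localhost by binding to port 0, and reports how long the bind took.
func allocateEphemeralPort() (port int, elapsed time.Duration, err error) {
	start := time.Now()
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	elapsed = time.Since(start)
	if err != nil {
		return 0, elapsed, err
	}
	defer ln.Close()
	// The OS-chosen port is available from the listener's address.
	return ln.Addr().(*net.TCPAddr).Port, elapsed, nil
}

func main() {
	port, elapsed, err := allocateEphemeralPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Printf("got port %d in %v\n", port, elapsed)
}
```

On a healthy machine the bind is expected to return almost immediately; the problem reported here is that it occasionally takes longer than 10 seconds.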

Background

The Open Policy Agent project has had flaky MacOS test runner failures for the last year or longer, which we've traced to GitHub's MacOS runners not allocating ephemeral ports on localhost within a reasonable timeout window.

We reduced the frequency of test failures by setting our test timeouts dramatically higher, but even that has not completely eliminated the issue.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 18.04
  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Runner Image Version

  Image: macos-11
  Version: 20221204.1
  Included Software: https://github.com/actions/runner-images/blob/macOS-11/20221204.1/images/macos/macos-11-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/macOS-11%2F20221204.1

Failed Build Links

Below are all of the failed builds I could find in the last 50+ pages of our Actions tab. There are more failures before that, but we lack logs for those, so I couldn't filter out the relevant MacOS-specific failures any further back than mid-September of this year.

Is it regression?

Unknown, but likely not a new regression.

Expected behavior

MacOS test runners provide ephemeral/dynamic localhost ports to our Golang tests in <10 seconds.

Actual behavior

MacOS test runners occasionally fail to provide a port within a 10-second time frame, causing our tests to crash.

Repro steps

The failures are intermittent, and seem to be almost completely random. The failures may be tied to faulty hardware, as referenced in #3885.

I will periodically add new failures to the list above if that would be helpful.

@mikhailkoliada
Contributor

Hello!

We are aware of this problem and are working to improve the macOS agents, but for now we have some hardware limitations and nothing can be done from the image side :(. Feel free to reach out again if you have any further questions.

@philipaconrad
Author

@mikhailkoliada Is there any place that would be more appropriate to forward the failed jobs list to?
