Resolve network outage in the CI lab #1800
Comments
1. Skip all MacOS builds 2. Show emergency banner #1800
Update: Most Macs are slowly coming back online. Due to the long duration of the outage, we've accumulated a significant backlog of jobs :(
Due to the outage we went without MacOS coverage for three days, which means that there is a significant backlog. This change enables MacOS jobs for high-priority jobs in order to help us clear the backlog. Hopefully we can enable MacOS for all jobs soon. #1800
Clarification: CI simply skips MacOS tasks for most of the pipelines, since we thought this would be less disruptive than failing them. As a result, any change that is merged now may cause MacOS breakages later. However, we hope that full MacOS testing will be possible by the end of the week; then we can find breakages via post-submit pipelines.
/cc @keith @BalestraPatrick Sorry for not pinging you earlier. This has already affected rules_python: some commits have been merged without actually going through CI tests.
Ah yeah. In our case, instead of filtering out the jobs, we'd probably prefer that they fail, but that might be unique to us since we disproportionately care about Apple support. We have branch protection on the overall job, but since that ended up being green we didn't notice.
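To make the skip-versus-fail trade-off above concrete, here is a minimal sketch of the two policies. It is not the actual Bazel CI code: the `Task` type, `apply_outage_policy`, and the `fail_instead_of_skip` flag are hypothetical names invented for illustration.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    """Hypothetical stand-in for one CI task; not the real pipeline schema."""
    name: str
    platform: str  # e.g. "macos" or "ubuntu2004" (assumed labels)
    command: str


def apply_outage_policy(
    tasks: List[Task],
    unavailable_platform: str = "macos",
    fail_instead_of_skip: bool = False,
) -> List[Task]:
    """Return the tasks to schedule while one platform is offline.

    Skip policy (default): drop affected tasks, so the overall job can
    still go green without any coverage for that platform.
    Fail policy: keep affected tasks but replace their command with one
    that exits non-zero, so branch protection blocks the merge.
    """
    scheduled = []
    for task in tasks:
        if task.platform != unavailable_platform:
            scheduled.append(task)
        elif fail_instead_of_skip:
            message = f"{task.platform} CI is unavailable"
            scheduled.append(task._replace(command=f"echo '{message}' >&2; exit 1"))
        # Otherwise the task is silently skipped.
    return scheduled
```

With the skip policy, a repository that relies on branch protection for the overall job sees a green result even though no Apple-platform task ran, which is the situation described above. A per-repository opt-in to the fail policy would surface the gap instead.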
Yeah, for now you'll have to assume you don't have CI coverage at all for apple rules. I'm hoping we can get at least postsubmit working today. Will update here.
@meteorcloudy @fweikert Any news regarding getting the macOS CI back to work? We'd like to merge a few PRs and cut new releases.
Unfortunately, the issue is still ongoing... we are still working on fixing it.
any news?
New network equipment was installed in the lab today. I hope that we have good news later this week.
I'm seeing a DNS failure on Mac on presubmit. Is that known / expected, or a separate issue?
$ git --git-dir /usr/local/var/bazelbuild/https---bazel-googlesource-com-bazel-git fetch origin master
fatal: unable to access 'https://bazel.googlesource.com/bazel.git/': Could not resolve host: bazel.googlesource.com
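A transient resolver failure like the one above could be absorbed by checking DNS before running the fetch. The following is only a sketch under assumptions: `wait_for_dns` and its parameters are hypothetical, and nothing in this thread says the CI scripts do this. It uses the standard library's `socket.getaddrinfo`; the `resolve` and `sleep` arguments exist so the function can be exercised without a network.

```python
import socket
import time


def wait_for_dns(host, attempts=3, delay_seconds=5.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `host` resolves, or False after `attempts` failures.

    Hypothetical helper: the retry count and delay are illustrative
    defaults, not values taken from the CI setup.
    """
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, 443)  # raises socket.gaierror if resolution fails
            return True
        except socket.gaierror:
            if attempt < attempts:
                sleep(delay_seconds)  # back off before the next attempt
    return False
```

A wrapper could call `wait_for_dns("bazel.googlesource.com")` and only run `git fetch` when it returns True, failing the step with a clear infrastructure error otherwise.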
The new network infrastructure has been installed, thus resolving the outage. Progress towards bazelbuild#1800
@brandjon Looks like a transient error while Yun was fixing the network. Can you please retry?
The new network infrastructure has been installed, thus resolving the outage. Fixes #1800
Yeah, different failure mode now, instead of all shards having DNS trouble.
I've found bk-imacpro-6 and bk-imacpro-4 especially prone to this specific failure. Maybe we should take them offline? As it is, them being online actually costs us more resources due to retries.
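One way to back up a decision like that with data is to tally failures per agent. This is a rough sketch that assumes job results are available as `(agent_name, passed)` pairs; that input shape, the `flaky_agents` name, and both thresholds are assumptions made for illustration, not part of any existing tooling.

```python
from collections import Counter


def flaky_agents(job_results, min_failures=3, min_failure_rate=0.5):
    """Return the sorted names of agents that fail often enough to consider
    taking them offline.

    `job_results` is an iterable of (agent_name, passed) pairs. Both
    thresholds are arbitrary illustrative defaults.
    """
    totals = Counter()
    failures = Counter()
    for agent, passed in job_results:
        totals[agent] += 1
        if not passed:
            failures[agent] += 1
    return sorted(
        agent
        for agent in totals
        if failures[agent] >= min_failures
        and failures[agent] / totals[agent] >= min_failure_rate
    )
```

Requiring both a minimum failure count and a minimum failure rate avoids flagging a machine that failed once out of a handful of jobs.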
Technically bazelbuild/bazel@2c51a0c should have fixed this problem, but it's hard to see whether that change was in the tree used for presubmit. Post-submit looks fine (minus two unrelated failures).
Currently we've lost all of our Macs due to networking issues in the CI lab. Since it's late Friday in EU we're unlikely to see a resolution before Monday.