RBE bazel jobs are timing out #20628

Closed
chaodaiG opened this issue Jan 27, 2021 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@chaodaiG
Contributor

What happened:

Several Prow jobs have started timing out after 2 hours. The logs show that they hang in the bazel test phase.

https://testgrid.k8s.io/sig-release-master-blocking#bazel-build-master
https://prow.k8s.io/?repo=kubernetes%2Ftest-infra&job=pull-test-infra-integration
https://prow.k8s.io/?repo=kubernetes%2Ftest-infra&job=pull-test-infra-bazel

One noticeable common factor is that they all use RBE.

@chaodaiG added the kind/bug label Jan 27, 2021
@chaodaiG
Contributor Author

CC @fejta

@fejta
Contributor

fejta commented Jan 27, 2021

Lots of CI uses bazel, and bazel almost always uses RBE.

@chaodaiG
Contributor Author

Not using RBE in #20626 lets the pull-test-infra-integration job pass, while the same job is now consistently timing out in other PRs: https://prow.k8s.io/?repo=kubernetes%2Ftest-infra&job=pull-test-infra-integration

@CecileRobertMichon
Member

Running into this in #20630.

@spiffxp changed the title from "bazel test time out after 2 hours in prow" to "RBE bazel jobs are timing out" Jan 28, 2021
@alvaroaleman
Member

This appears to block all merges into this repo. The last merge was 9 hours ago, and there are a bunch of merge-eligible PRs stuck in retesting: https://github.com/kubernetes/test-infra/pulls?q=is%3Apr+is%3Aopen+label%3Aapproved+label%3Algtm+-label%3A%22do-not-merge%2Fhold%22

@justaugustus
Member

FYI @kubernetes/release-engineering

@chaodaiG
Contributor Author

The current hypothesis is that the different bazel versions in k8s/test-infra and k8s/kubernetes confused RBE; #20632 is an attempt to fix this.
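
One way to check a version-mismatch hypothesis like this is to compare the Bazel version each checkout pins. The sketch below is illustrative only: it assumes both repositories pin their version in a `.bazelversion` file at the repo root (the convention Bazelisk reads), which this thread does not confirm. The helper names `read_bazel_version` and `versions_differ` are hypothetical.

```python
from pathlib import Path


def read_bazel_version(repo_root):
    """Return the version pinned in <repo_root>/.bazelversion, or None.

    Assumption: the repo pins Bazel via a .bazelversion file at its root.
    """
    path = Path(repo_root) / ".bazelversion"
    if not path.is_file():
        return None
    # Use the first non-empty line as the pinned version.
    for line in path.read_text().splitlines():
        line = line.strip()
        if line:
            return line
    return None


def versions_differ(repo_a, repo_b):
    """True only when both repos pin a version and the pins are not equal."""
    a = read_bazel_version(repo_a)
    b = read_bazel_version(repo_b)
    return a is not None and b is not None and a != b
```

Calling `versions_differ("path/to/test-infra", "path/to/kubernetes")` on two local checkouts (paths are placeholders) returns `True` only when both pin a version and the pins disagree. A missing file is treated as "unknown" rather than as a mismatch.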

@chaodaiG
Contributor Author

The root cause of this problem is still not very clear, but according to the RBE team it's possible that one of their dependencies had a bad release, which was mitigated yesterday.

Checked again and all affected tests are passing now, so closing this bug.

/close

@k8s-ci-robot
Contributor

@chaodaiG: Closing this issue.



7 participants