Bazel 0.29 contains a regression with Remote Build Execution backend #9284

Closed
v-mr opened this issue Aug 29, 2019 · 13 comments
Labels: P1 I'll work on this now. (Assignee required) · team-Remote-Exec (Issues and PRs for the Execution (Remote) team) · type: bug

v-mr commented Aug 29, 2019

Description of the problem / feature request:

Bazel 0.29.0 (#8572) has a regression with the Remote Build Execution backend.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

ERROR: /test/BUILD:205:5: failed (Exit 34). Note: Remote connection/protocol failed with: execution failed FAILED_PRECONDITION: Precondition check failed.

What operating system are you running Bazel on?

Linux 4.19.37-5+deb10u1rodete1-amd64 #1 SMP Debian 4.19.37-5+deb10u1rodete1 (2019-07-22 > 2018) x86_64 GNU/Linux

What's the output of bazel info release?

release 0.29.0

@katre katre self-assigned this Aug 29, 2019
@katre katre added P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug labels Aug 29, 2019
katre (Member) commented Aug 29, 2019

I am investigating with @v-mr.

lizan commented Aug 29, 2019

We're seeing the same issue in envoyproxy/envoy#8074. It seems to happen only when --nocache_test_results is passed to bazel: I can reproduce it with --nocache_test_results but not without it.

katre (Member) commented Aug 29, 2019

Thank you for the data point, @lizan.

@v-mr has given me a reproduction case and I am bisecting now.

katre (Member) commented Aug 29, 2019

@lizan It's possible that the difference with --nocache_test_results is that when it is not set the tests aren't executed, which seems to be where the failure is.

lizan commented Aug 29, 2019

> It's possible that the difference with --nocache_test_results is that when it is not set the tests aren't executed, which seems to be where the failure is.

@katre I doubt it, because I changed our RBE image SHA at the same time, which should have invalidated the entire cache the first time I ran without --nocache_test_results.

katre (Member) commented Aug 29, 2019

Bisecting eventually identified e4ccba4 as the culprit. I have no idea why, but reverting it on master fixes my reproduction.

I'm going to debug this further to try and determine why this change causes this failure.

katre (Member) commented Aug 30, 2019

That commit is the one that caused the action key to change, which is apparently what causes the bug to be visible. It's not the commit that actually causes the error.

buchgr (Contributor) commented Aug 30, 2019

Trying to repro. FAILED_PRECONDITION hints at inputs not being uploaded.

buchgr (Contributor) commented Aug 30, 2019

I believe the root cause is this line. We should unconditionally upload inputs: https://source.bazel.build/bazel/+/master:src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnRunner.java;l=250?q=RemoteSpawnRunner

I'll send a fix. Sorry about this :(
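A minimal Java sketch of the failure mode, under a deliberately simplified model: every name below (`UploadSketch`, `cas`, `execute`, `runBuggy`, `runFixed`, `mayUseRemoteCache`) is hypothetical and does not correspond to Bazel's real classes or to the actual code in RemoteSpawnRunner.java. It only illustrates the shape of the bug described in this thread: if uploading inputs to the CAS is gated on whether a spawn may use the remote cache, a spawn tagged `no-remote-cache` or `no-cache` is sent for remote execution without its inputs, and the server rejects it with FAILED_PRECONDITION. The fix suggested above is to upload inputs unconditionally.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the bug in #9284.
// None of these names correspond to Bazel's real classes.
class UploadSketch {
    // Stand-in for the remote CAS: the set of input digests it holds.
    static final Set<String> cas = new HashSet<>();

    // Stand-in for the remote executor: rejects an action whose inputs
    // are missing from the CAS, as an RBE server reports FAILED_PRECONDITION.
    static String execute(List<String> inputDigests) {
        for (String digest : inputDigests) {
            if (!cas.contains(digest)) {
                return "FAILED_PRECONDITION";
            }
        }
        return "OK";
    }

    // Buggy behavior: inputs are uploaded only if the spawn may use the remote cache.
    static String runBuggy(List<String> inputs, boolean mayUseRemoteCache) {
        if (mayUseRemoteCache) {
            cas.addAll(inputs);
        }
        return execute(inputs);
    }

    // Fixed behavior: inputs are always uploaded before remote execution;
    // the cache flag no longer affects whether inputs reach the CAS.
    static String runFixed(List<String> inputs, boolean mayUseRemoteCache) {
        cas.addAll(inputs);
        return execute(inputs);
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("digest-a", "digest-b");
        // A spawn that must not use the remote cache (e.g. tagged no-remote-cache).
        System.out.println(runBuggy(inputs, false)); // FAILED_PRECONDITION
        cas.clear();
        System.out.println(runFixed(inputs, false)); // OK
    }
}
```

In this model, spawns that may use the cache succeed on both paths, which is consistent with the bug only surfacing for some targets.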

buchgr (Contributor) commented Aug 30, 2019

Fix: #9287

buchgr (Contributor) commented Aug 30, 2019

Here's a patch that applies cleanly on 0.29.0: buchgr@c475d7d

katre pushed a commit that referenced this issue Sep 3, 2019
Action inputs would not be uploaded to the CAS for targets tagged
with 'no-remote-cache' or 'no-cache'.

The regression was introduced by 8860c3e.

Closes #9287.

PiperOrigin-RevId: 266358305
brandjon (Member) commented Sep 3, 2019

Came across this issue from a buildkite "Emergency" message. Since this issue is closed (and the message refers to a solution "later today"), can the message be removed?

katre (Member) commented Sep 4, 2019

@brandjon The core issue won't be fixed until 0.29.1 is released, hopefully tomorrow.

bazel-io pushed a commit that referenced this issue Sep 10, 2019
Baseline: 6c5ef53

Cherry picks:

   + 338829f:
     Fix retrying of SocketTimeoutExceptions in HttpConnector
   + 14651cd:
     Fallback to next urls if download fails in HttpDownloader
   + b7d300c:
     Fix incorrect stdout/stderr in remote action cache. Fixes #9072
   + 9602176:
     Automated rollback of commit
     0f0a0d5.
   + da557f9:
     Windows: fix "bazel run" argument quoting
   + ef8b6f6:
     Return JavaInfo from java proto aspects.
   + 209175f:
     Revert back to the old behavior of not creating a proto source
     root for generated .proto files.
   + 644060b:
     Fix PatchUtil for parsing special patch format
   + 067040d:
     Put the removal of the legacy repository-relative proto path
     behind the --incompatible_generated_protos_in_virtual_imports
     flag.
   + 76ed014:
     repository mapping lookup: convert to canonical name first
   + f791df0:
     Release 0.29.0 (2019-08-28)
   + 2c04648:
     Fix git_repository rule to support fetching a commit on a tag
   + 9e1d65a:
     Fix a serious regression in remote execution. Fixes #9284
   + 8b0bfaf:
     Include cc configure headers in the cache key
   + 5c02b92:
     Make --workspace_status_command work with "Builds without the
     Bytes".
   + a0e3bb2:
     Remove support for authentication and .netrc

This release contains contributions from many people at Google, as well as Artem Zinnatullin.
exoson pushed a commit to exoson/bazel that referenced this issue Sep 25, 2019