not all outputs were created or valid (Remote caching, all platforms) #8125

Closed
excitoon opened this issue Apr 24, 2019 · 23 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) stale Issues or PRs that are stale (no activity for 30 days) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@excitoon
Contributor

excitoon commented Apr 24, 2019

0.23.2

	ERROR: /some/path/BUILD:25:1: output 'some/outpu.t' was not created
	ERROR: /some/path/BUILD:25:1: not all outputs were created or valid

@meteorcloudy
This issue happens every time the cache server (https://github.com/neitanod/toxy) terminates the connection without an answer:

[toxy]$ ./main -p 9999 -t leave:asap
2019/04/24 07:18:31 Starting reverse proxy for leave:asap
2019/04/24 07:19:02 unsupported protocol scheme "leave"
@excitoon
Contributor Author

Basically, the problem is that an empty response for an action result is treated as perfectly fine (??), which leads to an action result with 0 outputs being downloaded without an error.

ef7fe3d fixes that, but the problem is more general.

I guess Bazel should have some kind of tolerance for network errors. Maybe some checksums for cache entries?

@meteorcloudy
Member

/cc @buchgr Our remote caching/execution expert.

@meteorcloudy meteorcloudy added team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug labels Apr 24, 2019
@buchgr
Contributor

buchgr commented Apr 24, 2019

@excitoon

Basically, the problem is that an empty response for an action result is treated as perfectly fine (??), which leads to an action result with 0 outputs being downloaded without an error.

Yes, it's fine for an action to not produce an output file, but then it must also not declare an output file, or else you'll see this error.

I guess Bazel should have some kind of tolerance for network errors. Maybe some checksums for cache entries?

We do have that, but it looks like this is not the problem here. Would you have a reproducer for me to take a look at?

@excitoon
Contributor Author

@buchgr

git clone https://github.com/neitanod/toxy
go run toxy/main.go -p 9999 -t bad &
bazel build //some:thing --spawn_strategy=standalone --remote_http_cache=http://localhost:9999

@buchgr
Contributor

buchgr commented Apr 24, 2019

@excitoon what does the //some:thing target look like?

@excitoon
Contributor Author

excitoon commented Apr 24, 2019

@buchgr Any cacheable target with outputs. E.g. for bazel itself:

[chebotarev@some_machine bazel]$ bazel build //src:bazel --spawn_strategy=standalone --remote_http_cache=http://localhost:9999
INFO: Invocation ID: 42a78889-b7bf-4617-b9d7-e65ec085bc84
INFO: Analysed target //src:bazel (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
[0 / 7] [-----] BazelWorkspaceStatusAction stable-status.txt
URI (GET): localhost:9999/ac/6953a6c58d4dc26c7160d766f3293f0d78621719e78c2cba85e7c006ec9257a1
ERROR: /home/chebotarev/bazel/src/tools/android/java/com/google/devtools/build/android/BUILD:21:1: output 'src/tools/android/java/com/google/devtools/build/android/all_android_tools_deploy.jar' was not created
ERROR: /home/chebotarev/bazel/src/tools/android/java/com/google/devtools/build/android/BUILD:21:1: not all outputs were created or valid
[1 / 10] 4 actions running
    Building deploy jar .../devtools/build/lib/bazel/BazelServer_deploy.jar; 0s remote-cache
    Building deploy jar .../devtools/build/android/all_android_tools_deploy.jar; 0s remote-cache
    .../java/com/google/devtools/coverageoutputgenerator:all_lcov_merger_tools; 0s remote-cache
    Building deploy jar .../build/importdeps/ImportDepsChecker_deploy.jar; 0s remote-cache
URI (GET): localhost:9999/ac/9d51dc88f1684479abed944587c2ce01605e75ab215a49e4c068645f9e7bb527
Target //src:bazel failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.320s, Critical Path: 0.07s
INFO: 1 process: 1 remote cache hit.
FAILED: Build did NOT complete successfully

@excitoon
Contributor Author

@buchgr For a target that produces no outputs and declares no output files, what is the caching strategy? Should Bazel ask the cache about that absence of files?

@buchgr
Contributor

buchgr commented Apr 24, 2019

@excitoon Ah, I see what you mean. That's indeed a bug. Thanks!

This issue happens every time the cache server (https://github.com/neitanod/toxy) terminates the connection without an answer:

This statement is not fully correct. What toxy seems to do is send a 200 OK response with an empty body. So that's a badly behaving server, which is different from the server just terminating the connection.
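
For readers without toxy at hand, the behaviour described here (a 200 OK with an empty body for every GET) can be approximated with a small stand-in server. This is only a sketch using Python's standard library. The names `EmptyOkHandler` and `fetch_from_empty_cache` are made up for illustration, and it has not been verified that this matches toxy's behaviour byte for byte.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EmptyOkHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the misbehaving cache: 200 OK, empty body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_from_empty_cache(path):
    """Start the stand-in server on a free port, GET `path`, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), EmptyOkHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    # The /ac/<hash> path shape follows the "URI (GET)" lines in the log above.
    print(fetch_from_empty_cache("/ac/" + "0" * 64))
```

From the client's point of view this looks like a successful lookup that happens to return nothing, which is the case discussed in this thread.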

I think the proper fix would be to add a check that the generated outputs are a superset of the declared outputs.
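
The check proposed here could look roughly like the following. This is a sketch of the idea in Python, not Bazel's actual code (which is Java). The function names and the use of plain path strings are assumptions made for illustration.

```python
def missing_declared_outputs(declared, produced):
    """Return the declared output paths that are absent from `produced`, sorted."""
    return sorted(set(declared) - set(produced))


def check_cached_result(declared, produced):
    """Raise if a cached action result lacks any declared output.

    Extra, undeclared outputs are tolerated: only the superset
    relation (produced >= declared) is enforced.
    """
    missing = missing_declared_outputs(declared, produced)
    if missing:
        raise ValueError(
            "cached action result is missing declared outputs: "
            + ", ".join(missing)
        )


if __name__ == "__main__":
    # An empty cached result fails the check for any action with outputs.
    print(missing_declared_outputs(["some/outpu.t"], []))
```

With such a check, an empty action result would be rejected as a cache miss or error instead of being accepted as a hit.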

@buchgr buchgr added the P2 We'll consider working on this in future. (Assignee optional) label Apr 24, 2019
@excitoon
Contributor Author

excitoon commented Apr 24, 2019 via email

@buchgr
Contributor

buchgr commented Apr 24, 2019

@excitoon it's totally legal for a command to create more output files than declared in Bazel (e.g. debug output).

@excitoon
Contributor Author

excitoon commented Apr 24, 2019 via email

@excitoon
Contributor Author

The fact that a 0-byte protobuf is a valid entity is indeed interesting.

@buchgr
Contributor

buchgr commented Apr 24, 2019

@excitoon it's a bit unfortunate. 0 bytes is a valid protobuf in that all fields in the protobuf will have their default values. We check the int exit_code field, and the default value of an int is 0, which also happens to be the exit code signaling success...
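
To make this concrete, here is a hand-rolled sketch of protobuf wire decoding in Python. It is not the real protobuf library or Bazel code. The field numbers (2 for output files, 4 for exit_code) are assumed to match the Remote Execution API's ActionResult message and should be treated as illustrative. Only varint and length-delimited fields are handled, and negative exit codes are not sign-converted.

```python
# Assumed field numbers, for illustration only.
OUTPUT_FILES_FIELD = 2
EXIT_CODE_FIELD = 4


def read_varint(data, pos):
    """Decode one base-128 varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_action_result(data):
    """Decode a tiny subset of an ActionResult-like message.

    Fields that are absent on the wire keep their defaults, so an
    empty input yields exit_code == 0 and no output files.
    """
    result = {"exit_code": 0, "output_files": []}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
            if field == EXIT_CODE_FIELD:
                result["exit_code"] = value
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            payload = data[pos:pos + length]
            pos += length
            if field == OUTPUT_FILES_FIELD:
                result["output_files"].append(payload)
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return result


if __name__ == "__main__":
    print(decode_action_result(b""))
```

An empty body therefore decodes to a result that looks like a successful action with no outputs, which is why checking exit_code alone cannot tell a real cache hit from an empty response.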

@excitoon
Contributor Author

@buchgr Is e530f68 good enough?

@buchgr
Contributor

buchgr commented Apr 25, 2019

Mostly yes. Open a PR (with a test) and we can discuss it further? I added a comment on the commit. Thanks!

@EricBurnett

Changing the API to require a positively set field (so an empty proto isn't valid) is also tracked in bazelbuild/remote-apis#6. I don't think it can happen until V3, since it'd be a breaking change, but it would be a long-term defense against this.

@buchgr
Copy link
Contributor

buchgr commented May 23, 2019

I believe Bazel should be able to handle it client side, independent of the API. To the best of my knowledge, if the output is of type Artifact, it is a declared and thus required output in Bazel.

@ob
Contributor

ob commented Sep 24, 2019

Any status on this? We are hitting this problem when using bazel-remote with Bazel 0.25.1. I still don't know how an empty file got into the ac, but:

% find ac -size 0
ac/05/0520dc913c0d5fcf9791bb16affdeef7ea13ee5f8d6b42924b89c84a65d40fc6

When Bazel asks for that action, bazel-remote happily hands it over with size 0, which causes the build to fail because the output wasn't created.
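
The same scan can be scripted, for example to run periodically against a cache directory. This is a sketch in Python. The `ac` directory name follows the listing above, and `find_empty_entries` is a hypothetical helper, not part of bazel-remote.

```python
import os


def find_empty_entries(ac_dir):
    """Return sorted paths of zero-byte regular files under `ac_dir`."""
    empty = []
    for root, _dirs, files in os.walk(ac_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    for path in find_empty_entries("ac"):
        print(path)
```

Whether such entries should be deleted or only reported is a separate decision. The sketch only lists them.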

@mostynb
Contributor

mostynb commented Nov 6, 2019

@ob: you might want to try upgrading bazel-remote. I added some ActionResult validation when implementing the gRPC interface, and then refactored the code so that the HTTP interface also has this validation.

@mostynb
Contributor

mostynb commented Nov 8, 2019

Duh, this part of the validation I added to bazel-remote is obviously out of spec (followup in buchgr/bazel-remote#121).

@MikhailTymchukFT

Starting with the 4.0.0 release, Bazel hangs when using the multiplex-worker strategy. It fails immediately with the message described in the original post when using other strategies such as sandboxed or local.

@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 26, 2023
@github-actions

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned May 11, 2023

7 participants