Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote: Fix "file not found" error when remote cache is changed from enabled to disabled. #14252

Closed
wants to merge 4 commits into from

Conversation

coeuvre
Copy link
Member

@coeuvre coeuvre commented Nov 10, 2021

When use BwtB, intermediate outputs are not downloaded. If a following build disables remote cache, a "file not found" error will be thrown if an action doesn't have its inputs downloaded in previous build because those files cannot be downloaded in this build.

This change fix this issue by:

  1. Do not load remote metadata from action cache if --experimental_action_cache_store_output_metadata is set and remote cache is disabled.
  2. Invalidate action nodes if previous build use BwtB and remote cache is changed from enabled to disabled.

Fixes #13882.

@coeuvre coeuvre requested a review from a team as a code owner November 10, 2021 07:06
@google-cla google-cla bot added the cla: yes label Nov 10, 2021
Copy link
Contributor

@alexjski alexjski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend splitting that into 2 logically separate changes as in the description.

# See https://github.com/bazelbuild/bazel/issues/13882.

mkdir -p a
cat > a/BUILD <<EOF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use EOF and not need to escape the $ in the content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Test that BwtB does cause build failure if remote cache is disabled in a following build.
# See https://github.com/bazelbuild/bazel/issues/13882.

mkdir -p a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supernit: -p not needed in this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--verbose_failures \
//a:consumer >& $TEST_log || fail "Failed to populate the cache"

bazel clean || fail "Failed to clean"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: bazel clean >& "${TEST_log}" || fail ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--remote_download_toplevel \
--verbose_failures \
//a:consumer >& $TEST_log || fail "Failed to download outputs without remote metadata"
(! [[ -f bazel-bin/a/a.txt ]] && ! [[ -f bazel-bin/a/b.txt ]]) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we reverse that to [[ -f bazel-bin/a/a.txt ]] || [[ -f bazel-bin/a/b.txt ]] && fail ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(! [[ -f bazel-bin/a/a.txt ]] && ! [[ -f bazel-bin/a/b.txt ]]) \
|| fail "Expected outputs of producer are not downloaded without remote metadata"

# build without remote cache without remote metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the build which fails without your change? The reason I am asking is because this test may be merging 2 test cases in here. In general, it is a little hard to follow what is arrange-act-assert in here.

Meta-comment -- is there a way to run the remote tests in a Java test using BuildIntegrationTestCase? In those, it is much easier to parameterize etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split into 2 test cases.

Meta-comment -- is there a way to run the remote tests in a Java test using BuildIntegrationTestCase? In those, it is much easier to parameterize etc.

Not sure. These remote tests run a remote worker process which is a real remote server. Do we prefer BuildIntegrationTestCase over shell in general? If so, I can try to make it work with remote tests and write new tests there in future PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally do. I wouldn't tell you to rewrite these tests, but you may want to prefer Java-based ones when applicable in the future. Sent you a document talking about that offline.

Copy link
Member Author

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. I will split this PR when importing.

# Test that BwtB does cause build failure if remote cache is disabled in a following build.
# See https://github.com/bazelbuild/bazel/issues/13882.

mkdir -p a
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# See https://github.com/bazelbuild/bazel/issues/13882.

mkdir -p a
cat > a/BUILD <<EOF
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--verbose_failures \
//a:consumer >& $TEST_log || fail "Failed to populate the cache"

bazel clean || fail "Failed to clean"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--remote_download_toplevel \
--verbose_failures \
//a:consumer >& $TEST_log || fail "Failed to download outputs without remote metadata"
(! [[ -f bazel-bin/a/a.txt ]] && ! [[ -f bazel-bin/a/b.txt ]]) \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(! [[ -f bazel-bin/a/a.txt ]] && ! [[ -f bazel-bin/a/b.txt ]]) \
|| fail "Expected outputs of producer are not downloaded without remote metadata"

# build without remote cache without remote metadata
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split into 2 test cases.

Meta-comment -- is there a way to run the remote tests in a Java test using BuildIntegrationTestCase? In those, it is much easier to parameterize etc.

Not sure. These remote tests run a remote worker process which is a real remote server. Do we prefer BuildIntegrationTestCase over shell in general? If so, I can try to make it work with remote tests and write new tests there in future PRs.


bazel clean || fail "Failed to clean"
function test_download_toplevel_when_turn_remote_cache_off_with_metadata() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 test cases differ by 1 option only -- why not put the test case as a separate helper function and call that twice with different parameters (Bash tests parameterization). Example here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# download top level outputs without remote metadata
# download top level outputs
bazel build \
--remote_cache=grpc://localhost:${worker_port} \
--remote_download_toplevel \
--verbose_failures \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supernit: --verbose_failures is unlikely needed/we could add that to the blazerc for the whole test if that aids debugging failures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(! [[ -f bazel-bin/a/a.txt ]] && ! [[ -f bazel-bin/a/b.txt ]]) \
|| fail "Expected outputs of producer are not downloaded without remote metadata"

# build without remote cache without remote metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally do. I wouldn't tell you to rewrite these tests, but you may want to prefer Java-based ones when applicable in the future. Sent you a document talking about that offline.

@@ -3286,8 +3250,7 @@ EOF
bazel build \
--remote_cache=grpc://localhost:${worker_port} \
--remote_download_toplevel \
--experimental_action_cache_store_output_metadata \
--verbose_failures \
$extra_build_flags \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect bash linter may not like that -- your best bet may be to use the "$@" array directly or declare and use extra_build_flags as an array and use like that (personally, I recommend "$@"):

local -r extra_build_flags=("$@")
...
"${extra_build_flags[@]}"

bazel-io pushed a commit that referenced this pull request Nov 16, 2021
…he is changed from enabled to disabled.

Part of #14252.

PiperOrigin-RevId: 410243297
@bazel-io bazel-io closed this in c9b7e22 Nov 17, 2021
coeuvre added a commit to coeuvre/bazel that referenced this pull request Nov 24, 2021
…he is changed from enabled to disabled.

Part of bazelbuild#14252.

PiperOrigin-RevId: 410243297
coeuvre added a commit to coeuvre/bazel that referenced this pull request Nov 24, 2021
meteorcloudy pushed a commit that referenced this pull request Nov 24, 2021
… from enabled to disabled. (#14321)

* Remote: Invalidate actions if previous build used BwoB and remote cache is changed from enabled to disabled.

Part of #14252.

PiperOrigin-RevId: 410243297

* In `ArchivedTreeArtifact`, make the assumption that the relative output path is always a single segment, and use this to serialize less data.

This assumption holds because the origin of the relative output path (i.e. `bazel-out`) is `BlazeDirectories#getRelativeOutputPath`, which always returns a single-segment string. Instead of passing that around, just extract it from the tree artifact's exec path.

Additionally, the public `ArchivedTreeArtifact#create` method is removed in order to enforce a consistent naming pattern for all instances.

The codec supports custom derived tree roots even though there is currently no such serialization in skyframe (all serialized instances have the default `:archived_tree_artifacts`, but it was easy enough to be flexible).

PiperOrigin-RevId: 406382340

* Remote: Don't load remote metadata from action cache if remote cache is disabled.

Part of #14252.

Closes #14252.

PiperOrigin-RevId: 410448656

Co-authored-by: jhorvitz <jhorvitz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build failure from missing files when using --experimental_action_cache_store_output_metadata
2 participants