Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minimal download breaks stable-status #8385

Closed
aherrmann opened this issue May 17, 2019 · 7 comments
Closed

minimal download breaks stable-status #8385

aherrmann opened this issue May 17, 2019 · 7 comments
Assignees
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@aherrmann
Copy link
Contributor

Description of the problem:

The minimal download flags

build --experimental_inmemory_jdeps_files
build --experimental_inmemory_dotd_files
build --experimental_remote_download_outputs=minimal

on Bazel 0.25.2 seem to break the stable-status feature. I'm observing the following error

ERROR: Failed to run workspace status command bazel_tools/workspace_status.sh: /home/aj/.cache/bazel/_bazel_aj/10d2a8446c92e5dc655eb18c02f185fb/execroot/com_github_digital_asset_daml/bazel-out/stable-status.txt (Permission denied)

(Forcefully removing the file temporarily resolves the problem.)

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Checkout digital-asset/daml@f6642ff and execute the following commands

. .envrc
bazel run //:buildifier
echo >> WORKSPACE
git add WORKSPACE
git commit -m 'dummy change'
bazel run //:buildifier

What operating system are you running Bazel on?

OpenSUSE Tumbleweed 20190510

What's the output of bazel info release?

release 0.25.2- (@non-git)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

Using Nix

Any other information, logs, or outputs that you want to share?

Not sure if related or not, but bazel run seems to be not working on some targets when using minimal downloads. E.g. on the same commit as above

$ bazel run damlc
...
ERROR: Non-existent or non-executable /home/aj/.cache/bazel/_bazel_aj/10d2a8446c92e5dc655eb18c02f185fb/execroot/com_github_digital_asset_daml/bazel
@ishikhman ishikhman added team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged labels May 21, 2019
@buchgr buchgr removed the untriaged label May 22, 2019
buchgr added a commit to buchgr/bazel that referenced this issue Jun 19, 2019
They need to be (over)writable by Bazel actions.
buchgr added a commit to buchgr/bazel that referenced this issue Jun 23, 2019
They need to be (over)writable by Bazel actions.

Closes bazelbuild#8678.

PiperOrigin-RevId: 254017428
siberex pushed a commit to siberex/bazel that referenced this issue Jul 4, 2019
They need to be (over)writable by Bazel actions.

Closes bazelbuild#8678.

PiperOrigin-RevId: 254017428
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
They need to be (over)writable by Bazel actions.

Closes bazelbuild#8678.

PiperOrigin-RevId: 254017428
@mattklein123
Copy link

FYI I'm still hitting this issue on bazel 0.28.1. Is there anything I can do to provide more info? When I manually delete the workspace file everything works fine.

@mattklein123
Copy link

I verified that the above fix is in 0.28.1 so I guess it's some other issue. @buchgr LMK if you would like me to open a fresh issue on this.

@buchgr buchgr reopened this Aug 16, 2019
@buchgr
Copy link
Contributor

buchgr commented Aug 16, 2019

Do you have a reproducer or is it the same as the initial bug report?

@mattklein123
Copy link

@buchgr I don't have an exact reproducer, but the symptom is exactly the same as the original bug report. Let me know if there is any info I can provide. I've gotten around it right now by just deleting the stable-status.txt file before I run any build commands. I can remove that hack and help debug if you tell me what to look for.

@aherrmann
Copy link
Contributor Author

@buchgr I can still reproduce the issue with the same steps given in the original issue above on current master of digital-asset/daml@f11e09f60f. This is using

$ bazel version
Build label: 0.28.1- (@non-git)
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jan 1 00:00:00 1980 (315532800)
Build timestamp: 315532800
Build timestamp as int: 315532800

built with Nix as defined in the same repository.

@emfree
Copy link

emfree commented Aug 25, 2019

We ran into this as well. Not very familiar with the internals here, but it looks like the issue possibly goes like this:

  • The BazelWorkspaceStatusModule calls FileSystemUtils.maybeUpdateContent on the stable-status file:
    // Only update the stableStatus contents if they are different than what we have on disk.
    // This is to preserve the old file's mtime so that we do not generate an unnecessary dirty
    // file on each incremental build.
    FileSystemUtils.maybeUpdateContent(
    actionExecutionContext.getInputPath(stableStatus), printStatusMap(stableMap));
  • This in turn is a delete-plus-create on the file:
    if (!outputFile.isWritable()) {
    outputFile.delete();
    }
    writeContent(outputFile, newContent);
    }
  • But when you have experimental_remote_download_actions=minimal, the delete doesn't actually happen.
  • This is because the delete is being handled by the RemoteActionFileSystem, which looks to have a simple typo in its delete() method, fixed with this patch:
diff --git a/src/main/java/com/google/devtools/build/lib/remote/RemoteActionFileSystem.java b/src/main/java/com/google/devtools/build/lib/remote/RemoteActionFileSystem.java
index 0491c2d954..6157b413a5 100644
--- a/src/main/java/com/google/devtools/build/lib/remote/RemoteActionFileSystem.java
+++ b/src/main/java/com/google/devtools/build/lib/remote/RemoteActionFileSystem.java
@@ -69,7 +69,7 @@ public class RemoteActionFileSystem extends DelegateFileSystem {
   @Override
   public boolean delete(Path path) throws IOException {
     RemoteFileArtifactValue m = getRemoteInputMetadata(path);
-    if (m != null) {
+    if (m == null) {
       return super.delete(path);
     }
     return true;

@buchgr does this seem reasonable?

@buchgr
Copy link
Contributor

buchgr commented Aug 27, 2019

@emfree it does. nice catch! send me a patch? :)

buchgr added a commit to buchgr/bazel that referenced this issue Aug 29, 2019
Fixes bazelbuild#8385

Due to a typo in delete() stable-status.txt would not get deleted
between builds. More details in bazelbuild#8385.

Thanks @emfree for the debugging work and pointing out the root cause!
katre pushed a commit that referenced this issue Sep 6, 2019
Fixes #8385

Due to a typo in delete() stable-status.txt would not get deleted
between builds. More details in #8385.

Thanks @emfree for the debugging work and pointing out the root cause!

Closes #9280.

PiperOrigin-RevId: 266118848
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants