Fetching symlink inputs for top-level output #11536

ulfjack · 2020-06-02T12:20:48Z

Top-level output files can be symlinks to non-top-level output files, such as
in the case of a cc_binary linking against a .so built in the same build.

I reuse the existing mayInsensitivelyPropagateInputs() method, which was
originally introduced for action rewinding, but seems to have exactly the
intended semantics here as well.

Unfortunately, this requires a small change to the BlazeModule API to pass in
the full analysis result, so that we can access the action graph and read the
aforementioned flag.

I considered using execution-phase information on whether an output is a
symlink. However, at that point it's too late! The current design only allows
pulling an output file when its generating action runs. It would be better to
pull output files independently of action execution, which would avoid a lot
of problems with the current design, such as #10902.

Fixes #11532.

Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

ulfjack · 2020-06-02T12:22:48Z

@buchgr (who wrote the code originally)
I'd page Mark, because I'm abusing a flag in Action that is used by rewinding, but he's apparently not on GitHub... (sorry @mschaller)

benjaminp · 2020-06-02T20:35:49Z

I think you mean @anakanemison

ulfjack · 2020-06-03T10:24:18Z

@anakanemison is not linked to Google. Might want to do that?

anakanemison · 2020-06-04T03:30:38Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

+        ActionExecutionMetadata action =
+            (ActionExecutionMetadata) analysisResult.getActionGraph().getGeneratingAction((Artifact) actionInput);
+        if (action.mayInsensitivelyPropagateInputs()) {
+          filesToDownload.addAll(action.getInputs().toList());


Is it possible for there to be more than one symlink-ish action chained together? My concern is that this may be insufficient if the inputs to this action are outputs of other symlink actions. Rewinding recurses in such a case. Perhaps that would be overkill here.

Good catch.
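
To make the chained-symlink concern concrete, here is a minimal sketch of a recursive traversal over a toy action graph. It is not Bazel's API: `SymlinkInputCollector`, `ToyAction`, and the string-based artifact paths are all hypothetical stand-ins, and the `propagatesInputs` field only imitates what `mayInsensitivelyPropagateInputs()` reports in the real code.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of an action graph; every name here is hypothetical, not Bazel's API. */
final class SymlinkInputCollector {
  /** Stand-in for an action: its inputs and whether it merely forwards them (symlink-like). */
  static final class ToyAction {
    final List<String> inputs;
    final boolean propagatesInputs;

    ToyAction(List<String> inputs, boolean propagatesInputs) {
      this.inputs = inputs;
      this.propagatesInputs = propagatesInputs;
    }
  }

  /** Maps each output path to its generating action; source files have no entry. */
  private final Map<String, ToyAction> generatingAction = new HashMap<>();

  void register(String output, ToyAction action) {
    generatingAction.put(output, action);
  }

  /** Returns the top-level output plus everything reachable through symlink-like actions. */
  Set<String> collect(String topLevelOutput) {
    Set<String> toDownload = new LinkedHashSet<>();
    visit(topLevelOutput, toDownload);
    return toDownload;
  }

  private void visit(String artifact, Set<String> toDownload) {
    if (!toDownload.add(artifact)) {
      return; // Already visited; this also guards against accidental cycles.
    }
    ToyAction action = generatingAction.get(artifact);
    if (action != null && action.propagatesInputs) {
      // Recurse, because an input may itself be the output of another symlink-like action.
      for (String input : action.inputs) {
        visit(input, toDownload);
      }
    }
  }
}
```

With a chain of two symlink-like actions in front of a regular link action, `collect` returns both links and the real file, but not the inputs of the regular action, since recursion stops at the first action that does not propagate its inputs.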

anakanemison · 2020-06-04T03:32:43Z

src/main/java/com/google/devtools/build/lib/runtime/BlazeModule.java

@@ -282,13 +283,13 @@ public BuildOptions getDefaultBuildOptions(BlazeRuntime runtime) {
   * @param env the command environment
   * @param request the build request
   * @param buildOptions the build's top-level options
-   * @param configuredTargets the build's requested top-level targets as {@link ConfiguredTarget}s
+   * @param analysisResult the build's requested top-level targets as {@link ConfiguredTarget}s


this description deserves some rewording, given that the parameter's type changed.

Thanks, reworded.

anakanemison · 2020-06-04T03:36:04Z

Sorry for the delay. I'm not used to GitHub! I think I'm in the Google org on GitHub already.
But I wasn't in bazelbuild, until Tony extended me an invite tonight!

ulfjack · 2020-06-04T10:47:43Z

No worries, and thanks for the review. GitHub seems to think that @anakanemison is not in the Google org. For example, you can compare your profile page with Jakob's: @buchgr vs. @anakanemison

Did you add your @google.com address to your GitHub profile?

anakanemison · 2020-06-04T19:50:21Z

Thanks for pointing out my empty profile! I entered some information there. (Looks like I could say anything I wanted there..)

LGTM. I'll do the internal merging thing and respond here when the change is in.

janakdr · 2020-06-04T21:13:27Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

+
+  private static void fetchSymlinkDependenciesRecursively(
+      ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
+    ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);


what happens if a top-level artifact is a tree artifact generated by a template action? I think that can happen?

If so, then it should be ok to ignore non-ActionExecutionMetadata instances, since templates shouldn't insensitively propagate their inputs (although I guess theoretically there could be a template action that just symlinked its inputs?). Could pull #mayInsensitivelyPropagateInputs() up to ActionAnalysisMetadata too.

Good catch. Right now that leads to a Bazel crash. Fixed!

The main issues right now are SymlinkAction and SymlinkTreeAction, but in general, any remotely run action can return a symlink in the open-source remote execution API. Therefore, this will eventually need to be rewritten to follow symlinks at execution time. The current design mixes up action execution and output fetching, and separating them should actually result in a nicer implementation, because we can then remove the code that injects remote execution flags into Skyframe.

So yeah, this is a short-term fix to get cc_binary to work properly, and there's more work needed.

ulfjack · 2020-06-19T10:42:01Z

Added a test for top-level template actions and fixed the resulting ClassCastException. PTAL.
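
As a rough illustration of the kind of guard that avoids such a ClassCastException, the sketch below checks the runtime type before casting. The interfaces are simplified, hypothetical stand-ins named after the roles of ActionAnalysisMetadata and ActionExecutionMetadata; they are not the real Bazel types, and the actual fix in the PR may look different.

```java
import java.util.List;

/** Hypothetical, simplified stand-ins for the two metadata interfaces discussed in the review. */
final class TemplateActionGuard {
  /** Plays the role of the analysis-time view that every action, including templates, has. */
  interface AnalysisMetadata {
    String describe();
  }

  /** Plays the role of the execution-time view, which template actions do not implement. */
  interface ExecutionMetadata extends AnalysisMetadata {
    boolean mayInsensitivelyPropagateInputs();

    List<String> inputs();
  }

  /**
   * Returns the inputs to fetch for a symlink-like generating action, or an empty list when the
   * action is absent (source artifact), is not an execution-time action, or does not propagate.
   */
  static List<String> symlinkInputs(AnalysisMetadata generatingAction) {
    if (!(generatingAction instanceof ExecutionMetadata)) {
      return List.of(); // Also covers null, since instanceof is false for null.
    }
    ExecutionMetadata action = (ExecutionMetadata) generatingAction;
    return action.mayInsensitivelyPropagateInputs() ? action.inputs() : List.of();
  }
}
```

Ignoring non-execution actions matches the reviewer's suggestion that templates should not insensitively propagate their inputs; pulling the method up to the analysis-time interface would be the alternative design.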

janakdr · 2020-06-22T15:04:10Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

+
+  private static void fetchSymlinkDependenciesRecursively(
+      ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
+    ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);


Would you mind adding a TODO for that somewhere appropriate?

ulfjack · 2020-08-06T12:14:41Z

Added a TODO as requested. Sorry for the delay - the last month was pretty hectic.

janakdr · 2020-08-06T14:08:03Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

    if (remoteOutputsMode != null && remoteOutputsMode.downloadToplevelOutputsOnly()) {
      Preconditions.checkState(actionContextProvider != null, "actionContextProvider was null");
      boolean isTestCommand = env.getCommandName().equals("test");
      TopLevelArtifactContext artifactContext = request.getTopLevelArtifactContext();
-      ImmutableSet.Builder<ActionInput> filesToDownload = ImmutableSet.builder();
-      for (ConfiguredTarget configuredTarget : configuredTargets) {
+      Set<ActionInput> filesToDownload = new HashSet<>();


can this be Artifact instead?

This includes test outputs which are not declared as artifacts, so no, not easily.

janakdr · 2020-08-06T15:18:30Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

+
+  private static void fetchSymlinkDependenciesRecursively(
+      ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
+    ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);


Sorry, now that I understand this better, could you add a test exhibiting this behavior? I assume it would be a Starlark action that created a symlink on the remote worker?

The behavior of creating symlinks on the remote worker seems very problematic to me. What happens if the action creates an intermediate file and then its final output is a symlink to that file? We'll just download a dangling symlink? Is there any chance of getting rid of this symlinking feature altogether?

ulfjack · 2020-08-19T10:53:21Z

Bazel currently errors out if the remote executor returns a dangling symlink, and also if it returns a symlink when build-without-the-bytes is enabled.

ulfjack · 2020-08-19T12:13:26Z

Not quite: it returns an error if an action generates a symlink and --remote_download_minimal is active, but not if --remote_download_toplevel is active. I added a test for this.

ulfjack · 2020-08-19T12:16:05Z

Note that genrules can also generate symlinks on the remote machine.

janakdr

Thanks for the test!

Just the question about the declaration of Set and my new request for the logline.

janakdr · 2020-08-19T12:46:16Z

src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

+    }
+    ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);
+    if (action.mayInsensitivelyPropagateInputs()) {
+      List<Artifact> inputs = action.getInputs().toList();


Would you mind adding a warning-level logline here if the list is >5 elements or so? I don't think we expect such actions to insensitively propagate inputs, and if there's a bug, we could slow down significantly due to nested set expansion.
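
A minimal sketch of what such a logline could look like follows. The class name, method name, and message text are hypothetical, the threshold of 5 comes from the "5 elements or so" suggestion above, and `java.util.logging` is used only to keep the example self-contained; the logging facility used in the actual change is not shown in this thread.

```java
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical helper; not part of Bazel. */
final class InputCountWarning {
  private static final Logger logger = Logger.getLogger(InputCountWarning.class.getName());

  /** Assumed threshold, taken from the "5 elements or so" suggestion in the review. */
  static final int MAX_EXPECTED_INPUTS = 5;

  /** Logs a warning and returns true if the action forwards more inputs than expected. */
  static boolean warnIfManyInputs(String actionDescription, List<String> inputs) {
    if (inputs.size() <= MAX_EXPECTED_INPUTS) {
      return false;
    }
    logger.warning(
        String.format(
            "Action %s insensitively propagates %d inputs (expected at most %d); "
                + "fetching them may be slow",
            actionDescription, inputs.size(), MAX_EXPECTED_INPUTS));
    return true;
  }
}
```

The check runs on the already-flattened list, so it reports a suspicious action but does not by itself avoid the cost of the nested set expansion.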

Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build.

I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well.

Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aforementioned flag).

I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as bazelbuild#10902.

Fixes bazelbuild#11532.

Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

Closes bazelbuild#11536.

Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427
PiperOrigin-RevId: 332939828

googlebot added the cla: yes label Jun 2, 2020

aiuto requested review from gregestren and janakdr June 4, 2020 01:52

anakanemison reviewed Jun 4, 2020

anakanemison approved these changes Jun 4, 2020

janakdr reviewed Jun 4, 2020

janakdr approved these changes Jun 22, 2020

ulfjack force-pushed the fix-cpp-toplevel branch from 3fc2282 to a467bf8 Compare August 6, 2020 12:08

janakdr reviewed Aug 6, 2020

ulfjack force-pushed the fix-cpp-toplevel branch from 056a6ba to e007e02 Compare August 19, 2020 10:53

janakdr approved these changes Aug 19, 2020

ulfjack added 2 commits September 16, 2020 17:14

Fix feedback

f2c455c

Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427

ulfjack force-pushed the fix-cpp-toplevel branch from 633082b to f2c455c Compare September 16, 2020 15:14

bazel-io closed this in 07e152e Sep 21, 2020
