Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental bazel run + BwoB + --noexperimental_check_output_files = slow #20843

Closed
tjgq opened this issue Jan 10, 2024 · 3 comments
Closed

Incremental bazel run + BwoB + --noexperimental_check_output_files = slow #20843

tjgq opened this issue Jan 10, 2024 · 3 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@tjgq
Copy link
Contributor

tjgq commented Jan 10, 2024

This repro was gently provided by @purkhusid in #6862:

https://github.com/purkhusid/bazel_runfiles_repro/blob/main/reproduce.sh

In short: target T1 has a dependency on T2, which has associated runfiles. First populate a disk cache with the result of building T1. Then, starting clean, first build T1 against the disk cache without the bytes (--remote_download_minimal), then run it. The result is that the runfiles for T2 are missing at runtime (as in, the runfile symlinks dangle). Disabling output checking (--noexperimental_check_output_files) is required to reproduce.

The issue is that building without the bytes works by lying to the Skyframe machinery about the existence of files on disk; instead, the files are materialized when either (1) a consuming action prefetches them, or (2) a top-level target forces its outputs to be produced. bazel run is not a build action, so it can't rely on prefetching; instead, it relies on the target being considered top-level even in MINIMAL mode (see RemoteOutputChecker). Disabling output checking prevents Skyframe from doing the dirtyness check on the outputs and realize they're not actually there, which would cause the target to be rebuilt.

There are two possible fixes:

  1. Add input prefetching to bazel run
  2. Make --noexperimental_check_output_files a no-op for bazel run commands.

(1) is more correct but much trickier to implement, since all of the normal action-running machinery has already been torn down by the time the command is run; it would likely involve reimplementing bazel run as a special action injected into the build graph. I think the pragmatic thing to do is implement (2) until there's more evidence that (1) is worthwhile. (This will make bazel run slightly slower in cases where output checking is truly unnecessary, but it's not obvious that that would lead to a major regression; we're also covered by the fact that --noexperimental_check_output_files is, well, experimental and subject to change behavior at any time.)

@tjgq tjgq added P1 I'll work on this now. (Assignee required) type: bug team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Jan 10, 2024
@brentleyjones
Copy link
Contributor

I would vote for (1), but I understand that the effort might not be perceived as worth it at this time. The thing I like about (1) is that it actually makes bazel run less special in regards to runfiles and such, since it's just another build action.

tjgq added a commit to tjgq/bazel that referenced this issue Jan 11, 2024
When building without the bytes, outputs are only materialized when either
(1) an action prefetches them, or (2) they've been explicitly requested. When
both --remote_download_minimal and --noexperimental_check_output_files are
set, this presents a problem for a build command followed by a run command
for the same target: the build command won't download the outputs and the run
command won't check for their presence, proceeding without them. A
non-incremental run command works, because the outputs are considered
top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action
execution; it would be better to implement the run command by injecting a
specially crafted action into the build graph, but that will have to wait for
another day. The present fix would theoretically slow things down if output
checking is truly unnecessary, but I doubt that would cause a significant
regression in practice.

Fixes bazelbuild#20843.
tjgq added a commit to tjgq/bazel that referenced this issue Jan 11, 2024
When building without the bytes, outputs are only materialized when either
(1) an action prefetches them, or (2) they've been explicitly requested. When
both --remote_download_minimal and --noexperimental_check_output_files are
set, this presents a problem for a build command followed by a run command
for the same target: the build command won't download the outputs and the run
command won't check for their presence, proceeding without them. A
non-incremental run command works, because the outputs are considered
top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action
execution, and doesn't get a chance to prefetch inputs; it would have been
better to implement the run command by injecting a specially crafted action
into the build graph, but that will have to wait for another day. The present
fix might theoretically slow things down if output checking is truly
unnecessary, but I doubt that would cause a significant regression in
practice.

Fixes bazelbuild#20843.
tjgq added a commit to tjgq/bazel that referenced this issue Jan 11, 2024
When building without the bytes, outputs are only materialized when either
(1) an action prefetches them, or (2) they've been explicitly requested. When
both --remote_download_minimal and --noexperimental_check_output_files are
set, this presents a problem for a build command followed by a run command
for the same target: the build command won't download the outputs and the run
command won't check for their presence, proceeding without them. A
non-incremental run command works, because the outputs are considered
top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action
execution, and doesn't get a chance to prefetch inputs; it would have been
better to implement the run command by injecting a specially crafted action
into the build graph, but that will have to wait for another day. The present
fix might theoretically slow things down if output checking is truly
unnecessary, but I doubt that would cause a significant regression in
practice.

Fixes bazelbuild#20843.
tjgq added a commit to tjgq/bazel that referenced this issue Jan 11, 2024
When building without the bytes, outputs are only materialized when either
(1) an action prefetches them, or (2) they've been explicitly requested. When
both --remote_download_minimal and --noexperimental_check_output_files are
set, this presents a problem for a build command followed by a run command
for the same target: the build command won't download the outputs and the run
command won't check for their presence, proceeding without them. A
non-incremental run command works, because the outputs are considered
top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action
execution, and doesn't get a chance to prefetch inputs; it would have been
better to implement the run command by injecting a specially crafted action
into the build graph, but that will have to wait for another day. The present
fix might theoretically slow things down if output checking is truly
unnecessary, but I doubt that would cause a significant regression in
practice.

Fixes bazelbuild#20843.
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 12, 2024
When building without the bytes, outputs are only materialized when either (1) an action prefetches them, or (2) they've been explicitly requested. When both --remote_download_minimal and --noexperimental_check_output_files are set, this presents a problem for a build command followed by a run command for the same target: the build command won't download the outputs and the run command won't check for their presence, proceeding without them. A non-incremental run command works, because the outputs are considered top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action execution, and doesn't get a chance to prefetch inputs; it would have been better to implement the run command by injecting a specially crafted action into the build graph, but that will have to wait for another day. The present fix might theoretically slow things down if output checking is truly unnecessary, but I doubt that would cause a significant regression in practice.

Fixes bazelbuild#20843.

Closes bazelbuild#20853.

PiperOrigin-RevId: 597909909
Change-Id: I66aedef4994fbda41fe5378c80940fa8ba637bfd
meteorcloudy pushed a commit that referenced this issue Jan 15, 2024
…e bytes. (#20881)

When building without the bytes, outputs are only materialized when
either (1) an action prefetches them, or (2) they've been explicitly
requested. When both --remote_download_minimal and
--noexperimental_check_output_files are set, this presents a problem for
a build command followed by a run command for the same target: the build
command won't download the outputs and the run command won't check for
their presence, proceeding without them. A non-incremental run command
works, because the outputs are considered top-level even in minimal
mode.

This is only a problem for a run command because it exists outside of
action execution, and doesn't get a chance to prefetch inputs; it would
have been better to implement the run command by injecting a specially
crafted action into the build graph, but that will have to wait for
another day. The present fix might theoretically slow things down if
output checking is truly unnecessary, but I doubt that would cause a
significant regression in practice.

Fixes #20843.

Closes #20853.

Commit
2f899ef

PiperOrigin-RevId: 597909909
Change-Id: I66aedef4994fbda41fe5378c80940fa8ba637bfd

Co-authored-by: Tiago Quelhas <tjgq@google.com>
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.0.1 RC2. Please test out the release candidate and report any issues as soon as possible. Thanks!

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 22, 2024
When building without the bytes, outputs are only materialized when either (1) an action prefetches them, or (2) they've been explicitly requested. When both --remote_download_minimal and --noexperimental_check_output_files are set, this presents a problem for a build command followed by a run command for the same target: the build command won't download the outputs and the run command won't check for their presence, proceeding without them. A non-incremental run command works, because the outputs are considered top-level even in minimal mode.

This is only a problem for a run command because it exists outside of action execution, and doesn't get a chance to prefetch inputs; it would have been better to implement the run command by injecting a specially crafted action into the build graph, but that will have to wait for another day. The present fix might theoretically slow things down if output checking is truly unnecessary, but I doubt that would cause a significant regression in practice.

Fixes bazelbuild#20843.

Closes bazelbuild#20853.

PiperOrigin-RevId: 597909909
Change-Id: I66aedef4994fbda41fe5378c80940fa8ba637bfd
github-merge-queue bot pushed a commit that referenced this issue Jan 27, 2024
…e bytes. (#20988)

When building without the bytes, outputs are only materialized when
either (1) an action prefetches them, or (2) they've been explicitly
requested. When both --remote_download_minimal and
--noexperimental_check_output_files are set, this presents a problem for
a build command followed by a run command for the same target: the build
command won't download the outputs and the run command won't check for
their presence, proceeding without them. A non-incremental run command
works, because the outputs are considered top-level even in minimal
mode.

This is only a problem for a run command because it exists outside of
action execution, and doesn't get a chance to prefetch inputs; it would
have been better to implement the run command by injecting a specially
crafted action into the build graph, but that will have to wait for
another day. The present fix might theoretically slow things down if
output checking is truly unnecessary, but I doubt that would cause a
significant regression in practice.

Fixes #20843.

Closes #20853.

Commit
2f899ef

PiperOrigin-RevId: 597909909
Change-Id: I66aedef4994fbda41fe5378c80940fa8ba637bfd

Co-authored-by: Tiago Quelhas <tjgq@google.com>
@tjgq tjgq added untriaged and removed P1 I'll work on this now. (Assignee required) labels Nov 5, 2024
@tjgq tjgq reopened this Nov 5, 2024
@tjgq
Copy link
Contributor Author

tjgq commented Nov 5, 2024

Reopening because, as recently discussed in #20853, the solution implemented in that PR is detrimental to performance and can be improved upon.

@tjgq tjgq changed the title Incremental bazel run is incompatible with BwoB + --noexperimental_check_output_files Incremental bazel run + BwoB + --noexperimental_check_output_files = slow Nov 5, 2024
tjgq added a commit to tjgq/bazel that referenced this issue Nov 6, 2024
…mmands.

This reverts the functional changes introduced by bazelbuild@2f899ef. They're no longer required because bazelbuild@4c42edb also happened to fix incremental fetching of top-level outputs for `run` commands. The integration test added by the former CL remains in place.

Fixes bazelbuild#20843.

PiperOrigin-RevId: 693664834
Change-Id: I64c1b0d14ae54f3906b9f7d916dbbb4f0204a1b9
github-merge-queue bot pushed a commit that referenced this issue Nov 6, 2024
…mmands. (#24224)

This reverts the functional changes introduced by
2f899ef.
They're no longer required because
4c42edb
also happened to fix incremental fetching of top-level outputs for `run`
commands. The integration test added by the former CL remains in place.

Fixes #20843.

PiperOrigin-RevId: 693664834
Change-Id: I64c1b0d14ae54f3906b9f7d916dbbb4f0204a1b9
ramil-bitrise pushed a commit to bitrise-io/bazel that referenced this issue Dec 18, 2024
This reverts the functional changes introduced by bazelbuild@2f899ef. They're no longer required because bazelbuild@4c42edb also happened to fix incremental fetching of top-level outputs for `run` commands. The integration test added by the former CL remains in place.

Fixes bazelbuild#20843.

PiperOrigin-RevId: 693664834
Change-Id: I64c1b0d14ae54f3906b9f7d916dbbb4f0204a1b9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants