Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink spawn strategies #19904

Open
2 of 12 tasks
tjgq opened this issue Oct 20, 2023 · 3 comments
Open
2 of 12 tasks

Rethink spawn strategies #19904

tjgq opened this issue Oct 20, 2023 · 3 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@tjgq
Copy link
Contributor

tjgq commented Oct 20, 2023

We've seen a lot of reported issues that, one way or another, reflect a lack of flexibility in the way we apply spawn strategies.

We believe it's time to rethink how spawn strategies are implemented and what the right API to choose among them should look like. In particular, it would be nice to:

  • Reimplement the disk/remote cache lookup as a strategy (instead of as part of the spawn runner)
  • Generalize dynamic execution to an arbitrary number of strategies (or combinations of strategies) to be tried in parallel
  • Generalize local fallback to an arbitrary number of strategies (or combinations of strategies) to be tried in series
  • Have a way to distinguish "permanent" from "transient" failures (permanent are not worth retrying with a different strategy, transient might be)
  • Make it possible to toggle strategies along dimensions other than mnemonic (e.g. based on the execution platform, selected toolchain, or estimated resource consumption)

This would make it possible to provide solutions to the following issues:

In particular, #11432 references https://docs.google.com/document/d/1U9HzdDmtRnm244CaRM6JV-q2408mbNODAMewcGjnnbM/edit#heading=h.5mcn15i0e1ch which contains many interesting ideas we should revisit in light of this set of requirements.

@tjgq
Copy link
Contributor Author

tjgq commented Oct 20, 2023

cc @coeuvre

@tjgq tjgq added team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request labels Oct 20, 2023
@pcjanzen
Copy link
Contributor

It would be great to also address the issues described in #11432

@tjgq
Copy link
Contributor Author

tjgq commented Oct 20, 2023

Thanks, it's very much on topic and I've added it to the list.

copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Nov 14, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild/bazel#19904).

Progress on bazelbuild/bazel#18643.

PiperOrigin-RevId: 582294420
copybara-service bot pushed a commit that referenced this issue Nov 14, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see #19904).

Progress on #18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
tjgq added a commit to tjgq/bazel that referenced this issue Nov 14, 2023
…curately reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild#19904).

Progress on bazelbuild#18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
keertk pushed a commit that referenced this issue Nov 14, 2023
…curately reflects its purpose. (#20201)

Also amend misleading or plainly incorrect comments. In particular, the
runner field is always populated, even for cache hits, and spawns are
only missing when the persistent action cache is hit ("local" is
ambiguous, as we also use it in certain places to refer to the disk
cache).

Note that, while the runner and cache_hit fields currently overlap in
purpose, their separate existence is better aligned with potential
future plans to promote cache lookups to a strategy (see
#19904).

Progress on #18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Nov 21, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild/bazel#19904).

Progress on bazelbuild/bazel#18643.

(cherry picked from commit 0bc8e5f)
Silic0nS0ldier added a commit to Silic0nS0ldier/bazel-proposals that referenced this issue Dec 7, 2023
copybara-service bot pushed a commit that referenced this issue Jan 26, 2024
Previously, they were both displayed as `remote-cache`; there's now a separate `disk-cache` form. If a combined cache is used, one or both forms will appear, depending on which caches were looked up.

As a result, the progress status reporting is moved to the individual cache implementations. While this is kind of unfortunate from an architectural standpoint, it's likely the best we can do until we recast cache lookups as spawn strategies (see #19904).

Closes #20935.

PiperOrigin-RevId: 601748051
Change-Id: I03710219973c95d4fca999d931b3513f6d240d94
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 26, 2024
Previously, they were both displayed as `remote-cache`; there's now a separate `disk-cache` form. If a combined cache is used, one or both forms will appear, depending on which caches were looked up.

As a result, the progress status reporting is moved to the individual cache implementations. While this is kind of unfortunate from an architectural standpoint, it's likely the best we can do until we recast cache lookups as spawn strategies (see bazelbuild#19904).

Closes bazelbuild#20935.

PiperOrigin-RevId: 601748051
Change-Id: I03710219973c95d4fca999d931b3513f6d240d94
tjgq added a commit to tjgq/bazel that referenced this issue Jan 26, 2024
Previously, they were both displayed as `remote-cache`; there's now a separate `disk-cache` form. If a combined cache is used, one or both forms will appear, depending on which caches were looked up.

As a result, the progress status reporting is moved to the individual cache implementations. While this is kind of unfortunate from an architectural standpoint, it's likely the best we can do until we recast cache lookups as spawn strategies (see bazelbuild#19904).

Closes bazelbuild#20935.

PiperOrigin-RevId: 601748051
Change-Id: I03710219973c95d4fca999d931b3513f6d240d94
tjgq added a commit to tjgq/bazel that referenced this issue Jan 26, 2024
… status.

Previously, they were both displayed as `remote-cache`; there's now a separate `disk-cache` form. If a combined cache is used, one or both forms will appear, depending on which caches were looked up.

As a result, the progress status reporting is moved to the individual cache implementations. While this is kind of unfortunate from an architectural standpoint, it's likely the best we can do until we recast cache lookups as spawn strategies (see bazelbuild#19904).

Closes bazelbuild#20935.

PiperOrigin-RevId: 601748051
Change-Id: I03710219973c95d4fca999d931b3513f6d240d94
github-merge-queue bot pushed a commit that referenced this issue Jan 26, 2024
… status. (#21084)

Previously, they were both displayed as `remote-cache`; there's now a
separate `disk-cache` form. If a combined cache is used, one or both
forms will appear, depending on which caches were looked up.

As a result, the progress status reporting is moved to the individual
cache implementations. While this is kind of unfortunate from an
architectural standpoint, it's likely the best we can do until we recast
cache lookups as spawn strategies (see #19904).

Closes #20935.

PiperOrigin-RevId: 601748051
Change-Id: I03710219973c95d4fca999d931b3513f6d240d94
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request
Projects
None yet
Development

No branches or pull requests

2 participants