Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration of Automatic Exec Groups [Exec platform per toolchain] in Bazel downstream #19981

Open
kotlaja opened this issue Oct 30, 2023 · 8 comments
Assignees
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request

Comments

@kotlaja
Copy link
Contributor

kotlaja commented Oct 30, 2023

Goals of this document

  1. A short description of this migration, without going into the details. In case you would like to see more, check: Exec platform per toolchain. Additional information will also be available soon on official Bazel documentation.
  2. Important note to reviewers: Please double check if the correct toolchain type is added for each action. I’ve done the best on my part but additional precaution is not harmful. 🙂

Goal of this migration

Each toolchain type should have its own execution platform. Before this change, the execution platform was selected on a rule level, not on a toolchain type level.

Each action which uses a tool or executable coming from a toolchain should specify the toolchain type inside ctx.action.{run, run_shell} toolchain attribute.

What is needed for this migration

In order to use Automatic Exec Groups, two things are required:
Each ctx.actions.{run, run_shell} which uses a toolchain as a tool/executable needs to add its toolchain type inside the toolchain parameter. In this way, actions are sent to the right platform.
Moreover, rule which creates that action needs to define that toolchain_type inside its own rule definition. In other words, the toolchain must be registered on the rule.

For example:

# Registering toolchain on the rule
custom_rule = rule(
        implementation = _impl,
        toolchains = ['@bazel_tools//tools/jdk:toolchain_type']
)
# Using toolchain inside the action
ctx.actions.run(
        toolchain = '@bazel_tools///tools/jdk:toolchain_type'
        executable = ctx.toolchains['@bazel_tools///tools/jdk:toolchain_type'].tool
        …
)

Why does a rule creator need to take care of a toolchain type?

If an action’s executable is not coming from a toolchain, toolchain parameter doesn’t need to be set or it could be set to None.
If an action’s executable is coming from a toolchain, toolchain parameter should be set. The reason is simple: A correct execution platform will be selected for the action since it will be mapped with this toolchain type.

Why must toolchain_type be added manually and not reused from a tool coming from the toolchain?

The tool is a file and we don’t know which toolchain_type it refers to.
Or, the tool is a provider, inside which we could add a toolchain_type but that would increase overall memory.

Which Bazel versions support this migration?

AEGs are fully supported from Bazel 7.
toolchain parameter is supported from Bazel 6.

If your project supports lower Bazel versions, you need to disable AEGs.

How to disable AEGs in case of an error?

Set the incompatible_auto_exec_groups flag to false or disable a particular rule with the _use_auto_exec_groups
attribute.

Potential errors you could step on while migrating to AEGs

Couldn't identify if tools are from implicit dependencies or a toolchain. Please set the toolchain parameter. If you're not using a toolchain, set it to 'None'.

  • In this case you will be given a stack of calls before the error happened and you will be able to clearly see which exact action needs the toolchain parameter. Check which toolchain is used for the action and set it with the toolchain param. If no toolchain is used inside the action for tools or executable, set it to None.

Action declared for non-existent toolchain '//rule:toolchain_type_1'.

  • This means that you've set the toolchain parameter on the action but didn't register it on the rule. Register the toolchain or set None inside the action.
@anguslees
Copy link

anguslees commented Jan 17, 2024

I've read the above, and the referenced doc, but the answer didn't jump out at me. Apologies if I misunderstood the proposal.

Question: What should we do for actions that need more than one toolchain?
Eg golang actions with cgo need both go and cpp toolchains resolved for the same platform in a single compile action.

Iiuc, currently we would list both toolchains on the rule(toolchains) attribute, toolchain resolution would take them all into account when choosing an exec platform, and then (aiui) the same exec platform is used for all actions in that rule.

@anguslees
Copy link

Duh, found the answer right after posting 😮‍💨. The new bazel docs explain the answer clearly:

If you need to use multiple toolchains on a single execution platform (an action uses executable or tools from two or more toolchains), you need to manually define exec_groups (check When should I use a custom exec_group? section).

@kotlaja: Iiuc, I think your rules_go PR above does not follow this advice - and should.

@fmeum
Copy link
Collaborator

fmeum commented Oct 9, 2024

@kotlaja @comius I tried to enable --incompatible_auto_exec_groups in rules_go, but found it to be incompatible with --incompatible_enable_proto_toolchain_resolution.

The root cause is that https://github.com/protocolbuffers/protobuf/blob/6690ab42d855ea19d9a24cd99b0375910ea772ca/bazel/private/toolchain_helpers.bzl#L48 and https://github.com/bazelbuild/rules_proto/blob/db09d58959ed951db34c2e056cef084954e55640/proto/private/rules/proto_toolchain_rule.bzl#L23 (depending on where the proto rules are consumed from) are strings, not Labels. They thus get resolved relative to the current repo when passed to the toolchains parameter of ctx.actions.run in proto_common.compile, which results in errors such as:

ERROR: /Users/fmeum/git/rules_go/tests/legacy/examples/proto/embed/BUILD.bazel:5:14: in proto_library rule //tests/legacy/examples/proto/embed:embed_proto:
Traceback (most recent call last):
	File "/virtual_builtins_bzl/common/proto/proto_library.bzl", line 88, column 26, in _proto_library_impl
	File "/virtual_builtins_bzl/common/proto/proto_library.bzl", line 226, column 25, in _write_descriptor_set
	File "/virtual_builtins_bzl/common/proto/proto_common.bzl", line 231, column 16, in _compile
Error in run: Action declared for non-existent toolchain '//proto:toolchain_type'.

We would need to get new releases of rules_proto (say 6.0.3) and protobuf out to unblock CI.

@purkhusid
Copy link

Was going to update rules_dotnet to support --incompatible_auto_exec_groups but was blocked due to rules_go not supporting it yet(dev dependency). Will follow this and do the migration once rules_proto/rules_go is migrated.

@aignas
Copy link

aignas commented Oct 15, 2024

FYI, rules_python is failing due to errors related to @protobuf:

(13:51:13) ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/ec321eb2cc2d0f8f91b676b6d4c66c29/external/protobuf~/python/BUILD.bazel:53:20: in internal_copy_files_impl rule @@protobuf~//python:copied_wkt_proto_files:

Let us know when they are ready

@kotlaja
Copy link
Contributor Author

kotlaja commented Oct 15, 2024

Thanks folks for raising these issues. Let's then wait for rules_proto and protobuf before flipping this flag since otherwise we'll be blocked with this migration in quite a few rules.

@meteorcloudy
Copy link
Member

FYI @kotlaja I ran a compatibility test with the top 100 BCR modules for --incompatible_auto_exec_groups with Bazel 7.4.1, here is the result:

https://buildkite.com/bazel/bcr-bazel-compatibility-test/builds/45

@kotlaja
Copy link
Contributor Author

kotlaja commented Nov 14, 2024

@meteorcloudy thanks for gathering that information!

I talked to @fmeum and it seems that protocolbuffers/protobuf@e358a40 fixed the issue Fabian noted in this thread. Still, multiple projects are blocked by rules_python, and rules_python is blocked on protobuf.

Hence, I opened an issue for protobuf and tried migrating on my own (protocolbuffers/protobuf#19262) in order to unblock other projects. I fixed a few actions and then I came to an obstacle with rules_kotlin, so I'm waiting for protobuf team to let me know their thoughts (if they'd like to try upgrading rules_kotlin since some troubling files changed in the meantime)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request
Projects
None yet
Development

No branches or pull requests

8 participants