Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--incompatible_strict_action_env breaks autodetecting Python toolchain on Mac #8536

Closed
brandjon opened this issue May 31, 2019 · 20 comments
Closed
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Python Native rules for Python type: bug

Comments

@brandjon
Copy link
Member

Enabling --incompatible_strict_action_env without providing an explicit --action_env causes PATH to default to /bin:/usr/bin.

if (strictActionEnv) {
env.put("PATH", pathOrDefault(os, null, shellExecutable));
} else if (os == OS.WINDOWS) {

// TODO(ulfjack): The default PATH should be set from the exec platform, which may be different
// from the local machine. For now, this can be overridden with --action_env=PATH=<value>, so
// at least there's a workaround.
if (os != OS.WINDOWS) {
return "/bin:/usr/bin";
}

On our Mac CI workers, python3 is installed via Homebrew in /usr/local/bin. I expect other Mac users would have similar configurations.

This causes downstream failures in my presubmit to flip --incompatible_use_python_toolchains for Bazel 0.27. The flag is being flipped anyway. I expect those projects will have to either modify their --action_env or somehow modify their buildkite CI to add a custom Python toolchain.

+cc @philwo for CI, @buchgr for strict action env.

@brandjon brandjon added team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged labels May 31, 2019
@brandjon
Copy link
Member Author

Gerrit breakage tracked here. Note that their bazelrc contains

build --experimental_strict_action_env
build --action_env=PATH

I'm not sure why /usr/local/bin wouldn't be propagated through PATH. It should be in the default environment of the CI workers.

@brandjon
Copy link
Member Author

brandjon commented May 31, 2019

rules_nodejs breakage tracked in bazel-contrib/rules_nodejs#809.

@brandjon
Copy link
Member Author

brandjon commented Jun 5, 2019

Gerrit sets --incompatible_strict_action_env but also passes in --action_env=PATH, so you'd think they'd just get the client's PATH. Yet the error message indicates that the action is running with the default strict action env PATH of /bin:/usr/bin.

@buchgr, any idea why that would be?

@davido
Copy link
Contributor

davido commented Jun 5, 2019

IIRC Gerrit sets --incompatible_strict_action_env to make disk cache reliably work across different checkout directories (~/projects/gerrit, ~/projects/gerrit2, ...).

Related:

In a nutshell, after I have added action cache feature to Bazel in this CL, one of the most important features is to be able to build a project in different directory clones, re-using action cache (now called disk cache). That was not working. As pointed out by @ulfjack in this comment, adding --incompatible_strict_action_env to .bazelrc fixed that.

@buchgr
Copy link
Contributor

buchgr commented Jun 5, 2019

@brandjon Would it be possible for python to detect the PATH to the interpret as a repository_rule?

@brandjon
Copy link
Member Author

brandjon commented Jun 5, 2019

Yes, but we originally decided that wasn't necessary. Such a toolchain would only work for the host machine of course (notwithstanding running the repo rule in a docker image, etc.). Had I realized how much the action env could vary I might've chosen differently.

I don't think we need to add a new toolchain at this time though. Users can migrate by either fiddling with the action_env (meh) or better yet defining an appropriate toolchain for the platform they're running on. In the case of our CI, maybe we need to adjust our buildkite runner to register an explicit Python toolchain.

@brandjon
Copy link
Member Author

FTR, the rules_nodejs issue is fixed by adding --action_env=PATH to the bazelrc.

I think that action_env is only read by py_test, not py_binary, so this may not work for gerrit. Still investigating.

@brandjon
Copy link
Member Author

See discussion in #8425. It seems that PATH is being treated correctly by --incompatible_strict_action_env and --action_env. So the answer is that if you're relying on the autodetecting toolchain, make sure the Python interpreter (both versions) are in your PATH, by using --action_env if necessary.

@brandjon brandjon added P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python and removed team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged labels Jun 10, 2019
@brandjon
Copy link
Member Author

In debugging the gerrit failure, I see that they're using a genrule-within-a-genrule. What's happening is that the outer genrule transitions to the host config for its tools, so the inner genrule itself (not just its tools) gets built in the host config. But the host config does not propagate --action_env, leading to the breakage.

To fix this, we need to do one of the following:

  1. Propagate --action_env to the host config. This might be appropriate, seeing as how --incompatible_strict_action_env affects the host config.

  2. Update the hardcoded default PATH for the strict environment, from /bin:/usr/bin to /bin:/usr/bin:/usr/local/bin. This might be less risky then changing --action_env's behavior in general.

  3. Do neither, but implement Add an autodetecting host toolchain #8603 and in the meantime require users to define an explicit toolchain. This is the least attractive option.

@brandjon
Copy link
Member Author

I'd like to do 1 or 2 and cherrypick into the release, so bumping priority.

@brandjon brandjon added P1 I'll work on this now. (Assignee required) release blocker and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Jun 11, 2019
@brandjon brandjon self-assigned this Jun 11, 2019
@philwo
Copy link
Member

philwo commented Jun 11, 2019

Thank you for debugging! We should definitely do 2 (it’s just wrong to not include /usr/local/bin) and probably also do 1, but that can be done afterwards.

@laurentlb
Copy link
Contributor

From #7816 (comment), you say it happens only when incompatible_strict_action_env is set. We normally provide guarantees for incompatible flags (whose default value is false). So I suggest no cherrypick, no delaying of release, but we can make sure the fix is included in 0.28.

(I would vote for releasing 0.28 quickly, but no everyone seems to agree)

@brandjon
Copy link
Member Author

To some extent people have been using --incompatible_strict_action_env as a feature flag, independent of its incompatible / experimental status. In particular, projects often use it for caching advantages, e.g. when PATH contains a different temporary directory on each invocation.

The flag had previously been flipped on and rolled back. I don't know if anyone is actively working on trying to flip it again, and whether we have made a decision regarding supporting it. +@aehlig

@philwo
Copy link
Member

philwo commented Jun 11, 2019

There was a long discussion on bazel-discuss about this and it was agreed upon that the whole idea of having a "strict PATH" that's valid for "any Linux installation" or "any Windows installation" is flawed and does not work. Thus, the flag and mechanism should go away, instead Bazel should simply use the PATH set by the user's shell.

Users would then be able to override this either by using a Bazel wrapper that always runs Bazel with a fixed PATH, or by setting --action_env= to a known value. I know many larger users of Bazel who run Bazel using a wrapper that normalizes the environment on all of their developer machines to known values so that cache sharing works.

@buchgr also has some context and had some advanced ideas how to make it possible to achieve better cache hits without requiring a "strict PATH", AFAIR.

@brandjon
Copy link
Member Author

Sounds like someone should own a bug for removing the flag then. Would there be an easy way to opt into an environment variable being unavailable by default, instead of inherited by default?

@philwo
Copy link
Member

philwo commented Jun 11, 2019

I don’t understand - what do you mean with opt-in to an environment variable being unavailable by default? And what’s the relation to the above ideas how to fix this?

@brandjon
Copy link
Member Author

Sorry, I was digressing away from this problem. I was wondering, if we're getting rid of the strict environment flag, but we're supporting --action_env to set specific env vars, what about arbitrary env vars that the user wants to filter out but doesn't know about to enumerate them in --action_env? Or is the idea to let a wrapper script do that part?

Anyway, for this issue I'm working on a quick fix that implements 2.

laurentlb pushed a commit that referenced this issue Jun 12, 2019
With this change, when --incompatible_strict_action_env is on but no override for PATH is available, the default PATH will include /usr/local/bin. This allows Python 3 to be on PATH on mac, which is important for the autodetecting Python toolchain.

Without this change, users can still use --action_env=PATH[=...], but that doesn't work in all cases (e.g. genrules that are themselves built in the host config).

Fixes #8536. Confirmed that this fixes the failure seen in https://bugs.chromium.org/p/gerrit/issues/detail?id=10953.

RELNOTES: When `--incompatible_strict_action_env` is enabled, the default `PATH` now includes `/usr/local/bin`.
PiperOrigin-RevId: 252696581
@davido
Copy link
Contributor

davido commented Jun 14, 2019

@brandjon I am trying to build gerrit@HEAD (7afb83f73b031b8e03cb905bc97b1a9ab79ddf0c) on Mac OS now, with Mojave 10.14.5 and seeing this breakage:

$  bazel version
Build label: 0.27.0rc5

$ bazel build :release
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 815f6552-32b0-4237-a399-f32d8c2593ef
INFO: Analyzed target //:release (417 packages loaded, 7482 targets configured).
INFO: Found 1 target...
INFO: From Building proto/libentities_proto-speed.jar (1 source jar):
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From Building proto/libcache_proto-speed.jar (1 source jar):
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From Building external/io_bazel_rules_closure/java/io/bazel/rules/closure/worker/libworker_protocol_proto-speed.jar (1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From Building external/io_bazel_rules_closure/java/io/bazel/rules/closure/libbuild_info_proto-speed.jar (1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From Building external/io_bazel_rules_closure/java/io/bazel/rules/closure/webfiles/libbuild_info_proto-speed.jar (1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From Building external/com_google_protobuf/libprotobuf_java.jar (122 source files, 1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
ERROR: /Users/davido/projects/gerrit/lib/lucene/BUILD:7:1: Executing genrule //lib/lucene:lucene-core-and-backward-codecs__merged_bin failed (Exit 1) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Error occurred while attempting to use the default Python toolchain (@bazel_tools//tools/python:autodetecting_toolchain).
According to '/usr/bin/python -V', version is 'Python 2.7.10', but we need version 3. PATH is:

/bin:/usr/bin:/usr/local/bin

Please ensure an interpreter with version 3 is available on this platform as 'python3' or 'python', or else register an appropriate Python toolchain as per the documentation for py_runtime_pair (https://github.com/bazelbuild/bazel/blob/master/tools/python/toolchain.bzl).

Note that prior to Bazel 0.27, there was no check to ensure that the interpreter's version matched the version declared by the target (#4815). If your build worked prior to Bazel 0.27, and you're sure your targets do not require Python 3, you can opt out of this version check by using the non-strict autodetecting toolchain instead of the standard autodetecting toolchain. This can be done by passing the flag `--extra_toolchains=@bazel_tools//tools/python:autodetecting_toolchain_nonstrict` on the command line or adding it to your bazelrc.
----------------
Note: The failure of target //tools:merge_jars (with exit code 1) may have been caused by the fact that it is running under Python 3 instead of Python 2. Examine the error to determine if that appears to be the problem. Since this target is built in the host configuration, the only way to change its version is to set --host_force_python=PY2, which affects the entire build.

If this error started occurring in Bazel 0.27 and later, it may be because the Python toolchain now enforces that targets analyzed as PY2 and PY3 run under a Python 2 and Python 3 interpreter, respectively. See https://github.com/bazelbuild/bazel/issues/7899 for more information.
----------------
Target //:release failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 353.288s, Critical Path: 104.58s
INFO: 1204 processes: 1 remote cache hit, 411 darwin-sandbox, 6 local, 786 worker.
FAILED: Build did NOT complete successfully

Update I tried to update Python on this machine, and the outcome was:

$ brew install python
Error: Your Xcode (10.1) is too outdated.
Please update to Xcode 10.2.1 (or delete it).
Xcode can be updated from the App Store.

I am updating XCode version right now and will report the result later.

Update2 XCode installed. Now, trying to install python with brew, I am getting this:

 $ brew install python
Error: python 2.7.14 is already installed
To upgrade to 3.7.3, run `brew upgrade python`.

Now, after running python upgrade:

  $ brew upgrade python
[...]
==> python
Python has been installed as
  /usr/local/bin/python3

Gerrit can be built:

  $ bazel build :release
[...]
  INFO: From Building external/com_google_protobuf/libprotobuf_java.jar (122 source files, 1 source jar):
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
Target //:release up-to-date:
  bazel-bin/release.war
INFO: Elapsed time: 127.333s, Critical Path: 101.67s
INFO: 124 processes: 80 darwin-sandbox, 44 worker.
INFO: Build completed successfully, 128 total actions
  

@brandjon
Copy link
Member Author

Yep, the error was because your machine didn't have Python 3 installed, and prior to 0.27 it worked because the default behavior effectively was to use Python 2 in spite of the way the gerrit codebase (and many others) are set up.

Your options were to

  1. install python 3, which you did
  2. pass --extra_toolchains=@bazel_tools//tools/python:autodetecting_toolchain_nonstrict, telling Bazel to ignore PY2 vs PY3 when selecting an interpreter at execution time (not ideal)
  3. pass --host_force_python=PY2, telling Bazel to continue to use Python 2 for targets like that genrule. This by itself doesn't help if you have non-host-configured targets that are declared to use Python 3 (in their python_version attr, or by default if that attr is omitted).

qtprojectorg pushed a commit to qtqa/gerrit that referenced this issue Jun 15, 2019
This is needed to fix Bazel CI build breakage on Mac OS. See these
issues for more details: [1], [2].

[1] bazelbuild/bazel#8536
[2] bazelbuild/continuous-integration#724

Change-Id: Ib879ccd375b19dc2e8d4d4e6e567629b7d35636d
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
With this change, when --incompatible_strict_action_env is on but no override for PATH is available, the default PATH will include /usr/local/bin. This allows Python 3 to be on PATH on mac, which is important for the autodetecting Python toolchain.

Without this change, users can still use --action_env=PATH[=...], but that doesn't work in all cases (e.g. genrules that are themselves built in the host config).

Fixes bazelbuild#8536. Confirmed that this fixes the failure seen in https://bugs.chromium.org/p/gerrit/issues/detail?id=10953.

RELNOTES: When `--incompatible_strict_action_env` is enabled, the default `PATH` now includes `/usr/local/bin`.
PiperOrigin-RevId: 252696581
siberex pushed a commit to siberex/bazel that referenced this issue Jul 4, 2019
With this change, when --incompatible_strict_action_env is on but no override for PATH is available, the default PATH will include /usr/local/bin. This allows Python 3 to be on PATH on mac, which is important for the autodetecting Python toolchain.

Without this change, users can still use --action_env=PATH[=...], but that doesn't work in all cases (e.g. genrules that are themselves built in the host config).

Fixes bazelbuild#8536. Confirmed that this fixes the failure seen in https://bugs.chromium.org/p/gerrit/issues/detail?id=10953.

RELNOTES: When `--incompatible_strict_action_env` is enabled, the default `PATH` now includes `/usr/local/bin`.
PiperOrigin-RevId: 252696581
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
With this change, when --incompatible_strict_action_env is on but no override for PATH is available, the default PATH will include /usr/local/bin. This allows Python 3 to be on PATH on mac, which is important for the autodetecting Python toolchain.

Without this change, users can still use --action_env=PATH[=...], but that doesn't work in all cases (e.g. genrules that are themselves built in the host config).

Fixes bazelbuild#8536. Confirmed that this fixes the failure seen in https://bugs.chromium.org/p/gerrit/issues/detail?id=10953.

RELNOTES: When `--incompatible_strict_action_env` is enabled, the default `PATH` now includes `/usr/local/bin`.
PiperOrigin-RevId: 252696581
@keith
Copy link
Member

keith commented Sep 18, 2019

This causes on issue on macOS because it ships with /usr/bin/python which is python2.7, but no /usr/bin/python2, but homebrew's python@2 has a /usr/local/bin/python2 which ends up being preferred by the setup script. IMO this is pretty unexpected if you're passing --experimental_strict_action_env because (at least in our case) the reason we're doing this is to make sure users' homebrew installations don't affect their builds (so we can make sure everything is hermetic and we get a good cache hit rate).

I submitted a non-ideal fix for this #9402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Python Native rules for Python type: bug
Projects
None yet
Development

No branches or pull requests

6 participants