bzlmod + build_python_zip breaks runfiles resolution #17941

coquelicot · 2023-03-31T08:08:45Z

Description of the bug:

When building a py_binary with --build_python_zip and bzlmod enabled, the runfiles cannot be correctly resolved.
Things would work if I remove the --build_python_zip option or switch back to the old WORKSPACE style.
I suspect this has something to do with the way bzlmod handle repo mapping.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I have a very small example to reproduce the issue: https://github.com/coquelicot/broken-bzlmod-pyzip

Here's the log from my machine:

 » bazel run example
INFO: Build option --build_python_zip has changed, discarding analysis cache.
INFO: Analyzed target //:example (0 packages loaded, 539 targets configured).
INFO: Found 1 target...
Target //:example up-to-date:
  bazel-bin/example
INFO: Elapsed time: 0.280s, Critical Path: 0.01s
INFO: 4 processes: 4 internal.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/example
sample

 » bazel run --build_python_zip example
INFO: Build option --build_python_zip has changed, discarding analysis cache.
INFO: Analyzed target //:example (0 packages loaded, 539 targets configured).
INFO: Found 1 target...
Target //:example up-to-date:
  bazel-bin/example
  bazel-bin/example.zip
INFO: Elapsed time: 0.274s, Critical Path: 0.01s
INFO: 4 processes: 3 internal, 1 linux-sandbox.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/example
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_tuww8rqk/runfiles/_main/example.py", line 11, in <module>
    main()
  File "/tmp/Bazel.runfiles_tuww8rqk/runfiles/_main/example.py", line 6, in main
    with open(path, "r") as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/Bazel.runfiles_tuww8rqk/runfiles/pyzip/data.txt'

Which operating system are you running Bazel on?

linux

What is the output of `bazel info release`?

release 6.1.1

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

No response

Have you found anything relevant by searching the web?

The way runfiles are implemented seems to be different with bzlmod: #16124

Any other information, logs, or outputs that you want to share?

No response

The text was updated successfully, but these errors were encountered:

fmeum · 2023-03-31T08:37:40Z

Looks like the Python rules do not add symlinks and root_symlinks to the ZIP file, which causes the _repo_mapping file to be missing. I will try to fix this.

fmeum · 2023-03-31T09:03:40Z

#17942 is the easy part, but this is unfortunately complicated on a deeper level:

The _repo_mapping file is currently injected directly in https://cs.opensource.google/bazel/bazel/+/6f12510e39153e316f35ab0a1ec376ece7f1b0a9:src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java;l=421, which makes it invisible to the runfiles Starlark API.

We could instead add it in a way similar to https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java;l=125, but that would construct a new Runfiles object for every target.

~~An alternative could be to add the artifact to every Runfiles object directly upon construction (as long as Bzlmod is enabled) and then only register the generating action in RunfilesSupport.~~ Edit: This doesn't work as those Runfiles objects wouldn't be mergeable anymore.

@Wyverald @lberki I could use your help here as I don't have a good grasp on the potential performance implications of these approaches.

fmeum · 2023-03-31T09:49:49Z

Looked into this some more. Even if RunfilesSupport adds the repo mapping artifact to root symlinks, the Python rules still wouldn't see it as it is added only after the rule implementation has run. Maybe we need to expose it on RuleContext?

Edit: Cleared up some confusion I had, which led me to this insight: Computing the repo mapping requires the runfiles, so it simply isn't known before the rule commits to its runfiles. Getting it in the context of the rule only seems to be necessary if the rule wants to build a runfiles structure manually for itself - if it did this for some other rule, that rule's repo mapping would be in its runfiles. So unless the Python rules can build the ZIP as part of a separate rule using a macro, maybe RepoMappingManifestAction would need to be exposed to Starlark so that users can build their own manifests for arbitrary runfiles objects?

lberki · 2023-04-04T08:58:49Z

Let me see if I understand this correctly: the current way Python rules build ZIP files is that they dissect a Runfiles object and put the path to artifact mapping on a command line. But repo mappings make it impossible to determine that mapping solely from the Runfiles object. Is that correct?

Shooting from the hip, with the current toolbox, I could imagine creating a "fake" Python binary which is then put on the inputs of an action and then that action does not execute it, but simply extracts its runfiles tree. Would that work?

fmeum · 2023-04-04T10:32:16Z

@lberki That's exactly correct. If forcing such rules to become macros that obtain the runfiles from an actual target is acceptable, then that is certainly the easiest solution.

@rickeylev Could the executable Python rules go through a separate target to generate the runfiles that are then packaged up into the executable ZIP file?

lberki · 2023-04-04T11:22:12Z

Is the limitation that makes a macro necessary that FilesToRunProvider can only be created once per rule and outside the control of Starlark code and that the only way one can currently add a runfiles tree to the inputs of an action is by using ctx.actions.run(tools=) ?

fmeum · 2023-04-04T11:26:07Z

Is the limitation that makes a macro necessary that FilesToRunProvider can only be created once per rule and outside the control of Starlark code and that the only way one can currently add a runfiles tree to the inputs of an action is by using ctx.actions.run(tools=) ?

That is part of the reason. But even if there were a way to turn Runfiles into a FilesToRunProvider from Starlark, the logic that adds the repository mapping manifest is part of RunfilesSupport and RunfilesSupplier and may thus not invoked correctly in that way. This may require additional changes.

rickeylev · 2023-04-05T21:50:14Z

Could the executable Python rules go through a separate target to generate the runfiles that are then packaged up into the executable ZIP file?

Maybe? That last mile logic to generate the executable for the zip-file case is a bit hairy.

Does every target have the repo mapping? Or just certain ones? I'm thinking just have an extra target generated along side with the necessary info, e.g.

def py_binary(name, ...
  repo_mapping("_" + name + "_repo_map")
  py_binary_rule(..., _INTERNAL_ONLY_REPO_MAP = "_" + name + "_repo_map")

def repo_mapping_impl(ctx):
  return [DefaultInfo()] # repo mapping automatically added, right?

def py_binary_impl(ctx):
  repo_map = ctx.attr._INTERNAL_ONLY_REPO_MAP.runfiles

Couple other thoughts:

PyBuiltins.java is where various Java-hooks can go. If the necessary info can be easily gotten via Java, that's the place to put such hooks.
The input_manifests arg might be of help? There's a (now unused) helper in PyBuiltins that converts a runfiles object into a RunfilesSupplier.

fmeum · 2023-04-15T06:55:37Z

@rickeylev With Bzlmod enabled, every target has the repo mapping as a root symlinks that is injected in getRunfilesInputs: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java;drc=0ce17480390cdced2df8d59249561613e81f446f;l=421
It may thus not be accessible by rule implementations directly.

The artifact itself is created in RunfilesSupport: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java;l=135;bpv=1;bpt=1

We may have to make it available from there.

rickeylev · 2023-04-15T17:14:14Z

Exposing the RepoMappingManifestAction is part of the problem, but most of what it does is computed beforehand and are inputs into that action. Manually doing that logic does look feasible, though.

Looking at how RunfilesSupport builds the repo mapping, the two pieces we need are ruleContext.getTransitivePackagesForRunfileRepoMappingManifest() and something to get Package.getPackageIdentifier().getRepository() (or an equivalent).

Unfortunately, these both use the Package object, which I don't think has a Starlark equivalent. Can an object that doesn't implement the Starlark classes be passed from Java to Starlark? If so, we can just pass them onto another helper in PyBuiltins to operate on. If not, we'll have to convert Package to something else (probably a string or struct), or move more logic to Java (the map building is pretty simple, so this looks quite feasible).

Making a Starlark implemented rule's RunfilesSupport available to it during its execution is somewhere near impossible, in particular for the Python rules, so that isn't really an option. Problem 1 is RunfilesSupport is created after the rule function completes. If you try to manually create a RunfilesSupport object during execution (PyBuiltins.new_empty_runfiles_with_middleman does such), then it creates the same actions multiple times, which triggers duplicate action conflict detection logic later. Problem 2 is, for the Python rules, the duplicate detection ends up triggering the empty runfiles supplier code multiple times during the analysis phase, which requires flattening the runfiles, and the net effect is a huge CPU and memory regression. If there is some sort of solution involving Starlark+RunfilesSupport, I don't know it :).

Er, hm. Is that repo mapping code flattening the runfiles during analysis phase? That doesn't sound good, well, a problem for another time.

If the action did more of the heavy lifting (computing the map), it would make this easier overall and probably also solve the runfiles-being-flattened-during-analysis issue. We could probably reduce it to a simple helper in PyBuiltins that looked something like:

public void createRepoMappingManifest(StarlarkRuleContext starlarkCtx, Runfiles runfiles) {
  starlarkCtx.registerAction(new RepoMappingAction(
    starlarkCtx.getRuleContext().getTransitivePackagesForRunfileRepoMappingManifest(),
    runfiles
  ));
}

fmeum · 2023-04-15T17:31:15Z

As far as I remember the repo mapping manifest action isn't really doing much work since actions must be serializable, but Packages are not. That's also why the NestedSet is indeed flattened during analysis time.

We only need a tiny fraction of the Package information, so maybe that could be solved with some kind of custom serialization logic for RepoMappingManifestAction instead? I don't know much about this area though.

rickeylev · 2023-05-11T05:08:08Z

I was playing around with this tonight and may have come up with a somewhat simple work around: declare the repo mapping file and add it to the zip, but rely on the subsequent RunfilesSupport code to actually create the file. I think this might also have a bonus: because the same file isn't created twice, duplicate detection doesn't have to run. We can make it a bit less hacky by factoring out a "get file name" function from Runfiles.createRepoMappingManifestAction. I'll poke this some more tomorrow to see how promising it is.

As a side note, this is the second time I've encountered Starlark code wanting to get at some of the things in RunfilesSupport (the other case was par files, which wanted RunfilesSupport for a similar reason), so maybe there is some missing functionality somewhere.

Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark. Work towards #17941 Closes #18349. PiperOrigin-RevId: 531165675 Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905

rickeylev · 2023-05-11T20:58:00Z

Alright, today's experiment was trying to use a separate target, which looks iffy and unappealing.

The repo mapping is only generated and added to the runfiles of executable/test targets. This is due to StarlarkRuleConfiguredTargetUtil#addSimpleProviders (see the RunfilesSupport.withExecutable() call, guarded by the "is executable or test" check
When it is added, its added to the middleman artifact. See RunfilesSupport.createRunfilesMiddleman, where the repoMappingManifest is added to the nested set fed into the middleman
The code in Runfiles.getRunfilesInputs adds repoMappingManifest to the runfiles manifest, but not the runfiles itself. (and I think this is the manifest that may not be present, depending on a flag?) (Ah, I seem to recall that the manifest and inputs work in tandem: the manifest tells what path to materialize a file under when the runfiles tree is created, and the inputs tell what inputs are actually available)

The net effect is, when you look at a target's runfiles, you don't see the repo mapping because it's in the middleman artifact, which I think is a secret dependency of the executable somehow.

So, I ran the target through ctx.resolve_tools(). This returns the middle man artifact (contained within a depset of all necessary files) and an Starlark-opaque RunfilesSupplier object. Well, we have the RunfilesSupplier now, so I fed that into a java hook to call RunfilesSuppler.getRepoMappingManifest(). This sort of works. It did give a repo mapping file.

However, the resulting repo mapping is only correct if the outer target and inner target have the same dependencies (including implicit deps). In my experiments, I used a stub executable with no dependencies as the "inner" target. It's repo map was just the current workspace. The actual repo map includes the various implicit dependencies of py_binary itself (rules_python, platforms, bazel_tools, some others).

So to make this work, the inner target needs to copy all the passed in user values and the implicit deps of py_binary, forward along the transitive runfiles (if it doesn't, the repo map won't be correct) (and this particular runfiles object is unused), produce an unused executable file (to satisfy some starlark validations), and then the outer rule needs to dig out the repo mapping somehow. Note that all the attributes have to continue to exist on the outer rule so that things like aspects can still traverse them. I think there might be a more direct route to get the repo mapping than going through the middleman artifact (I think a java hook to go Target->FilesToRunProvider#getRunfilesSupport is possible), but that doesn't help with the generation of its content.

So, technically it's possible, but it is pretty wasteful. I'm very wary of generating essentially a copy of py_binary() -- the extra target isn't free; we already know this has a real cost. Within Google, we were forced to do this to make PAR files work and saw the effects in the performance metrics, and only deemed it OK because we didn't really have a choice and plan to make it an opt-in behavior instead of always-on behavior.

Ah -- I see the RepoMappingManifestAction PR got merged! I'll experiment with that -- looks pretty straight forward.

fmeum · 2023-05-16T09:55:22Z

@bazel-io flag

Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark. Work towards bazelbuild#17941 Closes bazelbuild#18349. PiperOrigin-RevId: 531165675 Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905

iancha1992 · 2023-05-16T21:31:37Z

@bazel-io fork 6.3.0

Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark. Work towards bazelbuild#17941 Closes bazelbuild#18349. PiperOrigin-RevId: 531165675 Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905

…18419) Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark. Work towards #17941 Closes #18349. PiperOrigin-RevId: 531165675 Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905

iancha1992 · 2023-05-19T19:16:03Z

@fmeum @rickeylev should this be now cherry-picked and included in the release-6.3.0?

rickeylev · 2023-05-20T04:23:16Z

That would be great, yes.

Note that this fix (4e60992) requires ee46d1a

iancha1992 · 2023-05-22T22:00:21Z

That would be great, yes.

Note that this fix (4e60992) requires ee46d1a

@rickeylev @fmeum I did make sure the commit ee46d1a was merged. However, it seems like additional commits needs to be cherry-picked first in order for the commit to be cherrypicked in the release branch. Could you please submit a PR or let us know what commits to include? Currently, there are 3 files that are missing in the release-6.3.0 branch (src/main/java/com/google/devtools/build/lib/rules/python/PyBuiltins.java, src/main/starlark/builtins_bzl/common/python/py_executable_bazel.bzl, src/main/starlark/builtins_bzl/common/python/py_internal.bzl).

rickeylev · 2023-05-22T22:08:16Z

Ahhh, I forgot most of the Starlark rewrite work didn't make the 6.0 cut. It won't be feasible to port this to 6.x. Even if we did, it'd require re-implementing the fix based on the (now deleted) Java implementation of the rules.

Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark. Work towards bazelbuild#17941 Closes bazelbuild#18349. PiperOrigin-RevId: 531165675 Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905

…ctly To make runfiles lookups work correctly with bzlmod, a `_repo_mapping` file is used to map apparent names to canonical names. This file is normally automatically created by higher level code and injected into the runfiles, but that happens after a rule has finished running. This means it isn't in the runfiles object that py_binary itself sees when its constructing a zip file of itself. To fix, the zip file generates a repo mapping itself and puts it into the zip file, thus allowing the runfiles library to find it when run from the zip file. This requires a Java hook since the underlying `RepoMappingManifestAction` isn't directly exposed to Starlark. Fixes bazelbuild#17941 PiperOrigin-RevId: 532265478 Change-Id: I5f36fe58f043611481a7ed6ba19a7bbf5be4edf2

iancha1992 · 2023-06-23T18:59:22Z

Ahhh, I forgot most of the Starlark rewrite work didn't make the 6.0 cut. It won't be feasible to port this to 6.x. Even if we did, it'd require re-implementing the fix based on the (now deleted) Java implementation of the rules.

cc: @Wyverald @meteorcloudy

coquelicot added type: bug untriaged labels Mar 31, 2023

coquelicot assigned ShreeM01 and sgowroji Mar 31, 2023

sgowroji unassigned sgowroji and ShreeM01 Mar 31, 2023

sgowroji added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. area-Bzlmod Bzlmod-specific PRs, issues, and feature requests labels Mar 31, 2023

fmeum mentioned this issue Mar 31, 2023

Include symlinks and root_symlinks in Python ZIPs #17942

Open

meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Apr 4, 2023

fmeum mentioned this issue May 8, 2023

Don't eagerly flatten a NestedSet in RepoMappingManifestAction #18349

Closed

rickeylev self-assigned this May 12, 2023

copybara-service bot closed this as completed in 4e60992 May 15, 2023

bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 16, 2023

fmeum mentioned this issue May 16, 2023

[6.3.0] Don't eagerly flatten a NestedSet in RepoMappingManifestAction #18419

Merged

bazel-io mentioned this issue May 16, 2023

[6.3.0] bzlmod + build_python_zip breaks runfiles resolution #18427

Closed

bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 16, 2023

uhlajs mentioned this issue Oct 11, 2023

[Bug]: Runfiles resolution is broken for oci_python_image example aspect-build/bazel-examples#291

Open

purkhusid mentioned this issue Oct 24, 2023

Packaging binaries with runfiles tree bazelbuild/rules_pkg#579

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bzlmod + build_python_zip breaks runfiles resolution #17941

bzlmod + build_python_zip breaks runfiles resolution #17941

coquelicot commented Mar 31, 2023

fmeum commented Mar 31, 2023

fmeum commented Mar 31, 2023 •

edited

Loading

fmeum commented Mar 31, 2023 •

edited

Loading

lberki commented Apr 4, 2023

fmeum commented Apr 4, 2023

lberki commented Apr 4, 2023

fmeum commented Apr 4, 2023

rickeylev commented Apr 5, 2023

fmeum commented Apr 15, 2023

rickeylev commented Apr 15, 2023

fmeum commented Apr 15, 2023

rickeylev commented May 11, 2023

rickeylev commented May 11, 2023 •

edited

Loading

fmeum commented May 16, 2023

iancha1992 commented May 16, 2023

iancha1992 commented May 19, 2023

rickeylev commented May 20, 2023 •

edited

Loading

iancha1992 commented May 22, 2023

rickeylev commented May 22, 2023

iancha1992 commented Jun 23, 2023

bzlmod + build_python_zip breaks runfiles resolution #17941

bzlmod + build_python_zip breaks runfiles resolution #17941

Comments

coquelicot commented Mar 31, 2023

Description of the bug:

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Which operating system are you running Bazel on?

What is the output of bazel info release?

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

fmeum commented Mar 31, 2023

fmeum commented Mar 31, 2023 • edited Loading

fmeum commented Mar 31, 2023 • edited Loading

lberki commented Apr 4, 2023

fmeum commented Apr 4, 2023

lberki commented Apr 4, 2023

fmeum commented Apr 4, 2023

rickeylev commented Apr 5, 2023

fmeum commented Apr 15, 2023

rickeylev commented Apr 15, 2023

fmeum commented Apr 15, 2023

rickeylev commented May 11, 2023

rickeylev commented May 11, 2023 • edited Loading

fmeum commented May 16, 2023

iancha1992 commented May 16, 2023

iancha1992 commented May 19, 2023

rickeylev commented May 20, 2023 • edited Loading

iancha1992 commented May 22, 2023

rickeylev commented May 22, 2023

iancha1992 commented Jun 23, 2023

What is the output of `bazel info release`?

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

fmeum commented Mar 31, 2023 •

edited

Loading

fmeum commented Mar 31, 2023 •

edited

Loading

rickeylev commented May 11, 2023 •

edited

Loading

rickeylev commented May 20, 2023 •

edited

Loading