-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "Locating runfiles with Bzlmod" #269
Conversation
@Wyverald @meteorcloudy @phst I have added you as reviewers based on the discussion we had about this topic. I would like to have a maintainer of either rules_java or rules_cc on the list as well, but wasn't sure who would be available. I'm a bit uncertain about the wording in https://bazel.build/contribute/design-documents#new-proposal. Should I already send out the email or wait for this PR to be merged? "Submitted" could mean either of the two. |
@comius Looks like you are back, would you be willing to review this proposal from the rules owner perspective? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the great writeup :) I'm still reviewing; leaving first round of comments
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. | ||
|
||
In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`. | ||
Since canonical repository names do not start with `_`, there is no risk of this synthetic part being a prefix of an actual runfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part is potentially untrue -- canonical repo names can start with _
. We currently don't have such repos (not that I know of, anyway), but we haven't ruled out the possibility. (For example, there's a pseudo-repo called @_builtins
.)
Is the runfiles manifest also inside the runfiles directory, or is it only outside of it? Is it crucial for the _repo_mapping
file to be inside the runfiles directory? Maybe I'm trying too hard to be 100% sure that a conflict can't happen...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important bit for me was that we can't have collisions with canonical repository names of user-provided repos.
Collisions with internally defined repositories are less of a concern since we know them all. Also, nothing is stopping us from giving essentially arbitrary names to Bazel-defined repositories. But if you prefer a different escaping scheme, I can use @@repo_mapping
or even _never_let_this_internal_repo_name_through_code_review_repo_mapping
instead. ;-)
Since we seem to be reviewing the proposal in code review rather than a doc, you should send mail to bazel-dev@ now, so others can see it without stalking this repo. I always avoid the problem by doing the proposal in a doc and making the PR just be a pointer to the doc. I find github's UI really insufficient for having discussions about text. |
|
||
In order to support Bzlmod, rulesets providing runfiles libraries will have to be updated to advertise the new provider. | ||
Since a new global symbol can't be feature-detected in Starlark, this can only be done once the ruleset providing the runfiles libraries targets a Bazel version that offers the new provider as its minimum supported version. | ||
For runfiles libraries that do not ship with Bazel, it would thus be helpful to cherry-pick this change into Bazel 5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it would be helpful if all runfiles libraries were external to Bazel. That facilitates writing code that gets runfiles under bazel test and bazel run, but can also be repackaged and installed in the canonical manner for any OS. Those rarely would have a .runfiles tre e the way we build it.
Even if we make them all external, we may not have to backport this feature to bazel 5.x. Since each Bazel LTS could have the potential to do very different things with runfiles, we will have to account for that at the library side. Either we have versions of library for each LTS track, or, have the runfiles manifest/layout itself contain a version number. Some day in the future, Bazel 9.x will use runfiles-v2, and a single runfiles library at that time could read all back versions. That style might solve the problem of backport to 5.x We could use the non-existance of the runfiles version to know it is the old style, and read accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the appeal of having runfiles be external to Bazel, but in this particular section, a backport of a small part of the functionality proposed here seems actually needed even if all libraries became external already tomorrow: We have to provide a way for runfiles libraries to mark themselves as such and this way has to be understood by Bazel itself.
That's because we are essentially introducing version 1 of a new manifest (the repository mapping manifest) here rather than a new version of an existing manifest. The new manifest type needs some additional support within Bazel itself.
At my company, we are actually using the runfiles libraries as they are provided right now at install time by relying on the manifest- rather than the directory-based implementation.
Thanks for the clarification, will send out the mail right away. I also find comments on Google Docs much more convenient than on GitHub PRs, but I prefer Markdown (especially its code blocks) to Docs formatting. It's also nice that Markdown files checked into the repo can't get lost - regaining edit rights to the original runfiles libraries design doc took quite some effort. |
|
||
b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest. | ||
|
||
2. The runfiles library parses the repository mapping manifest and stores a map of `A` to `repo_mapping(C, A)` for the fixed canonical repository name `C` corresponding to the repository from which the `Create` method was called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be better to determine C in Rlocation, not Create? Otherwise, a runfiles object passed across repository boundaries or a single global lazily-initialized runfiles object would give wrong results when used from different repositories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably correct that this is what we should do for consistency (also with languages like Bash
that don't even have a Create
).
The one drawback of that approach is that it makes it more difficult to migrate to the new API as the potentially many Rlocation
calls all have to be updated. But if that turns out to be a concern, we could always add a CreateForSingleRepository
function that locks in the repo mapping.
I will change the section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the section, PTAL.
The new Starlark-defined RunfilesLibraryInfo provider may be used in the future to decide whether additional information has to be emitted by a rule for a target to find its runfiles at runtime. This may include serializing repository mapping information to a runfile and/or generating code that provides the canonical name of the current repository. The provider is added now rather than later so that third-party runfiles libraries can rely on it being present earlier - they have to wait for their minimum supported Bazel version to offer it. If it should turn out not to be needed, it can safely be removed before any ruleset would have a reason to depend on it. Work towards bazelbuild/proposals#269
The new Starlark-defined RunfilesLibraryInfo provider may be used in the future to decide whether additional information has to be emitted by a rule for a target to find its runfiles at runtime. This may include serializing repository mapping information to a runfile and/or generating code that provides the canonical name of the current repository. The provider is added now rather than later so that third-party runfiles libraries can rely on it being present earlier - they have to wait for their minimum supported Bazel version to offer it. If it should turn out not to be needed, it can safely be removed before any ruleset would have a reason to depend on it. Work towards bazelbuild/proposals#269
The new Starlark-defined RunfilesLibraryInfo provider may be used in the future to decide whether additional information has to be emitted by a rule for a target to find its runfiles at runtime. This may include serializing repository mapping information to a runfile and/or generating code that provides the canonical name of the current repository. The provider is added now rather than later so that third-party runfiles libraries can rely on it being present earlier - they have to wait for their minimum supported Bazel version to offer it. If it should turn out not to be needed, it can safely be removed before any ruleset would have a reason to depend on it. Work towards bazelbuild/proposals#269
With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: | ||
|
||
``` | ||
,my_module,my_workspace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify that the following:
,my_module,my_workspace
,my_workspace,my_workspace
entries are needed because of data.txt in the data attribute. It took me some time to figure that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more explanations.
- `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`; | ||
- `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`. | ||
This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. | ||
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this seems very well thought out but I myself am having trouble imagining how all of this will fit together and foreseeing problems like the one mentioned here without actually looking at the final code. Would it be possible to have a prototype of the whole thing working for at least 2 main rules like Java and C++? I think that would make it easier to spot any issues in the design that cause big problems like unnecessary rerunning actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a prototype for this for Java. I will look into developing it a bit more and including C++.
|
||
End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern. | ||
|
||
# Suggested follow-up changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we leaving anything behind that gets deprecated and would require house-keeping? Assuming a world where everyone is using Bzlmod, is anything left in the code base that is no longer necessary that would need to be cleaned up? If that's the case, can you have a section describing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of anything. Even though Bzlmod is the motivation for this proposal, there is no direct dependency between the two as repository mappings can already be used in WORKSPACE
and would break Stardoc today. It just so happens that they are very rarely used (although I believe rules_kotlin relies on them).
The new Starlark-defined RunfilesLibraryInfo provider may be used in the future to decide whether additional information has to be emitted by a rule for a target to find its runfiles at runtime. This may include serializing repository mapping information to a runfile and/or generating code that provides the canonical name of the current repository. The provider is added now rather than later so that third-party runfiles libraries can rely on it being present earlier - they have to wait for their minimum supported Bazel version to offer it. If it should turn out not to be needed, it can safely be removed before any ruleset would have a reason to depend on it. Work towards bazelbuild/proposals#269
With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: | ||
|
||
``` | ||
,my_module,my_workspace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more explanations.
|
||
End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern. | ||
|
||
# Suggested follow-up changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of anything. Even though Bzlmod is the motivation for this proposal, there is no direct dependency between the two as repository mappings can already be used in WORKSPACE
and would break Stardoc today. It just so happens that they are very rarely used (although I believe rules_kotlin relies on them).
- `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`; | ||
- `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`. | ||
This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. | ||
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a prototype for this for Java. I will look into developing it a bit more and including C++.
@oquenchil and everybody else who is interested in a demo: I created an example repo that you can try out the main parts of this proposal on using the linked rough prototype of a Bazel implementation of parts 1, 2, and 4 for Java and C++. The example binaries emit their repository mapping manifests as well as the value of the injected current repository name. |
LGTM |
@oquenchil Thanks for the review, I changed the relevant section to state that we are planning to reuse Do you happen to see a way we could introduce |
2. [Stardoc](https://github.com/bazelbuild/stardoc) generates documentation from `.bzl` files and for that has to resolve labels in `load` statements to `.bzl` files on disk. | ||
Since it doesn't actually evaluate `WORKSPACE` or `MODULE.bazel` files, it has no way of discovering repository mappings in the same way as Bazel. | ||
Thus, with Bzlmod, Stardoc fails to generate documentation for any `.bzl` file that loads `.bzl` files from other repositories. | ||
This issue is tracked by [bazelbuild/bazel#14140](https://github.com/bazelbuild/bazel/issues/14140) and [bazelbuild/stardoc#117](https://github.com/bazelbuild/stardoc/issues/117). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: my current plan (currently in early prototype) is for stardoc to avoid doing any kind of custom disk paths munging, and instead, read documentation from the compiled starlark module object stored in skyframe through exactly the same mechanism as what is currently used for implementing the load
statement.
In other words: if you can load
the .bzl file, you'll be able to generate docs from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great! Could you keep me updated on this on bazelbuild/bazel#16124?
No description provided.