Runfiles workspace directories use wrong workspace name if repository renaming is in effect #15029
Comments
I fear this would run counter to the bzlmod principle that repositories introduced by transitive dependencies should not be able to break you. It may work well enough with the current WORKSPACE system and infrequent uses of repository mappings, but as soon as all repos are mapped, this will probably result in frequent breakages. The way I have solved this problem so far has been to generate the string passed to rlocation using a Starlark rule, which can compute the correct path by combining ctx.workspace_name and short_path. This has the advantage that it works with Bazel as is, but the downside that it requires language-specific code generation and forces users to move their data dependencies to a separate target. I have implemented this approach for Java and C++ (see https://github.com/fmeum/rules_runfiles), but it wouldn't be difficult to extend to other languages. This does bring other benefits with it such as compile-time checking and IDE completions for rlocation paths. @phst If you find the time to try out rules_runfiles, I would be very interested in your feedback on whether this approach is viable. Alternatives that do not rely on code generation seem tricky: We could try to provide the complete repository mapping to the runfiles library similar to how this is done for the RUNFILES_MANIFEST. That's not as simple as it sounds though: At runtime, it's not possible to tell from which workspace a given rlocation call originates. The most common case will be that of the binary target itself, but this will not hold in general (a binary could depend on a library from a different repository that brings in runfiles). @Wyverald I fully agree with @phst that this is one of the major problems to solve before the official release of bzlmod. Maybe we could hold a brainstorming meeting with interested parties at some point? |
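A minimal sketch of the path computation mentioned above (combining ctx.workspace_name and short_path), written in plain Python rather than Starlark. The function name is hypothetical, and the sketch assumes the Bazel convention that files from external repositories have a short_path of the form "../<repo>/<path>".

```python
def rlocation_path(workspace_name: str, short_path: str) -> str:
    """Return a runfiles-relative path of the form '<repo>/<path>'.

    Hypothetical helper; assumes that short_path starts with '../<repo>/'
    for files that live in an external repository.
    """
    if short_path.startswith("../"):
        # External repository: the repo name already leads the path
        # once the '../' prefix is dropped.
        return short_path[len("../"):]
    # Main repository: prefix the path with the workspace name.
    return workspace_name + "/" + short_path
```

For a file in the main repository, rlocation_path("my_ws", "pkg/data.txt") yields "my_ws/pkg/data.txt"; for an external file, the workspace name argument is ignored.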
Yes, this would be great (I'd also like a chance to understand the problem better). cc @meteorcloudy |
This is indeed a very important problem to solve, @Wyverald can you schedule a meeting and add interested people to join the brainstorming? |
That might be, though my gut feeling says that it might not be so bad in practice. Clashes only occur if the transitive workspace dependency set contains two workspaces with identical local names and both contain a library with an identically-named runfile and both libraries are linked into the same binary. I think at least the first two of these conditions are already pretty rare: people tend to use unambiguous local workspace names such as
That is probably a better way forward, yes. However, I'm quite concerned about existing codebases. The runfiles libraries are old, officially supported, and widely used, and their behavior is at least semi-officially documented (https://docs.google.com/document/d/e/2PACX-1vSDIrFnFvEYhKsCMdGdD40wZRBX3m3aZ5HhVj4CtHPmiXKDCxioTUbYsDydjKtFDAzER5eg7OjJWs3V/pub). Besides, the "desired" behavior is even officially documented (https://docs.bazel.build/versions/5.0.0/skylark/lib/globals.html#workspace):
Note that this refers explicitly to the local workspace name (as specified in WORKSPACE), not any remapped name. I think before we can make such a significant change to the behavior of runfiles, we'd need to do at least the following:
I don't doubt that it's viable; this bug is currently mostly theoretical, I don't have a concrete example (but I guess that's only the case because repository remappings are still rare). |
That's a great point, which to me suggests a two-layered approach that could look as follows (this also summarizes the results of our discussion to some extent):
C++: If every Java: Assume every package com.google.devtools.build.runfiles;
public final class Bazel {
public static final String CURRENT_REPOSITORY = "<canonical repo name>";
private Bazel() {}
} Since the JLS mandates that primitive constants be inlined into class files, users could just call Python/Shell: I don't know these languages well enough to say what a good API could look like. Potentially, we may not need any user-facing changes though as it may be possible to extract the canonical name of the current repo from the path of the current source file at runtime.
|
In addition to #15029 (comment), I think we should have some way to access the current repository name that assumes as little as possible from the programming language. For example, have a rule that generates a source file that contains the repository name as a constant (or function returning a constant). Such a thing should be possible in even the most basic programming languages. We should also document the expected behavior of runfiles and runfile APIs so that rule sets for new languages can behave accordingly. We should then also deprecate the old variants of
Do you mean
This is mostly personal style, but I'd say that would be too much magic. I think it's OK for users to have to explicitly select a repository name.
This assumes that the current source file is always |
When a target in repo
I personally also see this as being too magical. It would probably only work in C++ anyway and even there I don't know whether it couldn't break in unexpected ways.
I know next to nothing about the Python rules. @meteorcloudy Can you say whether this is a safe assumption? In general, what do you think would be the most idiomatic way to get the current repo name in Python? |
I guess that's true in most cases, but I'm not sure how it should work if you use a source file from another repository. Should the current repo be the repo of the target's package or the repo of the source file? Also it may not be easy to parse out the repo name from the file path: how do you determine which segment is runfiles_dir, repo name and package path? I'm thinking about a general solution. Please tell me what you think. In @bazel_tools//runfiles/runfiles_lib.bzl, we provide a macro for language xx to define a runfiles library target:
When I need to access runfiles, I call the macro to define a
Then for the target that needs to access runfiles, it depends on this runfiles library that's aware of its own repo, eg. foo/bar/BUILD
Currently, a genrule can export RULEDIR, I think it's easy to also export The downside is that you need to define (and build) the runfiles library in each repo, but you only need to define it once for each repo (that needs runfiles access). |
@meteorcloudy I am worried about what exactly you would put in
Of course, the |
@fmeum I think you're right, it's really hard to make my proposal work. As for getting the repo name from the source file path in Python, I ran some tests with https://github.com/meteorcloudy/my_tests/tree/master/python_deps_test
As you can see the weird one is |
@tetromino had some thoughts on this issue (cc @brandjon):
|
This sounds like a reasonable approach. Just a few points:
|
I really hope not o,o I think file-based technically makes a bit more sense, but again I really hope nobody is using a source file from another repo while the target includes runfiles and the source file refers to such runfiles... |
May I formally request that changes to the runfiles library go through the proposal process in https://github.com/bazelbuild/proposals? It might help tremendously for future reference as well as lay a good foundation for future documentation efforts. (There is no doc for the runfiles lib today.) |
I have thought more about robust ways to implement repo mapping aware runfile lookups in Java and arrived at a code generation approach borrowing ideas from rules_runfiles. Approaches based on With the prototype at fmeum@fdbc854, Java libraries that directly depend on the runfiles library get access to a compile-time constant @comius It would be very helpful to know whether you would find modifications to the Java builtin rules along these lines acceptable if they should turn out to be necessary to realize a user-friendly API for runfiles lookups. Based on your feedback on this prototype, I would think about more independent alternatives if necessary. |
I took a quick look at the
You probably shouldn't add a provider, new source and action building it to a BuildInfo is doing a very similar thing on See src/main/java/com/google/devtools/build/lib/rules/java/JavaBuildInfoFactory.java. ATM native impl. is used. @buildbreaker2021 is working on implementing it via default API ( I think you should be able to solve the problem this way (but I didn't read all the comments on the ticket in detail). The BuildInfo generated |
@comius The provider itself isn't strictly necessary - it is used only to keep the repository mapping manifest small (less rebuilds because irrelevant repositories change their repo mappings) and to only emit the new actions in I looked at
In order to know the correct repository mapping context to apply, we essentially have to tie every Java source file that accesses runfiles to the repository containing the As far as I can tell, there are only two approaches to distinguish source files:
|
Hey all. I've finally got some tactile knowledge about repo remappings and how to use them in runfiles library. So at the moment we add an additional repo_mappings file as described in fmeum's comment: #15029 (comment), option Basically we have two maps:
The problem I'm seeing choosing Migrating runfiles doesn't seem such a big problem, except that the format is public and there are locations where people parse the files themselves. I consider the "data" functionality quite an important one, and we probably shouldn't be migrating all such java libraries. If I understand correctly Could we do something that provides a more "gradual breakdown", allowing for a more gradual migration? I was thinking about something that's close to Option
Old runfiles lib will work on a non-conflicting case and break on the conflicts. Monorepos will work as before, without any regression. The change should be easy to implement here: bazel/src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java Lines 453 to 457 in 5649627
We could slightly improve error handling by adding also something like: My take on Java:cc @cushon What's more tricky is, that runfiles library, should obtain its callee's META-INFO. I think this might be possible in Java, slightly magic though. But it provides the right user experience - no java library using data needs to be modified. |
In fact, many runfiles libraries out there are already broken in one way or another. In the best case, they just don't work in some situations (on Windows, on Linux without a runfiles directory, ...), in the worst case they non-hermetically pick up runfiles from the output base. I do see value in having these converge towards a single, reliable implementation per language.
Monorepos using the official runfiles libraries will not have to make any changes as the main repo will be the default "current repo" context without having to pass anything manually.
The problem I see is that it wouldn't be "the same", but more of it: If 10 Bazel modules depend on the protobuf module by 10 different names, the runfiles manifest and runfiles directory would need to include 10x as many entries for protobuf runfiles. This can be implemented, even as a fallback for the current design, but it won't come for free, especially with sandboxed execution and will affect everyone's build performance, regardless of whether they actively use runfiles lookups or not. Maybe we could think about offering it via a flag? There is also the problem that most runfiles libraries today don't check whether the target file exists when using the runfiles directory as a source. Checking whether a given path conflicts or not would require introducing an existence check, which may also break existing users.
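To make the multiplication concrete, here is a rough sketch that counts manifest entries when every runfile is listed once per name its repository is known by. The function, the file names, and the alias names are all hypothetical; this is an illustration of the arithmetic, not how Bazel builds its manifest.

```python
from typing import Dict, List, Set


def manifest_keys(files_by_repo: Dict[str, List[str]],
                  names_by_repo: Dict[str, Set[str]]) -> List[str]:
    """List one runfiles key per (name, file) pair.

    files_by_repo maps a repository to its runfiles; names_by_repo maps a
    repository to every name some module uses for it (hypothetical inputs).
    A repository without extra names is listed under its own name only.
    """
    keys = []
    for repo, files in files_by_repo.items():
        for name in sorted(names_by_repo.get(repo, {repo})):
            for path in files:
                keys.append(name + "/" + path)
    return keys
```

With three protobuf runfiles and ten different names for the protobuf repository, the sketch produces 30 keys instead of 3, which matches the 10x growth described above.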
With the standard multi-jar model, this is possible: You can use |
I agree that's a "theoretical" worst case. Note also that you only get a conflict when both repo name and path match and point to a different file. Do you have data on what would happen in large Bazel repos? Tensorflow would be an example, but it's not using bzlmod. I suspect maven dependencies might create conflicts or perhaps not. Note also that you don't need an existence check in the case you use runfiles_manifest. The new library should first check for the key with
Argh, you're right. Back to the drawing board. Annotations sounds like the best next option for now. We can't add a file next to each .java file with import Runfiles in it. |
Even the not so bad average case of having every module depended on via its module name and the legacy repo name ("protobuf" vs "com_google_protobuf") would already double the number of external symlinks with sandboxed execution. Given that creating these symlinks is already a performance concern right now, a factor of 2 may actually be pretty bad.
That is actually a major problem: I don't know of anybody who already uses Bzlmod at any kind of non-trivial scale, yet we have to design its approach to runfiles discovery or else nobody will be able to use them. Not a great position to be in and very easy to make mistakes that will be hard to change when it has matured. That is why, when in doubt, I prefer having users migrate on to some clearly defined API that we can change the implementation of in future versions of Bazel without breaking workflows again. If we continue to support runfiles lookups that don't go through the official runfiles libraries and even make the layout of the runfiles directory more complex, I'm worried that we might get ourselves into an even worse situation. We should also keep in mind that migrating to Bzlmod isn't straightforward and definitely requires quite a bit of migration work, so having to migrate runfiles may not be the most time-consuming part.
Yes, that should work well. But in situations where the manifest isn't available, e.g. with remote execution, we would still need a strategy that works with just the runfiles directory. It's possible to do, but may again end up breaking someone (although most likely only a minority of users). |
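The difference between the two backends can be sketched as follows. This is a simplified illustration, not the official runfiles library: the function names are hypothetical, and the manifest is assumed to consist of lines of the form "<runfiles path> <real path>".

```python
import posixpath
from typing import Dict, Optional


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse manifest lines of the assumed form '<runfiles path> <real path>'."""
    entries = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, real_path = line.partition(" ")
        entries[key] = real_path
    return entries


def rlocation(path: str,
              manifest: Optional[Dict[str, str]] = None,
              runfiles_dir: Optional[str] = None) -> Optional[str]:
    """Resolve a runfiles path using whichever backend is available."""
    if manifest is not None:
        # Manifest backend: a missing key is detectable without touching disk.
        return manifest.get(path)
    if runfiles_dir is not None:
        # Directory backend: the path is joined blindly, with no existence check.
        return posixpath.join(runfiles_dir, path)
    return None
```

With a manifest, an unknown key returns None, so a fallback lookup is cheap. With only a directory, the sketch returns a path whether or not the file exists, which is why a fallback there would require a new existence check.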
The number of symlinks created by protobuf would double, right. Can we count how many there are on tensorflow pre-bzlmod perhaps? My guess is it's a small number. You can improve for almost a factor of 2 by flipping
I agree. I think a more backwards compatible API I'm proposing is also clearly defined. And we get a longer and more subtle migration period. This means less unsatisfied/broken users.
We can't really hide manifests or the implementation from the users. So there will always be un-official versions, even after currently envisioned migration.
In my case with rules_kotlin, this seems to be the most time-consuming part + version skew problems. I need new Bazel release and I need to wait rules_kotlin release and then I need to wait protobuf release (which depends on rules_kotlin nowadays). Having backwards compatible runfiles, it would just work with current releases. |
That's a good idea, we'll see whether we can get some numbers and extrapolate from there.
It was just my perception given the investment in sandboxfs and the rumor I picked up that Google doesn't use sandboxed execution internally due to performance concerns. I certainly know less about this than you though :-)
Sorry, I didn't mean to say that your proposal doesn't offer a clearly defined API! I'm just a bit worried about the API surface it introduces. "Put this kind of string and your repo into Rlocation and get an absolute path", while blunt, is something that allows for a lot of freedom for future development, including implementing the scheme you propose as a backend to ease migration. Starting with the scheme and having people actively depend on it during their Bzlmod migration would seem to be locking in more of the details of the runfiles tree.
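As a rough illustration of the "string plus repo in, absolute path out" shape of API, the sketch below resolves the first path segment through a repository mapping. Everything here is an assumption made for illustration: the function names, the comma-separated "source,apparent,canonical" line format, and the directory-only backend.

```python
from typing import Dict, Tuple


def parse_repo_mapping(text: str) -> Dict[Tuple[str, str], str]:
    """Parse lines of the assumed form 'source_repo,apparent_name,canonical_name'."""
    mapping = {}
    for line in text.splitlines():
        if not line:
            continue
        source, apparent, canonical = line.split(",")
        mapping[(source, apparent)] = canonical
    return mapping


def rlocation(path: str, source_repo: str,
              mapping: Dict[Tuple[str, str], str], runfiles_dir: str) -> str:
    """Resolve 'apparent_repo/rest' as seen from source_repo to an absolute path."""
    apparent, sep, rest = path.partition("/")
    # Fall back to the name as written if source_repo has no mapping for it.
    canonical = mapping.get((source_repo, apparent), apparent)
    return runfiles_dir + "/" + canonical + sep + rest
```

Because callers only see the function signature, the lookup strategy behind it (manifest, directory, duplicated entries as a migration aid) could change without breaking them.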
Yes, I think the only way to have more convergence is to actually make using the runfiles libraries more beneficial for end users. @laszlocsomor had some ideas about this and @phst already implemented some of them for Go: For languages that support them (Go, Java), we could offer a file system view on the runfiles that offers the same functionality regardless of whether it's backed by a manifest or a directory. This would allow users to perform operations such as listing directory contents and traversing the runfiles tree. I will look into this once the Bazel 6 release craze is over.
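One piece of such a file system view, listing a directory when only a manifest is available, could look roughly like this. The function name and the input (a plain list of runfiles keys) are hypothetical; the Go implementation mentioned above is not reproduced here.

```python
from typing import Iterable, List


def list_dir(manifest_keys: Iterable[str], directory: str) -> List[str]:
    """Return the immediate children of 'directory' implied by manifest keys."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for key in manifest_keys:
        if key.startswith(prefix):
            # Keep only the first path segment below the directory.
            children.add(key[len(prefix):].split("/", 1)[0])
    return sorted(children)
```

A directory-backed implementation would simply list the directory on disk, so callers would get the same answer from either backend.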
Realistically speaking, with Bzlmod, you will have to wait for a good chunk of your transitive dependencies migrating anyway, simply because it requires much higher label "hygiene" than has been necessary so far. I agree that writing a new runfiles library for a ruleset is quite a bit of work, but it needs to be done only once and only by ruleset authors, not by end users. I am also working on a Starlark reference implementation to help third-party ruleset authors implement them. |
@fmeum the rumor is half true: Google doesn't use sandboxed execution, but not due to performance concerns, but because most of our builds run remotely and thus a sandbox is not necessary. Or maybe you have better information than I do :) @Wyverald has thought a lot about how runfiles could work sensibly with bzlmod, but he hasn't found any solution that is:
In fact, fulfilling just two of these three is difficult, so I think his decision to go with the simplest solution that can be called "good enough" is sound. Losing backwards compatibility this way is not great, but given that there are cases when the runfiles symlink tree does not even exist ( (The astute observer will note that a runfiles tree is pretty much necessary for interpreted languages, e.g. Python. The game plan for that in turn is to make the runfiles tree not necessary. Somehow.) |
@lberki Now, should we drop it without fairly evaluating it? |
@comius |
I don't actually see the two proposals as being exclusive and if we are willing to do the work, we could implement and benefit from both: If we get the order of lookups in the Bazel-provided runfiles libraries right, they should remain drop-in compatible with a hypothetical (Now that I am reading through this thread again, this is actually quite similar to #15029 (comment)) Given that this functionality would be mostly relevant for end users rather than rulesets (@comius, please correct me here if I should be misrepresenting your proposal), it wouldn't be an issue if we would introduce this flag only in 6.1.0, giving us more time to think this through and implement it. @comius What do you think, could that be a reasonable path forward? If you are interested, we could schedule a short call and go over how this could be implemented and whether there are any unforeseen issues. |
@comius sorry, I let my enthusiasm for cleaning up legacy behavior shine through. I think that it has merit, modulo the performance concerns voiced by @meteorcloudy and the fact that the runfiles library would then potentially have to check two entries to find a particular runfile. An argument against Since then, another issue surfaced ( #16379 ) which will probably require another change to the runfiles infrastructure. I see the following possible paths ahead:
WDYT? |
I may be missing an interaction, but doesn't the set of flags you mentioned neatly divide into two parts, one of which we can mostly ignore?
For the purpose of resolving this issue and #16379, we should be able to ignore all the flags in group 1 as long as we ensure that everything we come up works both with the directory and the manifest representation of runfiles. That would certainly reduce the cognitive overhead while thinking about solutions. |
I am on board with ignoring flags in group (1) if we can manage to simplify our collective lives, but I don't know if they are orthogonal enough with the new repo mapping file to be able to do so. @Wyverald ? (if he doesn't chime in, let's ignore them because I think it's likely that we can do so) re: |
If we settled on "nobody uses repository mappings with Regarding #16379: Given that Bazel's configuration identifiers probably aren't something we want users to hardcode (which is the opposite of the situation for apparent repository names), I don't think #16379 can be solved without asking users to go through a helper such as |
Indeed! I was assuming @phst was using bzlmod and it's just that he formulated the bug with repo mappings to be minimal, but maybe not. |
Pretty sure we can ignore those, as they are exercised by tests and the change to add the repo mapping manifest still passed those tests.
While that's true, we've essentially treated all repo mappings bugs so far as "bzlmod bugs", if nothing else then because repo mappings basically did not work for anything past "translating labels in BUILD files". Pretty much any other usage you can think of involving repo names was broken. So I'm okay with things staying broken but only being fixed by turning on --enable_bzlmod, if it makes our lives easier.
I like this idea very much! My take on this whole discussion remains that we should go for an eventually good API first, and worry about backwards compatibility second. Two main reasons: 1) like @lberki said --enable_bzlmod is a backwards compatibility breaker anyway, so this is a good opportunity to imagine what we should've designed without all the mistakes in the past; and 2) the backwards-compatible approach with runfiles is somewhat doomed anyway because it doesn't work on Windows. |
Your assumption is correct :) |
I tested a simple implementation of my proposal in #15029 (comment), but it just doesn't work. The problem is the functions Artifact.getRunfilesPath and Label.getWorkspaceName, which only return one path and can't do a fallback. It looks to me that backwards-compatible approaches are now fully doomed. |
@comius can you elaborate why it doesn't work? I don't understand your terse sentence "only return one path and can't do a fallback"; AFAICT when the runfiles input manifest is written, we have both the repository mappings and the runfiles artifacts, which is all the data one needs to produce a runfiles manifest, no matter what format one chooses. (I still think that the approach you linked above wouldn't work, but for a different reason as discussed above, but let's explore the problem space before deciding) |
The idea was to produce duplicated entries only on collision. The writing of runfiles manifest like this works. All the data is there. The problem is that for example And you can't obtain the data about collisions, because it's only available at a particular binary. |
Ah, got it. So it's not that writing the runfiles manifest doesn't work, it's that Bazel assumes that the runfiles path of an artifact is constant, which would not be under your proposal. I could imagine having symlinks to an artifact both at its canonical location (which would be constant) and its "legacy" one (similarly to what @fmeum recommends in #15029 (comment)), but at that point, we are back to the question how many extra symlinks is acceptable. |
This has basically been fixed in 6.0 if you use the various runfiles libraries. |
Description of the problem / feature request:
If a workspace B depends on a workspace A, but uses the name C instead of A to name the workspace, the runfiles for workspace A will appear as C in the runfiles directory, breaking runfiles lookup for any library in workspace A.
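The failure mode can be modeled in a few lines. This is a toy model with hypothetical names, not Bazel's actual logic: the runfiles layout is computed from the names the depending workspace uses, while the library in workspace A looks up its data under the name A declared for itself.

```python
from typing import Dict, List, Set


def runfiles_layout(files_by_repo: Dict[str, List[str]],
                    renames: Dict[str, str]) -> Set[str]:
    """Return runfiles-relative paths as laid out by the depending workspace.

    renames maps a repository's own name to the name the depending
    workspace gave it (hypothetical model of repository renaming).
    """
    layout = set()
    for repo, files in files_by_repo.items():
        visible_name = renames.get(repo, repo)
        for path in files:
            layout.add(visible_name + "/" + path)
    return layout


# Workspace B imports workspace A under the name "c".
layout = runfiles_layout({"a": ["data.txt"]}, {"a": "c"})
```

Here the layout contains c/data.txt but not a/data.txt, so a lookup of a/data.txt from code in workspace A finds nothing.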
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
/tmp/test/a/WORKSPACE:
/tmp/test/a/BUILD:
/tmp/test/a/lib.h:
/tmp/test/a/lib.cc:
/tmp/test/a/data.txt:
/tmp/test/b/WORKSPACE:
/tmp/test/b/BUILD:
/tmp/test/b/bin.cc:
Then,
cd /tmp/test/b && bazel run :bin
will fail. In /tmp/test/b/bazel-bin/bin.runfiles there will be a subdirectory c (i.e. the name of the repository in b), but no subdirectory a (i.e. the actual name of the a repository).
The same works after renaming the local repository in b/WORKSPACE to a (as the comments indicate) and commenting out the repo_mapping attribute.
This is a problem because code in workspace A can't possibly know that its clients import it under a different name. Therefore, code in workspace A can only refer to runfiles within its own workspace under the name that it has declared in its own WORKSPACE file (a in this example). This means that the runfiles actually have to be available under the a name, independent of any repository mappings. So Bazel should not only create a symlink c/data.txt in the runfiles directory, but also a/data.txt. More generally, any runfile should be available under both its local workspace name and any renamed/remapped workspace names. In case of name clashes, Bazel should signal an error, as it does for other name clashes.
What operating system are you running Bazel on?
macOS
What's the output of bazel info release?
release 5.0.0-homebrew
What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD?
Irrelevant
Have you found anything relevant by searching the web?
There are probably many related discussions, but I haven't found a bug report about this specific issue itself. Probably the problem will become much worse with bzlmod due to its extensive use of repository renaming, and I'd consider it a blocker for enabling bzlmod by default.
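The behavior requested in the description (every runfile available under both its local workspace name and any remapped names, with an error on clashes) can be sketched as below. The function and its inputs are hypothetical and only model the idea; they do not describe an existing Bazel API.

```python
from typing import Dict, List


def runfiles_with_aliases(files_by_repo: Dict[str, Dict[str, str]],
                          aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every runfiles key to its real file, under all names of its repo.

    files_by_repo: local workspace name -> {relative path: real file}
    aliases: local workspace name -> remapped names used by other workspaces
    """
    entries: Dict[str, str] = {}
    for repo, files in files_by_repo.items():
        for name in [repo] + list(aliases.get(repo, [])):
            for rel_path, real_file in files.items():
                key = name + "/" + rel_path
                # Two different files under the same key is a name clash.
                if key in entries and entries[key] != real_file:
                    raise ValueError("runfiles name clash at " + key)
                entries[key] = real_file
    return entries
```

For the example in this report, the result contains both a/data.txt and c/data.txt, pointing at the same file. If another repository were genuinely named c and also shipped data.txt, the sketch would raise instead of silently picking one.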