Update "Locating Runfiles with Bzlmod" #274
@cushon This proposal relies on the ability to inject a generated class containing a single |
Can you expand on the rationale for making it an implicit neverlink dep of every Java compilation action, vs. an explicit dep of the targets that need it?
@cushon With Bzlmod, any kind of runfiles lookup (e.g. using Ideally, we would have one neverlink I'm happy to provide more context if needed. |
There are two things to figure out:
I was thinking about (2) for a bit but I couldn't think of a good alternative that doesn't require iterating over all Artifacts in a Runfiles when constructing the latter. This might still be acceptable because I could imagine handling Artifacts that go into a Runfiles by way of In addition, even if one iterated over nested sets passed in through Given this, we either do a complicated thing like this about (2) and try to cull the repo mappings written to the runfiles manifest or give up and do the full transitive closure and then (1) doesn't need to be answered. I'm leaning towards the latter because it's simpler, but I haven't spent a lot of time thinking about bzlmod, so this is due more to the "prefer simplicity" heuristic than to deep thought about the ramifications of this decision.
I think that we can (at least for the beginning) just emit the repo mappings for the full transitive closure. There is already some amount of filtering in the proposal since we only emit repo mapping entries mapping to canonical repo names of repos that actually provide runfiles. I don't see how we could fit "non-strict runfiles dependencies" in this scheme though. AFAIU, the problem isn't that we don't collect enough information, it's that we don't have an identifier to refer to these files except by their canonical labels - there is no apparent name for them users could provide to look up the canonical name. This situation is similar to what was discussed about canonical label literals in general, for example here. The recursive label idea seems too complicated for me to introduce for this niche use case - not having users rely on the contents of targets they don't have repo visibility in sounds more like a plus to me. Note that targets could always propagate runfiles library paths in code if any consumer of a library needs to look up a file they may not have repo visibility in and thus a name for. @lberki I'm not sure I fully understood your points above, please let me know if you had something else in mind. |
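For readers following along, the filtering mentioned above (emit only entries that map to repos actually providing runfiles) can be pictured roughly as follows. This is a minimal sketch, not code from Bazel or the proposal: `RepoMappingEntry`, `RepoMappingFilter` and `keepRunfilesProviders` are hypothetical names introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of one repo mapping entry. None of these names are Bazel APIs.
final class RepoMappingEntry {
    final String sourceRepo;    // canonical name of the repo performing the lookup
    final String apparentName;  // name the source repo uses for its dependency
    final String targetRepo;    // canonical name the apparent name resolves to

    RepoMappingEntry(String sourceRepo, String apparentName, String targetRepo) {
        this.sourceRepo = sourceRepo;
        this.apparentName = apparentName;
        this.targetRepo = targetRepo;
    }
}

final class RepoMappingFilter {
    // Keeps only the entries whose target repo contributes at least one runfile.
    static List<RepoMappingEntry> keepRunfilesProviders(
            List<RepoMappingEntry> transitiveClosure, Set<String> reposWithRunfiles) {
        List<RepoMappingEntry> kept = new ArrayList<>();
        for (RepoMappingEntry entry : transitiveClosure) {
            if (reposWithRunfiles.contains(entry.targetRepo)) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

The sketch starts from the full transitive closure and only drops entries afterwards, which matches the "emit everything, then filter" direction favored in this part of the thread.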
Would it be theoretically possible to have an explicit rule per repository that exposed That requires more typing than implicitly creating the rule and adding it as a dep, and it's potentially error-prone if a target depends on one of those rules from a different repository. |
That would be possible. A macro could help by automatically limiting visibility to
@lberki What do you think, would the above make for an acceptable user experience?
@cushon how is this extra magic rule better than having a Starlark symbol contain it? (I'm not sure which approach is better, I'm trying to map out the design space here)
@fmeum you got my worries about runfiles semantics right; I also think that the recursive label syntax as proposed in the thread you linked is a lot of complexity for not a lot of benefit, or at least not a lot of benefit that's obvious now and therefore I'd rather not do that right now. I thought that the solution to non-strict dependencies is to have custom logic that propagates the apparent repository names in the code itself. For example, if binary
IOW, the binary at runtime does need transitive repository mappings to be able to find its runfiles. The questions are:
These are coupled because it looks like there isn't an automatic way for Bazel to figure out which runfiles are requested, so it must either assume "everything" (in (1)) or rely on it being told that (in (2)). I originally dismissed (2) since y'all seem to think that it's too onerous.
@lberki I don't see how a Starlark symbol would help here - the difficult part is getting the content of that symbol into Java code. If that requires knowing about the "constants are inlined at compile-time" trick, I think that we should provide a rule that does it for users rather than leave them to do this themselves.
Huh, I'm confused. I thought we agreed upon:
And now we are discussing between these two options:
and that we think (1) is the best alternative and we are exploring (2) in order to explore the design space. What did I get wrong here?
@lberki I'm confused as well, so let's go over the points and gain clarity.
Assuming a Starlark rule, isn't that just
It should be possible one way or the other. @cushon inquired about alternatives to the JavaBuilder hack and we came up with one more (the target-per-repo approach), with different tradeoffs: Worse UX, lower complexity on the Bazel side of things and in particular no changes to JavaBuilder.
Yep, that is the current state of the proposal and the only one I have considered so far for general use.
Could you elaborate on what you mean by "top-level repository mappings"? Is an accurate example of this point the |
Good point about It appears that we are on the same page about the feasibility of plumbing the name of the current canonical repository to Java code. If you need someone to make a decision, I'd be happy to, but for now, I assume that between you, @cushon and @Wyverald , there is more shared knowledge than in my head so y'all are in a better position to do so. re: my proposal about "top-level repository mappings", forget about it, I think I had a brain glitch; I thought that with more code in Starlark, one can avoid the complexities of a |
I will some day use the fact that @lberki himself got confused by this as a very compelling argument to deprecate and replace it with a more aptly named one. :-)
re: 'constants are inlined at compile-time', one thing I remembered about this is that modern versions of javac emit constant pool entries for inlined constants. In the following example So dependency tooling could notice that the definition of the class that contains the repository name is missing.
class T {
  public static void main(String[] args) {
    // Integer.MAX_VALUE is a compile-time constant, so javac inlines its value here.
    System.err.println(Integer.MAX_VALUE);
  }
}
Anyway, from my perspective, doing this in JavaBuilder feels high-magic and I'd like to avoid solving it at that level. But I'm open to hearing out arguments that the target-per-repo alternative isn't good enough (for ergonomics or other reasons).
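As a small illustration of the constant-inlining behavior this thread keeps returning to, the sketch below shows its observable effect. All class and field names (`InitLog`, `GeneratedRepoConstants`, `CURRENT_REPOSITORY`, `Consumer`) and the value `"my_repo~1.0"` are hypothetical stand-ins, not the names used in the proposal.

```java
// Records whether the constant-holding class was ever initialized.
final class InitLog {
    static boolean generatedClassInitialized = false;
}

// Hypothetical stand-in for a generated class holding a repository name.
final class GeneratedRepoConstants {
    static {
        // Runs only if the JVM initializes this class.
        InitLog.generatedClassInitialized = true;
    }

    // A static final String with a literal initializer is a compile-time constant.
    static final String CURRENT_REPOSITORY = "my_repo~1.0";
}

final class Consumer {
    static String currentRepository() {
        // javac copies the constant's value into this class at compile time,
        // so this read does not initialize GeneratedRepoConstants.
        return GeneratedRepoConstants.CURRENT_REPOSITORY;
    }

    public static void main(String[] args) {
        System.out.println(currentRepository());
        System.out.println(InitLog.generatedClassInitialized);
    }
}
```

Because the value lives in the consumer's own class file, the defining class is not needed to read the constant at runtime, which is what would make a neverlink dependency workable.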
The "manually declare a target per repo" thing feels really clunky to use. But if I'm reading the thread correctly, we don't currently have a better alternative? (Or in case I missed something, could someone describe/link to the alternative?)
The PR looks fine to merge to me.
I will some day use the fact that @lberki himself got confused by this as a very compelling argument to deprecate and replace it with a more aptly named one. :-)
Don't get me started...
None that's both more usable and not more complicated to implement and maintain. Overall, we could, in order from most usable to least usable:
The prototype for Option 3 including a test that shows how to use it is available at bazelbuild/bazel#16281.
Just restating my opinion that Option 3 is very, very unattractive. If anyone wants to use runfiles, they have to declare a special target and remember to include it in deps -- this is a terrible user experience.
@cushon is there maybe any other clever way to inject some data from the compiler command line into the code to be compiled? This is the perfect use case for C++-style preprocessor defines; I don't harbor a lot of love for them for all the usual reasons, but this is the perfect use case... The reason why I'm strongly against (3) is usability: one would both need to remember to declare the target and to depend on it. Would it be possible to work around the dangling reference problem brought up by @cushon by artificially injecting that class in every |
Annotation processing is about as close as Java gets to c-style preprocessing. You could have a javac flag We currently stamp jar manifests with target labels, we could potentially include the repo name in the manifest and then it would be accessible at runtime. That doesn't play nicely with
Maybe this speaks to a need for better dependency management tooling |
Depending on how you stringify the label, it may actually already contain the canonical repository name. But as you say, this wouldn't work for deploy jars or really any kind of custom post-processing.
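To make the "label already contains the repository name" idea concrete, here is a minimal sketch of pulling the repository part out of a stringified label. The label shapes it accepts (`@@repo//pkg:name`, `@repo//pkg:name`, `//pkg:name`) are an assumption about how the label is stringified, and `LabelRepo.repositoryOf` is a hypothetical helper, not an existing Bazel or runfiles library API.

```java
final class LabelRepo {
    // Returns the repository part of a stringified absolute label,
    // or "" when the label has no repository prefix (main repository).
    static String repositoryOf(String label) {
        int slashes = label.indexOf("//");
        if (slashes < 0) {
            throw new IllegalArgumentException("not an absolute label: " + label);
        }
        String prefix = label.substring(0, slashes);
        // Strip at most two leading '@' characters ("@repo" or "@@repo").
        int start = 0;
        while (start < prefix.length() && start < 2 && prefix.charAt(start) == '@') {
            start++;
        }
        return prefix.substring(start);
    }
}
```

As noted in the comment above, this only helps where the stamped label survives packaging, so deploy jars and other post-processed outputs would not benefit.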
Not necessarily, we internally rely on runfiles quite heavily even in code shipped to users. I'm investigating a new idea:
I will report back on whether I got this to work.
I got this to work for Starlark @lberki @cushon Do you see potential performance issues? It could make sense to run this by gregestren and comius (not mentioning them here yet as that may lack context). Edit: This would result in DumpPlatformClassPath rerunning for every repository using Java, but we should be able to avoid that overhead by invoking We could add a stub of the generated class with Javadocs to the |
I implemented the approach described above both in Starlark (for java_library) and Java (for java_binary and java_test) in bazelbuild/bazel#16281. It does work and keeps the number of additional actions linear in the number of repos using Java (rather than Java targets), but I haven't performed any benchmarks on how the additional transition affects analysis time and memory usage. |
I was hoping @comius would chime in (or @hvadehra ) but in their absence, my opinion will have to do. This approach relies on final constant inlining, just like the JavaBuilder one. It feels way more convoluted than what we thought we'd settle on, but since no one has come up with a better one, I guess it'll have to be this one? Mirroring @cushon 's opinion that doing this in JavaBuilder is high magic, I have a similar opinion about doing this within Bazel, but as long as we agree that the final API will be a constant that'll then be inlined, we are just talking about implementation details that can be later changed at will so I don't particularly mind either way. I have a number of issues with the code in bazelbuild/bazel#16281, but they are not absolute blockers:
@lberki Re 2.: This is possible, it just requires some mild trickery and splitting up .bzl files (see @comius commit bazelbuild/bazel@2cde45e). I can take care of that, but would prefer to get the general idea approved first. There are a number of tests that assert something about the compile-time classpath of Java rules. Should I add the generated jar to them? Re doing this in Bazel: This is only necessary as long as the Java rules haven't been fully starlarkified. Once that's done, all of this can be done in rules_java using public API. |
@fmeum for lack of a better alternative, as long as @comius and @Wyverald are fine with this plan, I am fine with it, too. What's important for me is that the programmatic API remains as simple as "import this magic class, use this magic |
Updates the proposal with: