Action conflicts due to differing "affected by starlark transition" value #14239
Comments
Huh. I hope this is something small. I was thinking 711c44e would be a pure improvement.
This is strange: the ST-hashes should be different if affected-by-Starlark-transition is different.
I am working on this, but out of curiosity, does anyone who encountered this have a smallest possible test case? The other example I have been debugging with involves cc_proto_library computation, which is also somewhat large.
New theory: the null check at https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java#L443 is bad. (Short story: just because the Starlark value is at its default does not mean it should be excluded from the ST hash, especially since the command line may have explicitly set the Starlark value.) I have a change that adjusts the hashing logic to do something more correct, and it seems to fix the internal test case I have. I will try to export something for you to test against your cases as well. I do not have an ironclad explanation as to why this bug only surfaced now. My best guess is that the previous bug with transitionDirectoryNameFragment not being updated properly was actually masking this.
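To make the theory concrete, here is a minimal sketch in Python. It is not Bazel's implementation: st_hash, its parameters, and the flag label //flags:mode are all hypothetical, and the real logic lives in FunctionTransitionUtil.java. It assumes a configuration is a map of flag values (None meaning "at its default") plus a list of flags recorded as affected by a Starlark transition.

```python
import hashlib

def st_hash(options, affected, skip_defaults):
    """Hypothetical stand-in for the ST-hash computation, not Bazel's code.

    options: dict mapping flag label -> value; None means "at its default".
    affected: labels recorded as affected by a Starlark transition.
    skip_defaults: if True, mimic the null check under discussion.
    """
    parts = []
    for key in sorted(affected):
        value = options.get(key)
        if value is None and skip_defaults:
            continue  # the questionable null check: default values are dropped
        parts.append(f"{key}={value}")
    if not parts:
        return ""  # nothing to hash, so no ST suffix at all
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()[:12]

# Two configurations with identical option values but different "affected" lists.
options = {"//flags:mode": None}

# With the null check, both produce the same (empty) hash even though the
# configurations differ, so both would use the same output directory.
untouched = st_hash(options, [], skip_defaults=True)
reset_to_default = st_hash(options, ["//flags:mode"], skip_defaults=True)
collide = untouched == reset_to_default

# Without the null check, the affected flag contributes to the hash.
fixed_untouched = st_hash(options, [], skip_defaults=False)
fixed_reset = st_hash(options, ["//flags:mode"], skip_defaults=False)
distinct = fixed_untouched != fixed_reset
```

Under these assumptions, the two configurations collide when defaults are skipped and become distinct once defaults are included, which would match the action conflicts reported in this issue.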
Good catch! Removing the check makes all action conflicts go away. I do think that the check is bad for a different reason though: Omitting a variable |
Yes, in practice when the value is null I am going to have it print key@null (just in case a user has a String StarlarkOption with non-null value "null" resulting in key=null).
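A small sketch of why a separate marker for null helps. The function name hash_entry and the flag label are hypothetical; only the key@null and key=null spellings come from the comment above.

```python
def hash_entry(key, value):
    """Hypothetical formatter for a single ST-hash input entry."""
    if value is None:
        return f"{key}@null"   # option is at its default (null)
    return f"{key}={value}"    # option holds an explicit value

default_entry = hash_entry("//flags:mode", None)    # default value
string_entry = hash_entry("//flags:mode", "null")   # the literal string "null"
```

With "=" used for both cases, the two entries would be indistinguishable. Using "@" for the null case keeps them apart.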
The intention with the check for null was to:
Example: First building a library, with dependency graph A-B-C, and populating remote caches:
Then building an application using the library A-B-C, but also E-F with a local transition for a Starlark option unrelated to A-B-C.
Without the check for null, I suspect there would be no remote cache hits for A-B-C when building D, due to a different ST hash. Right? Does that make sense? I only have experience with transitions near the top of the build, not deep in the dependency graph.
Thanks for the explanation. I'd expect remote caching to continue to work in that case. The logic is subtle. Look at: bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java Line 425 in a9206eb
bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java Lines 442 to 445 in a9206eb
When evaluating bazel/src/main/java/com/google/devtools/build/lib/analysis/config/CoreOptions.java Lines 265 to 273 in 21dfe4c
The flag is updated at bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java Line 399 in a9206eb
which only triggers on a transition. There are no transitions from D to A, so in the case of A this would be empty and we'd have no hash at all. In the case of F it's not empty. So when evaluating F, the loop will trigger and read bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java Lines 383 to 386 in a9206eb
Here's the part where I confused myself. :/ There are two reasons value could be null. One is that it's not in the So that's a mystery to me. But to your point, whatever's going on it should all be a pure function of the dependency path from the top-level to the target in question. So since the path from D to A doesn't have a transition, that should have no bearing on the path from D to F. They should each be computed completely independently of each other. Does that all make sense? Anyone see holes in my thinking?
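The "pure function of the dependency path" argument can be sketched as follows. This is an illustrative model only: affected_along_path, the edge map, and the flag label are hypothetical, and placing the transition on the E to F edge is an assumption, since the thread only says the transition is local to E-F.

```python
def affected_along_path(path, transitions):
    """Collect flags changed by transitions on the edges of one dependency path.

    path: target names from the top-level target down to the target of interest.
    transitions: dict mapping a (parent, child) edge to the flags it sets.
    """
    affected = set()
    for parent, child in zip(path, path[1:]):
        affected.update(transitions.get((parent, child), ()))
    return affected

# Hypothetical graph from the example: D depends on A-B-C and on E-F,
# with a single transition assumed on the E -> F edge.
transitions = {("E", "F"): ["//flags:mode"]}

affected_for_c = affected_along_path(["D", "A", "B", "C"], transitions)
affected_for_f = affected_along_path(["D", "E", "F"], transitions)
```

In this model the path from D to C sees no transition, so its affected set is empty and it would get no ST hash, regardless of what happens on the path from D to F.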
@sdtwigg helpfully pointed out that transitions include a post-validation step that removes flags from bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/StarlarkTransition.java Line 353 in a9206eb
As far as I can see no such removal happens for
One more comment: If the configuration at the top-level (command line) has all flags at their defaults, then transition A sets In that case you'd want both configurations to not have an bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java Line 482 in a9206eb
That's an inefficiency. But that's an existing inefficiency. @sdtwigg's ongoing work should remove that too.
…lculating ST-hash Previously, the hash calculation was skipping including StarlarkOptions that happened to be at their default values. This is wrong since those values may still be in "affected by Starlark transition" (because either the commandline set them and the Starlark transition reset them to their Starlark defaults thus still requiring a hash change OR the commandline did not set them but a series of Starlark transitions did an default->B->default anyways causing the Starlark option to still be 'stuck' in "affected by Starlark transition"). Resolves bazelbuild#14239 PiperOrigin-RevId: 408701552
Thanks for the explanation! I currently don't have transitions setting options back to their defaults, but I'm looking forward to @sdtwigg's optimizations in that area.
I got the impression that
True, but let’s adapt the example and imagine there are also other unrelated transitions in A-B-C so that they will have hashes.
I see, thanks @gregestren!
* Move transitionDirectoryNameFragment calculation to BuildConfigurationValue

  As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.)

  This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment.

  This fixes and subsumes #13464 and #13915, respectively. This is related to #14023

  PiperOrigin-RevId: 407913175

* Properly account for StarlarkOptions at their default (=null) when calculating ST-hash

  Previously, the hash calculation was skipping including StarlarkOptions that happened to be at their default values. This is wrong since those values may still be in "affected by Starlark transition" (because either the commandline set them and the Starlark transition reset them to their Starlark defaults thus still requiring a hash change OR the commandline did not set them but a series of Starlark transitions did an default->B->default anyways causing the Starlark option to still be 'stuck' in "affected by Starlark transition").

  Resolves #14239

  PiperOrigin-RevId: 408701552

Co-authored-by: twigg <twigg@google.com>
Description of the problem / feature request:
Since 711c44e, transitions that lead to configs with a differing "affected by starlark transition" value but otherwise identical configs lead to action conflicts.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
bazel build //... with Bazel at 2255ce4, the parent of the breaking commit.
bazel build //... with Bazel at 711c44e.
The config diff:
What operating system are you running Bazel on?
Linux
What's the output of bazel info release?
development version
If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.
From 2255ce4 and 711c44e, respectively.
What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD?
Have you found anything relevant by searching the web?
Related to the discussion at #14023.
Any other information, logs, or outputs that you want to share?
@sdtwigg @gregestren