[7.2.1] Windows/bzlmod/rules_rust: Failed to clean up module context directory: C:/tmp/modextwd/rules_rust~~crate (Directory not empty) #22710
Comments
OK so this is exactly the same bug as #22688 except that we don't crash anymore, right? It looks like something is holding the extension working directory hostage (par for the course on Windows)... I'm not sure what Bazel could do to recover in this case; I don't think we're supposed to wait forever until the ... hostage-holder (?) lets go.
Yes, exactly, and at least as a non-bazel-dev, I didn't know which directory we were talking about in #22688, but now the exception includes a path.
Probably not, no. Would there be a reasonable spot somewhere in your code where you could show at least one candidate (or up to X candidates) of files inside that directory that prevent deletion? Why is this directory cleaned up at all, in the middle of a build?
… in our build log, which seems to indicate that bazel tries to delete the directory while it's still executing other build targets (I doubt that >100 actions would finish in one second, without any log output).
Right. We sometimes need to restart a module extension's execution (same as the repo restart mechanism described here: https://bazel.build/extending/repo#restarting_the_implementation_function). Since 7.1.0 we've introduced the flag
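To make the restart behaviour concrete, here is a minimal hypothetical sketch of a module extension (made-up names, not the actual rules_rust code): the `module_ctx.path()` call can abort and re-run the whole implementation function if the referenced file hasn't been fetched yet, discarding any work done before it.

```starlark
# Hypothetical module extension sketch; all names are made up for illustration.

def _expensive_generation(module_ctx, manifest_path):
    # Stand-in for costly work such as invoking cargo-bazel.
    module_ctx.report_progress("Processing " + str(manifest_path))

def _my_ext_impl(module_ctx):
    for mod in module_ctx.modules:
        for tag in mod.tags.config:
            # Resolving a label to a path may restart the whole implementation
            # function if the file behind it hasn't been fetched yet; anything
            # computed before this point is thrown away and recomputed.
            manifest_path = module_ctx.path(tag.manifest)
            _expensive_generation(module_ctx, manifest_path)

my_ext = module_extension(
    implementation = _my_ext_impl,
    tag_classes = {"config": tag_class(attrs = {"manifest": attr.label()})},
)
```

With several `config` tags, each not-yet-fetched manifest can trigger another restart, multiplying the expensive work.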
Probably we can just …
Let me try to reproduce the problem on my local Windows machine first, then I can inspect the directory.
@Wyverald okay, it'd be great to get some support here; I've debugged as far as I can get. I was able to isolate the code in our public repository, so I can share it with you. This Actions workflow (on that branch of the repo) builds a single bazel target, and fails with an error in about 1 to 3 out of 20 runs. Example failure here (but I'm not sure if you can see the logs without extra permissions). I also stuck an … I also observed that fetching this module extension is really slow, but I think most of that is the …
I tried to upload the whole bazel output base, but that was too many files for … Also, as the problem here seems to be restarts of the module extension: is there any way to figure out what's causing those, and then preload the reasons for those restarts? Even in a hacky way where we patch …
From what I can see, restarts in module extensions can't be prevented at all at the moment? bazelbuild/rules_rust#2691 reduces the number of expensive restarts on …
Filed #22729
This PR reduces the number of expensive module extension implementation restarts in combination with multiple `from_cargo` configurations. Since processing each configuration calls `module_ctx.path` (which can and will cause restarts), we executed `_generate_hub_and_spokes` (i.e. `cargo-bazel`) _a lot_ and then threw that away on the next restart. Triggering `module_ctx.path` early is therefore a significant performance optimization. On our repository, this gives us a 10sec speedup on module extension processing (local M1 mac, nothing happening in parallel); see our [MODULE.bazel](https://github.com/github/codeql/blob/main/MODULE.bazel) if you're interested in what we're doing. This excessive restarting also exposed an upstream bazel bug on Windows 2019, where bazel spuriously fails to clean up the working directory (c.f. bazelbuild/bazel#22710). Co-authored-by: Daniel Wagner-Hall <dwagnerhall@apple.com>
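In other words, the fix is to hoist the restart-prone `module_ctx.path()` calls in front of the expensive work. A minimal hypothetical sketch of that pattern (again with made-up names, not the actual rules_rust implementation):

```starlark
# Hypothetical sketch: resolve every label to a path first, so that any
# restarts happen while nothing expensive has been computed yet.
def _my_ext_impl(module_ctx):
    manifest_paths = []
    for mod in module_ctx.modules:
        for tag in mod.tags.config:
            # Each path() call may still restart the function, but restarts
            # at this point only repeat the cheap label-to-path loop.
            manifest_paths.append(module_ctx.path(tag.manifest))

    for p in manifest_paths:
        # The expensive step (e.g. running cargo-bazel) now executes exactly
        # once per configuration instead of once per restart.
        module_ctx.report_progress("Generating repositories for " + str(p))

my_ext = module_extension(
    implementation = _my_ext_impl,
    tag_classes = {"config": tag_class(attrs = {"manifest": attr.label()})},
)
```

Restarts still happen, but they only repeat the cheap resolution loop rather than the repository generation.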
Let's close this as won't fix: I have a workaround for … So the impact of working on this issue would be really low.
Description of the bug:
This bug continues the discussion from #22688, and the fruitful conversation we had in the associated PR: #22689 (comment)
Context: Without that PR, my team was observing bazel crashing spuriously on Windows after converting our `rules_rust` usage from WORKSPACE files to bzlmod. Thanks to bazelisk (it's amazing!), we were able to pull in a pre-release build of bazel's 7.2.1 branch.
Instead of the aforementioned crash, we're now getting the (occasional) error from the issue title during our build.
Is this a bazel bug or a `rules_rust` bug?
Which category does this issue belong to?
External Dependency
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I'm not sure. The relevant build target that pulls `rules_rust` into the target '@@codeql~//ruby:ruby-generic-zip' is defined here, but that repository is built in the context of an internal repository, and there are a few dependencies of the target that aren't open-sourced, so the repo doesn't build standalone without modifications.
If there's a need for a reproducer, I can try hacking/reducing our internal dependencies in a branch and writing an Actions workflow that runs bazel in a loop until it errors out with this.
Which operating system are you running Bazel on?
Windows Server 2019 (GitHub actions)
What is the output of `bazel info release`?
development version
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
Bazelisk to pull in 7628649
What's the output of `git remote get-url origin; git rev-parse HEAD`?
No response
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
We're using `rules_rust`'s `crate_universe` mechanism to pull in external dependencies from `Cargo.toml` into bazel. As might be more uncommon, we do have 3 dependencies we pull in as git repositories instead of as regular downloads from crates.io, which takes longer (and might create more files on disk / more IO).
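For context, a minimal sketch of the kind of `MODULE.bazel` setup described above (repository names, versions, and paths are placeholders, not our actual configuration); the git dependencies themselves would be declared in `Cargo.toml` and picked up by `crate_universe` from there:

```starlark
# Hypothetical MODULE.bazel sketch; names, versions, and paths are placeholders.
bazel_dep(name = "rules_rust", version = "0.46.0")

crate = use_extension("@rules_rust//crate_universe:extension.bzl", "crate")

# One from_cargo configuration per Cargo workspace; having several of these is
# what multiplied the cost of module extension restarts discussed above.
crate.from_cargo(
    name = "ruby_crates",
    cargo_lockfile = "//ruby:Cargo.lock",
    manifests = ["//ruby:Cargo.toml"],
)
crate.from_cargo(
    name = "rust_crates",
    cargo_lockfile = "//rust:Cargo.lock",
    manifests = ["//rust:Cargo.toml"],
)
use_repo(crate, "ruby_crates", "rust_crates")
```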