-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lock bootstrap (x.py) build directory #88310
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @jyn514 (or someone else) soon. Please see the contribution instructions for more information. |
Happy to be responsible for this, but @Mark-Simulacrum will want to take a look too I'm sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that this doesn't lock in bootstrap.py, but maybe that's ok - certainly I haven't seen any bugs reported about it.
// terminal 1
$ x check
downloading https://ci-artifacts.rust-lang.org/rustc-builds/db002a06ae9154a35d410550bc5132df883d7baa/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
############################################################################### 100.0%
extracting /home/joshua/src/rust/rust/build/cache/llvm-db002a06ae9154a35d410550bc5132df883d7baa-False/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
// terminal 2
$ x build
Updating only changed submodules
Submodules updated in 0.02 seconds
downloading https://ci-artifacts.rust-lang.org/rustc-builds/db002a06ae9154a35d410550bc5132df883d7baa/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
### 4.9%^C
I don't feel qualified to review the flock code, but I also think I may not have to - I left a comment below.
I'm very happy with how this fixes the parallel builds problem :) thank you for tackling it!
src/bootstrap/flock.rs
Outdated
@@ -0,0 +1,206 @@ | |||
//! Handles creating a file lock to avoid race conditions[1]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now the third flock implementation in-tree, there's also one for rustdoc ... maybe we should separate this out into a tiny crate or something? It would be hard to get cargo to use it since it's a submodule, but we could at least share it with rustdoc.
cc @rust-lang/cargo, do you have opinions on this? Would you be willing to separate the flock implementation into a different crate, and maybe put it on rust-lang/flock? I don't think there's any need to publish to crates.io but it would be nice to share at least between different tools.
cc @kinnison @rbtcollins, I'm sure rustup has its own implementation of flock too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust-lang/rustup#988 (comment) and the top level summary are probably relevant.
tl;dr: don't want the cargo implementation, it is buggy; but also haven't gotten around to fixing the issue in rustup, which is complected with rust-lang/rustup#2441 making this a little gnarly to think through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. x.py itself probably doesn't need an airtight flock implementation, I wouldn't expect people to be using NFS for builds. I totally get that rustup does have to deal with those scenarios though.
Does rustup's flock implementation have any downsides compared to cargo's? Could we start from that as the shared codebase instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After working a bit on standalone flock implementation for a bit, I wonder why we don't just use fcntl(2)
? There is a issue with the lock being able to be acquired by two threads of the same process, but we don't seem to be doing that, and it works over nfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a issue with the lock being able to be acquired by two threads of the same process
That would be an issue for the flock implementation in rustc itself though as multiple rustc sessions can run inside the same process when using custom drivers. For example rls used to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be an issue for the flock implementation in rustc itself though as multiple rustc sessions can run inside the same process when using custom drivers. For example rls used to do this.
We could avoid that with a mutex, right? So any thread has to unlock a mutex just to call flock? Then you get both per-thread and per-process synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly that flock will always succeed if a file has already been locked elsewhere inside the same process, that would require serializing all rustc sessions such that each session completely finishes until the next session locks any file, even if it isn't the same file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have one for rustup yet; we just have quite clear understanding of the problem space we're dealing with :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also fd-lock, which worked relatively well for me in the past (admittedly it was a small project). https://crates.io/crates/fd-lock/
That said, perhaps you need more control than it gives.
So... this might be a little "out there", but it feels simpler than the solution this suggests and avoids duplicating code from Cargo. Both Python 2.7.x and Python 3 natively have sqlite built-in, as far as I can tell, via the sqlite3 module. That means that we could likely implement this by "just" leveraging sqlite transactions to block. This is sort of overkill, but I'd expect it to work pretty well across any platform or filesystem, since sqlite itself is pretty battle tested. It's also true that the python script currently manages a bit of state via pretty adhoc files (e.g., for the download stamps). I wouldn't change those today, there's not much point, but in the future it might make sense to leverage a more structured format that sqlite trivially gives us for this storage and without creating a bunch of files all over the place. I'm curious to hear what folks think of this idea. I haven't actually tried a sample implementation, either, but I'd naively expect it to work out to at most a couple dozen lines of Python -- not too complex. Since sqlite comes pre-compiled for us through python, this seems a nice solution that avoids some of the hassle we'd otherwise run into. If and/or when we migrate away from the python wrapper, we can explore alternatives (such as flock proposed by this PR), but for now this would seem better. It also avoids any parallelism issues in the python script's downloads and such, which also seems good -- I'm not sold those are bug free today. I do have mixed feelings about this so I could be convinced the impl in this PR is something we should just go with, but if the sqlite-based solution actually works and isn't too hard, then it seems preferable to me personally on the grounds of not having to maintain pretty low-level platform-specific code in bootstrap -- which is a place I'd really prefer to avoid that. |
Hi @worldeva - do you know when you'll get a chance to look at this again? |
a458940
to
94a3a8b
Compare
This comment has been minimized.
This comment has been minimized.
94a3a8b
to
b229c54
Compare
Sorry for the long delay for the tiny commit; life came up >,< |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me with the tidy lint fixed :)
src/bootstrap/bootstrap.py
Outdated
|
||
from time import time | ||
|
||
def acquire_lock(): | ||
try: | ||
con = sqlite3.Connection("xlock.db", timeout=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this creating the database in the working directory? I think we probably want it in build/ -- build.build_dir probably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to creating it in build.rust_dir
: The lock locks a lot more than just the stuff in build
(submodules, etc), and the folder may not exist at the time of lock creation.
Mostly just some documentation, but I expect we can move ahead once that's added. It'd be good to do some spot checking locally to confirm this mostly works, and make a note in the documentation (and PR description) the location of the database file, so if there's some bug with this it's easy to delete it. |
…Simulacrum Lock bootstrap (x.py) build directory Closes rust-lang#76661, closes rust-lang#80849, `x.py` creates a lock file at `project_root/lock.db` r? ``@jyn514`` , because he was one that told me about this~
@bors r- #91664 (comment) |
0a84930
to
c22cdbf
Compare
Should be fixed!~ |
src/bootstrap/bootstrap.py
Outdated
pass | ||
return curs | ||
except ImportError: | ||
print("warning: sqlite3 not available, skipping lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth asking users to file an issue if they see this (on rust-lang/rust) and making it clear that the lock isn't necessary for correctness of the results (something like "build directory lock", maybe)?
2742b0d
to
f25e4bf
Compare
Done- |
Prevent spurious build failures and other bugs caused by parallel runs of x.py. We back the lock with sqlite, so that we have a cross-platform locking strategy, and which can be done nearly first in the build process (from Python), which helps move the lock as early as possible.
f25e4bf
to
1326bd6
Compare
Slightly adjusted wording, but seems good to me otherwise. Thanks! @bors r+ |
📌 Commit 1326bd6 has been approved by |
…askrgr Rollup of 7 pull requests Successful merges: - rust-lang#88310 (Lock bootstrap (x.py) build directory) - rust-lang#92097 (Implement split_at_spare_mut without Deref to a slice so that the spare slice is valid) - rust-lang#92412 (Fix double space in pretty printed TryBlock) - rust-lang#92420 (Fix whitespace in pretty printed PatKind::Range) - rust-lang#92457 (Sync rustc_codegen_gcc) - rust-lang#92460 ([rustc_builtin_macros] add indices to format_foreign::printf::Substitution::Escape) - rust-lang#92469 (Make tidy check for magic numbers that spell things) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
Closes #76661, closes #80849,
x.py
creates a lock file atproject_root/lock.db
r? @jyn514 , because he was one that told me about this~