
Add an option to bytecode compile during installation #2086

Merged
charliermarsh merged 10 commits into main from konsti/compile-option on Mar 5, 2024

Conversation

konstin (Member) commented Feb 29, 2024

Add a --compile option to pip install and pip sync.

I chose to implement this as a separate pass over the entire venv. If we wanted to compile during installation, we'd have to make sure that writing is exclusive to avoid concurrent processes writing broken .pyc files. Additionally, this ensures that the entire site-packages directory is bytecode-compiled, even if there are packages that aren't from this uv invocation. The disadvantage is that we do not update RECORD and instead rely on this comment from PEP 491:

> Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.

If this is a problem we can change it to run during installation and write RECORD entries.

Internally, this is implemented as an async work-stealing subprocess worker pool. The producer is a directory traversal over site-packages, sending each .py file to a bounded async FIFO queue/channel. Each worker has a long-running python process: it pops the queue to get a single path (or exits if the channel is closed), sends it to stdin, waits until it's informed via a line on stdout that the compilation is done, and repeats. This is fast; e.g., after installing jupyter and plotly on Python 3.12, it processes 15,876 files in 319ms with 32 threads (vs. 3.8s on a single core). Each python process internally calls compileall.compile_file, the same as pip.
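
For illustration, here is a minimal sketch of that structure. It is not the actual code in crates/uv-installer/src/compile.rs: the worker script path, the helper names (`compile_worker`, `compile_tree`), the channel capacity, and the error handling are assumptions.

```rust
use std::path::PathBuf;
use std::process::Stdio;

use anyhow::{Context, Result};
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::Command;
use walkdir::WalkDir;

/// One worker: a long-running python process that is fed one path per line on
/// stdin and acknowledges each compiled file with one line on stdout.
async fn compile_worker(
    python: PathBuf,
    worker_script: PathBuf,
    receiver: async_channel::Receiver<PathBuf>,
) -> Result<()> {
    let mut child = Command::new(python)
        .arg(worker_script)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .context("Failed to spawn Python worker")?;
    let mut stdin = child.stdin.take().unwrap();
    let mut stdout = BufReader::new(child.stdout.take().unwrap()).lines();

    // Pop paths off the shared queue until the producer closes the channel.
    while let Ok(path) = receiver.recv().await {
        stdin
            .write_all(format!("{}\n", path.display()).as_bytes())
            .await?;
        // Wait for the acknowledgement line before requesting the next file.
        let _ack = stdout.next_line().await?;
    }
    // Dropping stdin signals EOF, letting the interpreter exit cleanly.
    drop(stdin);
    let _status = child.wait().await?;
    Ok(())
}

async fn compile_tree(site_packages: PathBuf, python: PathBuf, script: PathBuf) -> Result<()> {
    let workers = std::thread::available_parallelism()?.get();
    let (sender, receiver) = async_channel::bounded::<PathBuf>(workers * 10);

    let handles: Vec<_> = (0..workers)
        .map(|_| tokio::spawn(compile_worker(python.clone(), script.clone(), receiver.clone())))
        .collect();

    // Producer: walk site-packages and send every .py file into the queue.
    for entry in WalkDir::new(site_packages) {
        let path = entry?.into_path();
        if path.extension().is_some_and(|ext| ext == "py") {
            sender.send(path).await?;
        }
    }
    // Closing the channel lets the workers drain the queue and then exit.
    sender.close();

    for handle in handles {
        handle.await??;
    }
    Ok(())
}
```

The real implementation additionally deals with timeouts and stderr, which the review discussion below goes into.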

Like pip, we ignore and silence all compilation errors (#1559). There is a 10s timeout to handle the case where the workers get stuck. For the reviewers: please check whether I missed any spots where we could deadlock; this is the hardest part of this PR.

I've added uv-dev compile <dir> and uv-dev clear-compile <dir> commands, mainly for my own benchmarking. I don't want to expose them in uv; they're almost certainly not the correct workflow and we don't want to support them.

Fixes #1788
Closes #1559
Closes #1928

konstin added the enhancement (New feature or improvement to existing functionality) and compatibility (Compatibility with a specification or another tool) labels on Feb 29, 2024
notatallshaw (Contributor) commented Feb 29, 2024

> The disadvantage is that we do not update RECORD and rely on this comment from PEP 491:
>
> Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.
>
> If this is a problem we can change it to run during installation and write RECORD entries.

I would validate that pip successfully uninstalls the pyc files, because unless I am misunderstanding this issue it probably doesn't: pypa/pip#11835

konstin (Member Author) commented Mar 1, 2024

I ran

cargo run venv --seed; cargo run pip install --compile tqdm
.\.venv\Scripts\pip uninstall -y tqdm

and the tqdm folder was fully removed.

BurntSushi (Member) left a comment

Nice, this is awesome! I gave some suggestions on the concurrency structure as I think it should be possible to simplify things.

Review comments (resolved) on crates/uv/src/commands/pip_sync.rs, crates/uv/tests/pip_sync.rs, crates/uv/src/main.rs, and crates/uv-installer/src/compile.rs.
let workers =
std::thread::available_parallelism().map_err(CompileError::AvailableParallelism)?;
// 10 is an arbitrary number.
let (sender, receiver) = async_channel::bounded::<PathBuf>(workers.get() * 10);
Member:

I wouldn't even bother with * 10.

You're probably fine with a rendezvous channel (a capacity of 0). Rendezvous channels are especially nice because they manifest logic errors in the structure of your concurrency very quickly. (Although I do think there are occasions where using a capacity of 1 can lead to simplifications that you couldn't otherwise do with a capacity of 0.)

It's like how, most of the time, we just use vec![] because you usually don't benefit from starting with some non-zero initial capacity.

Member Author:

A larger queue is significantly faster. I tried a capacity of 1, the number of workers, the number of workers times 10, and 1000.

hyperfine --warmup 1 --prepare "target/profiling/uv-dev clear-compile .venv" "target/profiling/uv-dev1 compile .venv" "target/profiling/uv-devworkers compile .venv" "target/profiling/uv-devworkers10 compile .venv" "target/profiling/uv-dev1000 compile .venv"
Benchmark 1: target/profiling/uv-dev1 compile .venv
  Time (mean ± σ):     359.8 ms ±  11.5 ms    [User: 6757.8 ms, System: 1849.4 ms]
  Range (min … max):   346.8 ms … 380.1 ms    10 runs
 
Benchmark 2: target/profiling/uv-devworkers compile .venv
  Time (mean ± σ):     352.6 ms ±   8.1 ms    [User: 6856.0 ms, System: 1709.1 ms]
  Range (min … max):   337.5 ms … 368.1 ms    10 runs
 
Benchmark 3: target/profiling/uv-devworkers10 compile .venv
  Time (mean ± σ):     331.4 ms ±   5.9 ms    [User: 6723.6 ms, System: 1761.8 ms]
  Range (min … max):   323.9 ms … 340.1 ms    10 runs
 
Benchmark 4: target/profiling/uv-dev1000 compile .venv
  Time (mean ± σ):     331.0 ms ±   6.0 ms    [User: 6735.2 ms, System: 1738.5 ms]
  Range (min … max):   323.7 ms … 339.6 ms    10 runs
 
Summary
  target/profiling/uv-dev1000 compile .venv ran
    1.00 ± 0.03 times faster than target/profiling/uv-devworkers10 compile .venv
    1.07 ± 0.03 times faster than target/profiling/uv-devworkers compile .venv
    1.09 ± 0.04 times faster than target/profiling/uv-dev1 compile .venv

Member:

That is quite interesting. I stand corrected. It might be good to add a comment about it here (no need to include the full benchmark results though).

To me, this implies that the directory traversal isn't feeding file paths into the workers fast enough, right? And I think that would be exacerbated with a high core count. It means there are likely idle workers in the low-capacity case, whereas those workers get work more quickly in the high-capacity case. Very interesting.

.arg(&pip_compileall_py)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::inherit())
Member:

I think this means that anything written to stderr by the interpreter will get fed through to uv's stderr right? If so, was that intentional? I don't think it's what I would expect. The messages could be quite confusing to users.

What I'd suggest is reading stderr in a separate Tokio task. That's I think the pretty standard solution for getting around the "so many things went wrong that the stderr pipe filled up and now the subprocess is blocked on writing to stderr and can't make progress" failure mode.
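
A minimal sketch of that pattern, assuming the child was spawned with .stderr(Stdio::piped()); the helper name is made up for illustration:

```rust
use tokio::io::AsyncReadExt;
use tokio::process::Child;
use tokio::task::JoinHandle;

/// Drain the child's stderr on its own task so a full pipe can never block the
/// subprocess, and hand whatever was captured back to the worker afterwards.
fn spawn_stderr_reader(child: &mut Child) -> JoinHandle<std::io::Result<String>> {
    let mut stderr = child.stderr.take().expect("stderr must be piped");
    tokio::spawn(async move {
        let mut buf = String::new();
        stderr.read_to_string(&mut buf).await?;
        Ok(buf)
    })
}
```

The worker can then await the handle once the subprocess has exited (or been killed) and decide whether the captured output should be treated as an error.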

Member Author:

Anything written to stderr is a failure mode. Is there a good way to say: when there's anything written to stderr at this checkpoint, exit the worker with the stderr contents as the error message?

Member:

I would do that in a task dedicated to reading stderr. If that task reads anything, then you should be able to devise a mechanism to communicate that back to the main worker thread, have it kill the sub-process and return whatever was on stderr.

I'd only do this though if you're sure that anything written to stderr should cause the entire enterprise to fail. e.g., What if stderr just contains a warning or something? (But maybe that can't happen here. IDK.)

Member Author:

I've added handling for stderr that will, roughly as sketched below:

  • do nothing if there was no stderr,
  • debug-print if there was stderr but the exit status passed,
  • show the stderr in the error chain if there was stderr output and an error.
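
In sketch form, assuming the stderr reader task produced a `stderr: String` and the compile loop produced a `result: anyhow::Result<()>` (the names and the use of tracing are assumptions, not the PR's exact code):

```rust
use anyhow::Result;

/// Decide what to do with captured stderr once the compile loop has finished.
fn finish(result: Result<()>, stderr: String) -> Result<()> {
    if stderr.is_empty() {
        // No stderr output: nothing extra to report.
        return result;
    }
    match result {
        // Stderr output but compilation succeeded: surface it only at debug level.
        Ok(()) => {
            tracing::debug!("Python stderr:\n{stderr}");
            Ok(())
        }
        // Stderr output and a failure: attach the captured output to the error chain.
        Err(err) => Err(err.context(format!("Python stderr:\n{stderr}"))),
    }
}
```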

Member:

Nice, LGTM!

Two further review comments on crates/uv-installer/src/compile.rs (resolved).
}
}
Ok(())
}
Member:

So looking at this code holistically, I feel like we might be able to simplify the timeout/cancellation logic here.

The way I would go about this is to think of the workers themselves as controlling how long they're willing to wait for the subprocess to complete its task. You pick a timeout at that granularity. Then, right before you start communicating with the subprocess, spawn a task that will kill the subprocess if it hasn't completed after N seconds. You should be able to implement this with tokio::select!.

If you do that, then I think you should be able to get rid of the timeouts almost everywhere else. It absolves you of needing to worry about writing to stdin taking too long, because the worker will kill the subprocess if it's truly blocked. When the subprocess gets killed, your code that is blocked on stdin should become unblocked immediately and everything should gracefully terminate.
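
A sketch of that shape, using tokio::time::timeout (which is built on the same mechanism as the tokio::select! suggested above); the 60-second constant and the generic exchange future are illustrative, not the PR's actual code:

```rust
use std::future::Future;
use std::time::Duration;

use anyhow::{bail, Context, Result};
use tokio::process::Child;

/// Illustrative value; long enough that real compilations should never trip it.
const COMPILE_TIMEOUT: Duration = Duration::from_secs(60);

/// Wrap one stdin/stdout exchange with the interpreter in a timeout. If the
/// interpreter stops responding, kill it so any blocked pipe I/O unblocks and
/// the worker can shut down gracefully.
async fn with_compile_timeout<T>(
    child: &mut Child,
    exchange: impl Future<Output = Result<T>>,
) -> Result<T> {
    match tokio::time::timeout(COMPILE_TIMEOUT, exchange).await {
        Ok(result) => result,
        Err(_elapsed) => {
            child.kill().await.context("Failed to kill Python worker")?;
            bail!("Bytecode compilation timed out after {COMPILE_TIMEOUT:?}");
        }
    }
}
```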

Member Author:

I've simplified the timeouts in a different way: only the caller has a timeout for workers (or the channel) getting stuck, one when sending (due to backpressure) and one when waiting for the workers to join. This way it doesn't matter whether the python process is broken, there's a bug in the worker implementation or the tokio threadpool, or the channel broke due to a panic: each case trips that timeout.

Placing it at the send might be an unusual position, but I think it's the best place to catch stuck workers.
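
In sketch form, the send-side timeout might look like this; the 10-second value and the helper name are illustrative:

```rust
use std::path::PathBuf;
use std::time::Duration;

use anyhow::{Context, Result};

/// With a bounded channel, `send` blocks only when the buffer is full
/// (backpressure), so a timeout here notices a pool that has stopped draining
/// the queue, whatever the underlying cause.
async fn send_with_timeout(
    sender: &async_channel::Sender<PathBuf>,
    path: PathBuf,
) -> Result<()> {
    tokio::time::timeout(Duration::from_secs(10), sender.send(path))
        .await
        .context("Bytecode compilation workers appear to be stuck")?
        .map_err(|_| anyhow::anyhow!("All workers exited early (channel closed)"))?;
    Ok(())
}
```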

Member Author:

Note that either case is almost certainly a bug: either a real bug in uv or a quirky python interpreter we need to special-case.

Add a `--compile` option to `pip install` and `pip sync`.

I chose to implement this as a separate pass over the entire venv. If we wanted to compile during installation, we'd have to make sure that writing is exclusive to avoid concurrent processes writing broken `.pyc` files. Additionally, this ensures that the entire site-packages directory is bytecode-compiled, even if there are packages that aren't from this `uv` invocation. The disadvantage is that we do not update RECORD and instead rely on this comment from [PEP 491](https://peps.python.org/pep-0491/):

> Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.

If this is a problem we can change it to run during installation and write RECORD entries.

Internally, this is implemented as an async work-stealing subprocess worker pool. The producer is a directory traversal over site-packages, sending each `.py` file to a bounded async FIFO queue/channel. Each worker has a long-running python process: it pops the queue to get a single path (or exits if the channel is closed), sends it to stdin, waits until it's informed via a line on stdout that the compilation is done, and repeats. This is fast; e.g., after installing `jupyter` and `plotly` on Python 3.12, it processes 15,876 files in 319ms with 32 threads (vs. 3.8s on a single core). Each python process internally calls `compileall.compile_file`, the same as pip.

Like pip, we ignore and silence all compilation errors (#1559). There are 10s and 1s timeouts to ensure we don't get stuck when the python subprocess doesn't work properly or a panic breaks tokio. For the reviewers: please check whether I missed any spots where we could deadlock; this is the hardest part of this PR.

I still have to check that windows works.

I don't think we want an option to compile the seed packages in the `venv` subcommand, do we?

Fixes #1788
Closes #1559
Closes #1928
charliermarsh (Member) left a comment

I'm cool with the overall approach, but I defer to Andrew to review and approve the pipeline in compile.rs.

charliermarsh (Member):

On second thought, I’m wondering if this should just be a stand-alone command. Since it runs over the whole env, there’s no reason it needs to be coupled to install / sync.

konstin (Member Author) commented Mar 4, 2024

We can expose it as uv compileall (we can copy the dev command over); my only counterargument is that it extends our external interface further. I'd like to keep the --compile flag either way so the install operation stays a single command.

Fwiw there's also python -m compileall; it does the same thing, just slower:

$ hyperfine --prepare "uv venv && uv pip install jupyter plotly" "python -m compileall .venv/lib/python3.12/site-packages/"
Benchmark 1: python -m compileall .venv/lib/python3.12/site-packages/
  Time (mean ± σ):      5.407 s ±  1.026 s    [User: 4.354 s, System: 0.427 s]
  Range (min … max):    4.668 s …  7.699 s    10 runs

Comment on lines +162 to +163
// Otherwise stdout is buffered and we'll wait forever for a response
.env("PYTHONUNBUFFERED", "1")
Member:

You could also just flush on the Python end when you call print?

Member Author:

Yes, that would also work

BurntSushi (Member) left a comment

I'm fine with this landing as-is, but I'm still overall skeptical of how the channel timeouts are being used here.

My understanding is that if compilation takes longer than the channel timeout overall (~10s), then the channel timeout could trip and thus cause the entire enterprise to fail even if it would have otherwise succeeded. Whether or not that's a real problem in practice is hard to say, but it feels non-ideal to be placing bounds on things that we don't really control. That is, the channel timeout is conflating the possibility of bugs (in both our code and perhaps in how the Python interpreter behaves) with the amount of time it takes for the user's resources to bytecode-compile a set of Python files.

I do think the better approach here is to deal with problems with byte-code compiling at the level of the subprocess. And specifically, only use timeouts when something you know could legitimately hang (like, say, killing a subprocess).

With that said, without user reports, it's hard to know whether the timeouts will be an issue in practice.

konstin (Member Author) commented Mar 4, 2024

> I do think the better approach here is to deal with problems with byte-code compiling at the level of the subprocess. And specifically, only use timeouts when something you know could legitimately hang (like, say, killing a subprocess).

The problem I'm mainly thinking about is when the python process is in some regard broken, e.g. imagine python is replaced by a bash script that does > /dev/null (this isn't the case here, but I've seen pythons that were not pythons or behaved strangely for some other reason). I'm less worried about OS operations such as killing a subprocess.

The second problem I'm thinking about is, e.g., a panic in one of the threads that makes tokio behave oddly, or a concurrency bug on our end, though I don't have good insight into the failure modes here.

I feel this is a trade-off we have to make: we can put no timeout and expect the user to kill the process (or the CI to time out), vs. we specify an unreasonably high limit after which we abort (similar to the timeout we put on http requests). If we want to put this in the user's control, I would remove the timeouts.

BurntSushi (Member):

> The problem I'm mainly thinking about is when the python process is in some regard broken, e.g. imagine python is replaced by a bash script that does > /dev/null (this isn't the case here, but I've seen pythons that were not pythons or behaved strangely for some other reason). I'm less worried about OS operations such as killing a subprocess.

I think for cases like this, I'd be happier with a large timeout at a more granular level, e.g. a ~60 second timeout within the worker. So after 60 seconds of no output, the worker would kill the subprocess and report an error to the user. The 60 second wait isn't great, but the user will at least eventually get some kind of error message instead of a truly indefinite hang.

If this were a common failure mode then this wouldn't be great. But if a Python process is as badly broken as > /dev/null, I think I'm more-or-less okay with having to wait longer before saying something to the user.

And since the timeout is applied more granularly (at the level of a single subprocess), you could feasibly decrease it.

> The second problem I'm thinking about is, e.g., a panic in one of the threads that makes tokio behave oddly, or a concurrency bug on our end, though I don't have good insight into the failure modes here.

I don't have as much experience with panics in Tokio, but I think it will catch them and the task will quit and you'll get a JoinError. But I defer to you on this one. You probably have more practical hands-on experience than I do here.

> I feel this is a trade-off we have to make: we can put no timeout and expect the user to kill the process (or the CI to time out), vs. we specify an unreasonably high limit after which we abort (similar to the timeout we put on http requests). If we want to put this in the user's control, I would remove the timeouts.

Yeah to be clear, I think it's fine to have timeouts somewhere. I'm mostly more uneasy with having them at a higher level. But it could all be fine. I just don't know.

charliermarsh (Member):

This LGTM, though I'm also curious if we can move the timeout down a level?

charliermarsh (Member):

I moved the temporary directory into the cache just to be safe. I'll merge this to keep it moving.

charliermarsh enabled auto-merge (squash) March 5, 2024 03:32
charliermarsh merged commit 2a53e78 into main Mar 5, 2024
7 checks passed
charliermarsh deleted the konsti/compile-option branch March 5, 2024 03:35
konstin (Member Author) commented Mar 5, 2024

> I moved the temporary directory into the cache just to be safe.

What's the problem we're avoiding this way?

konstin added a commit that referenced this pull request Mar 5, 2024
Follow-up to #2086: Don't use timeouts for the entire workers, but only for the section that's about communicating with the (potentially broken) `python` subprocess. I've also raised the timeout to 60s.
Labels: compatibility, enhancement
Projects: None yet
5 participants