50ms overhead between resolution time and wall time in uv lock #4860

Closed
konstin opened this issue Jul 7, 2024 · 10 comments
Labels
performance (Potential performance improvement), preview (Experimental behavior)

Comments

@konstin
Member

konstin commented Jul 7, 2024

When running uv lock on https://github.com/konstin/transformers/blob/9638c4defb18f5358c70f40dd00d65595657dd57/pyproject.toml, I get:

Resolved 291 packages in 65ms

with values in the 60ms-70ms range. But when measuring wall time, I get:

$ hyperfine "uv lock"
Benchmark 1: uv lock
  Time (mean ± σ):     111.1 ms ±   1.9 ms    [User: 112.7 ms, System: 110.1 ms]
  Range (min … max):   108.4 ms … 115.5 ms    26 runs

That's ~50ms of overhead!

With #4495 applied, the resolution is down to 12ms, but wall time is 60ms, still roughly a 50ms overhead. We should investigate and fix what's slowing us down here.

konstin added the performance and preview labels on Jul 7, 2024
@charliermarsh
Member

The gap I see here is also present in tracing. For example, the last tracing message clocks in at 0.163996s (and the last user-facing message is `Resolved 291 packages in 143ms`), but `time` reports 0.205s.

@charliermarsh
Member

Is it possible the branch you're testing on doesn't have #4793?

@konstin
Member Author

konstin commented Jul 7, 2024

I'm seeing this with a release build from d787e69. With your numbers I get 0.163996s - 143ms = 21ms unaccounted between start and the start of resolution, and 0.205s - 143ms = 62ms in total, so this seems to be happening on macOS too?

@charliermarsh
Member

@ibraheemdev - do you have any idea where that time could be going?

@ChannyClaus
Contributor

kinda got curious so dug around a bit - seems like `Ok(encoded) => match toml::from_str::<Lock>(&encoded) {` is the culprit?
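
If that's the case, a quick way to sanity-check it outside of uv is to time the TOML parse of an existing lockfile directly. A minimal sketch, assuming a `uv.lock` in the current directory and the `toml` crate; uv deserializes into its own `Lock` type, so this only approximates the real cost:

```rust
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the lockfile that uv would re-read on every invocation.
    let contents = std::fs::read_to_string("uv.lock")?;

    let start = Instant::now();
    // Parse into a generic TOML value; uv deserializes into its `Lock` type,
    // but this still gives a rough lower bound on the parsing cost.
    let value: toml::Value = toml::from_str(&contents)?;
    println!(
        "parsed {} top-level keys in {:?}",
        value.as_table().map_or(0, |t| t.len()),
        start.elapsed()
    );
    Ok(())
}
```

If the parse alone accounts for tens of milliseconds on a 291-package lockfile, that would explain a good chunk of the gap.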

@charliermarsh
Member

That’s interesting… we should profile it.

@konstin
Member Author

konstin commented Jul 9, 2024

I think it's the same problem, but uv lock -p 3.9 is slower than uv pip compile -p 3.9 --universal for transformers:

$ hyperfine "uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt" "uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt" "uv lock -p 3.9"
Benchmark 1: uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt
  Time (mean ± σ):      63.8 ms ±   3.0 ms    [User: 58.8 ms, System: 92.1 ms]
  Range (min … max):    59.4 ms …  74.6 ms    44 runs
 
Benchmark 2: uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt
  Time (mean ± σ):      72.7 ms ±   2.5 ms    [User: 64.5 ms, System: 96.8 ms]
  Range (min … max):    67.9 ms …  79.7 ms    37 runs
 
Benchmark 3: uv lock -p 3.9
  Time (mean ± σ):      89.4 ms ±   2.5 ms    [User: 86.5 ms, System: 89.7 ms]
  Range (min … max):    85.6 ms …  95.4 ms    32 runs
 
Summary
  uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt ran
    1.14 ± 0.07 times faster than uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt
    1.40 ± 0.08 times faster than uv lock -p 3.9

@ibraheemdev
Member

It looks like there's ~20-30ms of overhead before starting resolution in read_lockfile, and another ~20-30ms after it in Lock::from_resolution_graph and Lock::to_toml plus dropping. I expect there's some low-hanging fruit there.
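
One way to confirm that breakdown is to wrap each phase in a tracing span and print span timings when they close (uv already emits tracing output, as noted above). A hedged sketch with purely illustrative span names and sleeps standing in for the real work:

```rust
use tracing_subscriber::fmt::format::FmtSpan;

fn main() {
    // Emit a timing line whenever a span closes.
    tracing_subscriber::fmt()
        .with_span_events(FmtSpan::CLOSE)
        .init();

    {
        // Illustrative: wrap the pre-resolution work (e.g. reading the
        // existing lockfile) in a span.
        let _guard = tracing::info_span!("read_lockfile").entered();
        std::thread::sleep(std::time::Duration::from_millis(25));
    }
    {
        // Illustrative: wrap the post-resolution work (building the new
        // lock, serializing it to TOML, writing it out) in another span.
        let _guard = tracing::info_span!("write_lockfile").entered();
        std::thread::sleep(std::time::Duration::from_millis(25));
    }
}
```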

ibraheemdev added a commit that referenced this issue Jul 9, 2024
## Summary

Avoid serializing and writing the lockfile if a cheap comparison shows
that the contents have not changed.

## Test Plan

Shaves ~10ms off of #4860 for me.

```
➜  transformers hyperfine "../../uv/target/profiling/uv lock" "../../uv/target/profiling/baseline lock" --warmup 3
Benchmark 1: ../../uv/target/profiling/uv lock
  Time (mean ± σ):     130.5 ms ±   2.5 ms    [User: 130.3 ms, System: 85.0 ms]
  Range (min … max):   126.8 ms … 136.9 ms    23 runs
 
Benchmark 2: ../../uv/target/profiling/baseline lock
  Time (mean ± σ):     140.5 ms ±   5.0 ms    [User: 142.8 ms, System: 85.5 ms]
  Range (min … max):   133.2 ms … 153.3 ms    21 runs
 
Summary
  ../../uv/target/profiling/uv lock ran
    1.08 ± 0.04 times faster than ../../uv/target/profiling/baseline lock
```
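
A minimal sketch of that compare-before-write idea, with a placeholder `Lock` type standing in for uv's real one (which is much richer); it only illustrates the shape of the optimization:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Placeholder stand-in for uv's `Lock`; only here to make the sketch compile.
#[derive(PartialEq)]
struct Lock {
    packages: Vec<String>,
}

impl Lock {
    // Placeholder for the (comparatively expensive) TOML serialization step.
    fn to_toml(&self) -> String {
        format!("packages = {:?}\n", self.packages)
    }
}

/// Serialize and write the lockfile only when the newly resolved lock differs
/// from the one previously read from disk; otherwise skip both steps.
fn commit(previous: Option<&Lock>, new: &Lock, path: &Path) -> io::Result<()> {
    if previous == Some(new) {
        // Cheap structural comparison up front: no serialization, no write,
        // and the lockfile's mtime stays untouched.
        return Ok(());
    }
    fs::write(path, new.to_toml())
}
```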
ibraheemdev added a commit that referenced this issue Jul 10, 2024
## Summary

We currently store wheel URLs in an unparsed state because we don't have
a stable parsed representation to use with rkyv. Unfortunately, this
means we end up reparsing unnecessarily in a lot of places, especially
when constructing a `Lock`. This PR adds a `UrlString` type that lets us
avoid reparsing without losing the validity of the `Url`.

## Test Plan

Shaves off another ~10 ms from
#4860.

```
➜  transformers hyperfine "../../uv/target/profiling/uv lock" "../../uv/target/profiling/baseline lock" --warmup 3
Benchmark 1: ../../uv/target/profiling/uv lock
  Time (mean ± σ):     120.9 ms ±   2.5 ms    [User: 126.0 ms, System: 80.6 ms]
  Range (min … max):   116.8 ms … 125.7 ms    23 runs
 
Benchmark 2: ../../uv/target/profiling/baseline lock
  Time (mean ± σ):     129.9 ms ±   4.2 ms    [User: 127.1 ms, System: 86.1 ms]
  Range (min … max):   123.4 ms … 141.2 ms    23 runs

Summary
  ../../uv/target/profiling/uv lock ran
    1.07 ± 0.04 times faster than ../../uv/target/profiling/baseline lock
```
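
For context, a minimal sketch of the `UrlString` idea; the actual type in uv is more involved, and this only illustrates keeping the already-validated string form and reparsing lazily:

```rust
use url::Url;

/// Stores a URL in its serialized string form so it can be cloned, hashed,
/// and compared cheaply, deferring the parse until a structured `Url` is
/// actually needed.
#[derive(Clone, PartialEq, Eq, Hash)]
struct UrlString(String);

impl From<Url> for UrlString {
    fn from(url: Url) -> Self {
        // The `Url` was already validated, so its string form is known-good.
        UrlString(url.into())
    }
}

impl UrlString {
    /// Reparse only at the point where structured access is required.
    fn to_url(&self) -> Result<Url, url::ParseError> {
        Url::parse(&self.0)
    }
}
```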
@charliermarsh
Member

#5235 cut the time by another ~13%:

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 50 --runs 100
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):      29.0 ms ±   0.3 ms    [User: 23.6 ms, System: 4.8 ms]
  Range (min … max):    28.4 ms …  30.6 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):      25.7 ms ±   0.6 ms    [User: 20.3 ms, System: 4.7 ms]
  Range (min … max):    24.7 ms …  30.3 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '../target/release/uv lock' ran
    1.13 ± 0.03 times faster than '../target/release/main lock'

@charliermarsh
Member

Might be worth profiling this again at some point.
