50ms overhead between resolution time and wall time in uv lock #4860

Closed
konstin opened this issue Jul 7, 2024 · 10 comments
Labels
performance (Potential performance improvement), preview (Experimental behavior)

Comments

@konstin
Member

konstin commented Jul 7, 2024

When running uv lock on https://github.com/konstin/transformers/blob/9638c4defb18f5358c70f40dd00d65595657dd57/pyproject.toml, I get:

Resolved 291 packages in 65ms

with values in the 60ms-70ms range. But when measuring wall time, I get:

$ hyperfine "uv lock"
Benchmark 1: uv lock
  Time (mean ± σ):     111.1 ms ±   1.9 ms    [User: 112.7 ms, System: 110.1 ms]
  Range (min … max):   108.4 ms … 115.5 ms    26 runs

That's ~50ms of overhead!

With #4495 applied, the resolution is down to 12ms, but wall time is 60ms, still roughly a 50ms overhead. We should investigate and fix what's slowing us down here.

konstin added the performance and preview labels on Jul 7, 2024
@charliermarsh
Member

The gap I see here is also present in tracing. For example, the last tracing message clocks in at 0.163996s (and the last user-facing message is `Resolved 291 packages in 143ms`), but `time` reports 0.205s.

@charliermarsh
Member

Is it possible the branch you're testing on doesn't have #4793?

@konstin
Member Author

konstin commented Jul 7, 2024

I'm seeing this with a release build from d787e69. With your numbers I get 0.163996s - 143ms = 21ms unaccounted between start and the start of resolution, and 0.205s - 143ms = 62ms in total, so this seems to be happening on macOS too?

@charliermarsh
Member

@ibraheemdev - do you have any idea where that time could be going?

@ChannyClaus
Contributor

kinda got curious so dug around a bit - seems like `Ok(encoded) => match toml::from_str::<Lock>(&encoded) {` is the culprit?
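
If that's the case, a quick way to sanity-check it outside of uv is to time the TOML parse of an existing lockfile directly. A minimal sketch, assuming a `uv.lock` in the current directory and the `toml` crate; uv deserializes into its own `Lock` type, so this only approximates the real cost:

```rust
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the lockfile that uv would re-read on every invocation.
    let contents = std::fs::read_to_string("uv.lock")?;

    let start = Instant::now();
    // Parse into a generic TOML value; uv deserializes into its `Lock` type,
    // but this still gives a rough lower bound on the parsing cost.
    let value: toml::Value = toml::from_str(&contents)?;
    println!(
        "parsed {} top-level keys in {:?}",
        value.as_table().map_or(0, |t| t.len()),
        start.elapsed()
    );
    Ok(())
}
```

If the parse alone accounts for tens of milliseconds on a 291-package lockfile, that would explain a good chunk of the gap.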

@charliermarsh
Member

That’s interesting… we should profile it.

@konstin
Member Author

konstin commented Jul 9, 2024

I think it's the same problem, but uv lock -p 3.9 is slower than uv pip compile -p 3.9 --universal for transformers:

$ hyperfine "uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt" "uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt" "uv lock -p 3.9"
Benchmark 1: uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt
  Time (mean ± σ):      63.8 ms ±   3.0 ms    [User: 58.8 ms, System: 92.1 ms]
  Range (min … max):    59.4 ms …  74.6 ms    44 runs
 
Benchmark 2: uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt
  Time (mean ± σ):      72.7 ms ±   2.5 ms    [User: 64.5 ms, System: 96.8 ms]
  Range (min … max):    67.9 ms …  79.7 ms    37 runs
 
Benchmark 3: uv lock -p 3.9
  Time (mean ± σ):      89.4 ms ±   2.5 ms    [User: 86.5 ms, System: 89.7 ms]
  Range (min … max):    85.6 ms …  95.4 ms    32 runs
 
Summary
  uv pip compile pyproject.toml --all-extras -p 3.9 -o a.txt ran
    1.14 ± 0.07 times faster than uv pip compile pyproject.toml --all-extras -p 3.9 --universal -o b.txt
    1.40 ± 0.08 times faster than uv lock -p 3.9

@ibraheemdev
Member

It looks like there's ~20-30ms of overhead before starting resolution in read_lockfile, and another ~20-30ms after it in Lock::from_resolution_graph and Lock::to_toml plus dropping. I expect there's some low-hanging fruit there.
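
One way to confirm that breakdown is to wrap each phase in a tracing span and print span timings when they close (uv already emits tracing output, as noted above). A hedged sketch with purely illustrative span names and sleeps standing in for the real work:

```rust
use tracing_subscriber::fmt::format::FmtSpan;

fn main() {
    // Emit a timing line whenever a span closes.
    tracing_subscriber::fmt()
        .with_span_events(FmtSpan::CLOSE)
        .init();

    {
        // Illustrative: wrap the pre-resolution work (e.g. reading the
        // existing lockfile) in a span.
        let _guard = tracing::info_span!("read_lockfile").entered();
        std::thread::sleep(std::time::Duration::from_millis(25));
    }
    {
        // Illustrative: wrap the post-resolution work (building the new
        // lock, serializing it to TOML, writing it out) in another span.
        let _guard = tracing::info_span!("write_lockfile").entered();
        std::thread::sleep(std::time::Duration::from_millis(25));
    }
}
```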

ibraheemdev added a commit that referenced this issue Jul 9, 2024
## Summary

Avoid serializing and writing the lockfile if a cheap comparison shows
that the contents have not changed.

## Test Plan

Shaves ~10ms off of #4860 for me.

```
➜  transformers hyperfine "../../uv/target/profiling/uv lock" "../../uv/target/profiling/baseline lock" --warmup 3
Benchmark 1: ../../uv/target/profiling/uv lock
  Time (mean ± σ):     130.5 ms ±   2.5 ms    [User: 130.3 ms, System: 85.0 ms]
  Range (min … max):   126.8 ms … 136.9 ms    23 runs
 
Benchmark 2: ../../uv/target/profiling/baseline lock
  Time (mean ± σ):     140.5 ms ±   5.0 ms    [User: 142.8 ms, System: 85.5 ms]
  Range (min … max):   133.2 ms … 153.3 ms    21 runs
 
Summary
  ../../uv/target/profiling/uv lock ran
    1.08 ± 0.04 times faster than ../../uv/target/profiling/baseline lock
```
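
A minimal sketch of that compare-before-write idea, with a placeholder `Lock` type standing in for uv's real one (which is much richer); it only illustrates the shape of the optimization:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Placeholder stand-in for uv's `Lock`; only here to make the sketch compile.
#[derive(PartialEq)]
struct Lock {
    packages: Vec<String>,
}

impl Lock {
    // Placeholder for the (comparatively expensive) TOML serialization step.
    fn to_toml(&self) -> String {
        format!("packages = {:?}\n", self.packages)
    }
}

/// Serialize and write the lockfile only when the newly resolved lock differs
/// from the one previously read from disk; otherwise skip both steps.
fn commit(previous: Option<&Lock>, new: &Lock, path: &Path) -> io::Result<()> {
    if previous == Some(new) {
        // Cheap structural comparison up front: no serialization, no write,
        // and the lockfile's mtime stays untouched.
        return Ok(());
    }
    fs::write(path, new.to_toml())
}
```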
ibraheemdev added a commit that referenced this issue Jul 10, 2024
## Summary

We currently store wheel URLs in an unparsed state because we don't have
a stable parsed representation to use with rkyv. Unfortunately, this
means we end up reparsing unnecessarily in a lot of places, especially
when constructing a `Lock`. This PR adds a `UrlString` type that lets us
avoid reparsing without losing the validity of the `Url`.

## Test Plan

Shaves off another ~10 ms from
#4860.

```
➜  transformers hyperfine "../../uv/target/profiling/uv lock" "../../uv/target/profiling/baseline lock" --warmup 3
Benchmark 1: ../../uv/target/profiling/uv lock
  Time (mean ± σ):     120.9 ms ±   2.5 ms    [User: 126.0 ms, System: 80.6 ms]
  Range (min … max):   116.8 ms … 125.7 ms    23 runs
 
Benchmark 2: ../../uv/target/profiling/baseline lock
  Time (mean ± σ):     129.9 ms ±   4.2 ms    [User: 127.1 ms, System: 86.1 ms]
  Range (min … max):   123.4 ms … 141.2 ms    23 runs

Summary
  ../../uv/target/profiling/uv lock ran
    1.07 ± 0.04 times faster than ../../uv/target/profiling/baseline lock
```
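
For context, a minimal sketch of the `UrlString` idea; the actual type in uv is more involved, and this only illustrates keeping the already-validated string form and reparsing lazily:

```rust
use url::Url;

/// Stores a URL in its serialized string form so it can be cloned, hashed,
/// and compared cheaply, deferring the parse until a structured `Url` is
/// actually needed.
#[derive(Clone, PartialEq, Eq, Hash)]
struct UrlString(String);

impl From<Url> for UrlString {
    fn from(url: Url) -> Self {
        // The `Url` was already validated, so its string form is known-good.
        UrlString(url.into())
    }
}

impl UrlString {
    /// Reparse only at the point where structured access is required.
    fn to_url(&self) -> Result<Url, url::ParseError> {
        Url::parse(&self.0)
    }
}
```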
@charliermarsh
Member

#5235 cut the time by another ~13%:

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 50 --runs 100
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):      29.0 ms ±   0.3 ms    [User: 23.6 ms, System: 4.8 ms]
  Range (min … max):    28.4 ms …  30.6 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):      25.7 ms ±   0.6 ms    [User: 20.3 ms, System: 4.7 ms]
  Range (min … max):    24.7 ms …  30.3 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '../target/release/uv lock' ran
    1.13 ± 0.03 times faster than '../target/release/main lock'

@charliermarsh
Member

Might be worth profiling this again at some point.
