Optimizes Cargo's registry cache format for fewer files #6908

Open
alexcrichton opened this issue May 6, 2019 · 10 comments
Labels
A-caching Area: caching of dependencies, repositories, and build artifacts Performance Gotta go fast! S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing.

Comments

@alexcrichton
Member

alexcrichton commented May 6, 2019

First implemented in #6880, Cargo now has an on-disk cache for the registry index, which avoids loading from git and parsing extraneous JSON that never ends up being used.

This format isn't fantastic for Windows, however, since it adds an extra file per crate in the index. @Eh2406 recommended a different implementation strategy that would reduce the number of files in play, and we should likely implement that!


Implementation history:

@ehuss ehuss added the Performance Gotta go fast! label May 8, 2019
@ehuss ehuss added the A-caching Area: caching of dependencies, repositories, and build artifacts label May 20, 2019
@Eh2406
Contributor

Eh2406 commented Aug 29, 2019

An easier way to see how big a difference this makes may be to build a single-file cache using SQLite or sled. Then, once we know whether it works, we can discuss whether the dependency is worth it or whether we should implement our own format.

bors added a commit that referenced this issue Sep 3, 2019
minimal-copy `deserialize` for `InternedString`

I just learnt that `serde::Deserialize` for `Cow<'a, str>` allocates by default, thus negating the intended benefit of ea957da, and this is in the hot loop for no-op builds (#6908). The docs at https://serde.rs/lifetimes.html#borrowing-data-in-a-derived-impl say you can fix this with `#[serde(borrow)]`, but in practice this does not work on `Option<Cow<'a, str>>`. Some of these are just going to be turned into `InternedString`s, so we can tell serde to do that directly, saving an allocation while we are at it!

So is this faster, or just reducing the number of `InternedString` <-> `&str` conversions?
I ran the benchmark script developed for #7168 (comment). Looks like no change for Cargo's lockfile and a ~7% improvement for the 2000 crate stress test.
@weihanglo
Member

weihanglo commented May 30, 2021

I've been experimenting with some naive implementations using SQLite and sled, but the results seem no better than the current in-house cache mechanism.

Both implementations store almost the same format, "package name -> content blob" key-value pairs, inside the database. The only slight differences are that I name the database files <cache-version>-<index-version>-<sha>.db and remove the prefix from each blob's contents.
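For readers who want a concrete picture of the "package name -> content blob" layout, here is a minimal sketch using Python's standard `sqlite3` module. It is not the code from the patches: the table name `summaries`, the column names, and the helper functions are all hypothetical, and only the file-naming pattern is taken from the comment above.

```python
import sqlite3


def db_file_name(cache_version, index_version, sha):
    # Naming pattern from the comment: <cache-version>-<index-version>-<sha>.db
    return f"{cache_version}-{index_version}-{sha}.db"


def open_cache(path=":memory:"):
    # Hypothetical schema: one row per package, value is the raw cache blob.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summaries ("
        "name TEXT PRIMARY KEY, contents BLOB NOT NULL)"
    )
    return conn


def put(conn, name, blob):
    # Overwrite any existing blob for this package name.
    conn.execute(
        "INSERT OR REPLACE INTO summaries (name, contents) VALUES (?, ?)",
        (name, blob),
    )
    conn.commit()


def get(conn, name):
    # Return the stored blob, or None when the package is not cached.
    row = conn.execute(
        "SELECT contents FROM summaries WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

The point of the layout is that every package lives in a single database file, so the number of files on disk no longer grows with the number of crates.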

Though the benchmark results are not that great, there is probably some room for improvement on the SQLite side. For sled, I have no idea how to tune it. I would love to investigate further if needed.

Any thoughts or directions are welcome!


Environment information:

  • macOS Big Sur 11.4
  • MacBook Pro (15-inch, 2017)
  • 2.9 GHz Quad-Core Intel Core i7
  • 500GB SSD

The patches are listed as below:

The benchmarks were run against the cargo project itself using hyperfine.

| Command | master | SQLite | sled |
| --- | --- | --- | --- |
| update | 🥇 295.0 ms ± 9.9 ms | 307.7 ms ± 8.7 ms | 339.4 ms ± 13.3 ms |
| update -p git2 | 🥇 317.6 ms ± 12.7 ms | 329.8 ms ± 9.2 ms | 362.4 ms ± 17.6 ms |
| generate-lockfile | 🥇 304.6 ms ± 11.7 ms | 320.5 ms ± 8.0 ms | 348.5 ms ± 13.7 ms |
| build | 🥇 132.3 ms ± 3.4 ms | 140.6 ms ± 3.1 ms | 180.8 ms ± 9.1 ms |
Raw benchmark data from hyperfine
hyperfine -w 5 -m 20 \
'cargo generate-lockfile' \
'cargo-seld generate-lockfile' \
'cargo-sqlite generate-lockfile'

Benchmark #1: cargo generate-lockfile
  Time (mean ± σ):     304.6 ms ±  11.7 ms    [User: 71.9 ms, System: 19.0 ms]
  Range (min … max):   279.2 ms … 321.5 ms    20 runs

Benchmark #2: cargo-seld generate-lockfile
  Time (mean ± σ):     348.5 ms ±  13.7 ms    [User: 88.5 ms, System: 30.9 ms]
  Range (min … max):   329.0 ms … 379.6 ms    20 runs

Benchmark #3: cargo-sqlite generate-lockfile
  Time (mean ± σ):     320.5 ms ±   8.0 ms    [User: 84.3 ms, System: 22.0 ms]
  Range (min … max):   303.1 ms … 330.7 ms    20 runs

Summary
  'cargo generate-lockfile' ran
    1.05 ± 0.05 times faster than 'cargo-sqlite generate-lockfile'
    1.14 ± 0.06 times faster than 'cargo-seld generate-lockfile'

---

hyperfine -w 5 -m 20 \
'cargo build' \
'cargo-seld build' \
'cargo-sqlite build'

Benchmark #1: cargo build
  Time (mean ± σ):     132.3 ms ±   3.4 ms    [User: 81.8 ms, System: 47.0 ms]
  Range (min … max):   127.2 ms … 143.0 ms    20 runs

Benchmark #2: cargo-seld build
  Time (mean ± σ):     180.8 ms ±   9.1 ms    [User: 101.0 ms, System: 61.5 ms]
  Range (min … max):   164.6 ms … 210.7 ms    20 runs

Benchmark #3: cargo-sqlite build
  Time (mean ± σ):     140.6 ms ±   3.1 ms    [User: 88.6 ms, System: 48.3 ms]
  Range (min … max):   134.7 ms … 145.2 ms    20 runs

Summary
  'cargo build' ran
    1.06 ± 0.04 times faster than 'cargo-sqlite build'
    1.37 ± 0.08 times faster than 'cargo-seld build'


---

hyperfine -w 5 -m 20 \
'cargo update' \
'cargo-seld update' \
'cargo-sqlite update'

Benchmark #1: cargo update
  Time (mean ± σ):     317.6 ms ±  12.7 ms    [User: 79.8 ms, System: 20.6 ms]
  Range (min … max):   295.6 ms … 343.7 ms    20 runs

Benchmark #2: cargo-seld update
  Time (mean ± σ):     362.4 ms ±  17.6 ms    [User: 92.2 ms, System: 32.1 ms]
  Range (min … max):   334.6 ms … 397.0 ms    20 runs

Benchmark #3: cargo-sqlite update
  Time (mean ± σ):     329.8 ms ±   9.2 ms    [User: 88.7 ms, System: 22.7 ms]
  Range (min … max):   314.9 ms … 343.0 ms    20 runs

Summary
  'cargo update' ran
    1.04 ± 0.05 times faster than 'cargo-sqlite update'
    1.14 ± 0.07 times faster than 'cargo-seld update'

---

hyperfine -w 5 -m 20 \
'cargo update -p git2' \
'cargo-seld update -p git2' \
'cargo-sqlite update -p git2'

Benchmark #1: cargo update -p git2
  Time (mean ± σ):     295.0 ms ±   9.9 ms    [User: 60.9 ms, System: 18.9 ms]
  Range (min … max):   268.8 ms … 313.6 ms    20 runs

Benchmark #2: cargo-seld update -p git2
  Time (mean ± σ):     339.4 ms ±  13.3 ms    [User: 76.4 ms, System: 30.9 ms]
  Range (min … max):   321.4 ms … 365.8 ms    20 runs

Benchmark #3: cargo-sqlite update -p git2
  Time (mean ± σ):     307.7 ms ±   8.7 ms    [User: 73.4 ms, System: 20.9 ms]
  Range (min … max):   292.3 ms … 320.0 ms    20 runs

Summary
  'cargo update -p git2' ran
    1.04 ± 0.05 times faster than 'cargo-sqlite update -p git2'
    1.15 ± 0.06 times faster than 'cargo-seld update -p git2'

@Eh2406
Contributor

Eh2406 commented May 30, 2021

Thank you so much for doing this work!
This diff moves the existing files into one. If that does not pan out, we need not look into a follow-up, like having indexes for "package name -> version -> json from index". But one thing at a time.
The big missing question is: what OS and hard drive were you using for the benchmarks? Windows on an HDD is likely to be where this is helpful.
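To make the possible follow-up concrete, a "package name -> version -> json from index" layout could look roughly like the sketch below. This is only an illustration under assumptions: the table name `versions` and the helper functions are hypothetical, and the input is assumed to be one JSON object per line with a `vers` field, as in registry index files.

```python
import json
import sqlite3


def open_versioned_cache(path=":memory:"):
    # Hypothetical schema: one row per (package, version) pair.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS versions ("
        "name TEXT NOT NULL, version TEXT NOT NULL, json TEXT NOT NULL, "
        "PRIMARY KEY (name, version))"
    )
    return conn


def store_index_lines(conn, name, lines):
    # Each line is assumed to be one JSON object describing one version.
    rows = [(name, json.loads(line)["vers"], line) for line in lines]
    conn.executemany("INSERT OR REPLACE INTO versions VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup(conn, name, version):
    # Fetch and parse only the requested version, leaving the rest untouched.
    row = conn.execute(
        "SELECT json FROM versions WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Compared with one blob per package, this would let a lookup skip the JSON for versions it does not need, at the cost of more rows and a wider key.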

@weihanglo
Member

Sorry about missing hardware information. Comment updated. 😅

@weihanglo
Member

weihanglo commented May 31, 2021

Benchmark results for large projects

tl;dr: no significant performance difference among these three implementations. The SQLite version runs at almost the same speed as the in-house cache mechanism, whereas the sled version is slightly slower, though negligibly so in my view.

Environment information:

  • macOS Big Sur 11.4
  • MacBook Pro (15-inch, 2017)
  • 2.9 GHz Quad-Core Intel Core i7
  • 500GB SSD

The patches:

Space usage in registry/index/.cache after benchmark:

  • SQLite: 32MB + 25k (journal)
  • sled: 92MB
  • master: 33MB
Expand to see benchmark data
#!/usr/bin/env bash
pkgs=$(ls -d */)
benchmark_at="$(date +%s)"
for pkg in ${pkgs[@]}; do
  pkg="${pkg%/}"
  echo "${pkg}"
  pushd "${pkg}"
  offline="--offline"
  # diem and substrate have deps conflicts with `--offline`
  # (`==` compares strings; `-eq` would evaluate both sides as arithmetic and always match)
  if [[ "$pkg" == "diem" ]] || [[ "$pkg" == "substrate" ]]; then
    offline=
  fi
  fn="${pkg}-${benchmark_at}-result"
  hyperfine -L offline "${offline}" -w 2 \
    'cargo/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n master \
    'cargo+cache-sled/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sled \
    'cargo+cache-sqlite/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite \
    --export-markdown   "../${fn}.md"
  popd
done

diem/diem@05bdd16

cargo generate-lockfile -Zno-index-update

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| master | 984.7 ± 124.3 | 929.1 | 1337.9 | 1.03 ± 0.13 |
| sled | 993.4 ± 25.4 | 953.8 | 1045.0 | 1.04 ± 0.03 |
| sqlite | 952.0 ± 19.5 | 924.7 | 993.7 | 🥇 1.00 |

mozilla/gecko-dev@5977b6f

cargo generate-lockfile -Zno-index-update --offline

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| master | 13.479 ± 0.617 | 12.848 | 14.956 | 🥇 1.00 |
| sled | 13.747 ± 0.448 | 12.904 | 14.295 | 1.02 ± 0.06 |
| sqlite | 13.688 ± 0.793 | 12.959 | 15.175 | 1.02 ± 0.07 |

rust-lang/rust@59579907

cargo generate-lockfile -Zno-index-update --offline

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| master | 2.720 ± 0.159 | 2.514 | 2.993 | 1.02 ± 0.12 |
| sled | 2.873 ± 0.277 | 2.510 | 3.441 | 1.08 ± 0.15 |
| sqlite | 2.668 ± 0.272 | 2.473 | 3.286 | 🥇 1.00 |

servo/servo@d1673446

cargo generate-lockfile -Zno-index-update --offline

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| master | 4.814 ± 0.310 | 4.578 | 5.518 | 1.00 ± 0.08 |
| sled | 4.899 ± 0.218 | 4.685 | 5.285 | 1.02 ± 0.07 |
| sqlite | 4.812 ± 0.225 | 4.589 | 5.256 | 🥇 1.00 |

paritytech/substrate@be1b8ef0

cargo generate-lockfile -Zno-index-update

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| master | 529.9 ± 10.9 | 520.5 | 558.6 | 🥇 1.00 |
| sled | 588.1 ± 10.0 | 581.0 | 615.8 | 1.11 ± 0.03 |
| sqlite | 532.6 ± 5.2 | 527.8 | 546.2 | 1.00 ± 0.02 |

tikv/tikv@06c3e76e

cargo generate-lockfile -Zno-index-update --offline

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| master | 3.418 ± 0.151 | 3.306 | 3.802 | 🥇 1.00 |
| sled | 3.495 ± 0.138 | 3.343 | 3.759 | 1.02 ± 0.06 |
| sqlite | 3.512 ± 0.198 | 3.309 | 3.907 | 1.03 ± 0.07 |

@weihanglo
Member

I made some changes for SQLite according to suggestions on zulip:

Still, no significant difference between the two.
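The exact changes are not shown in this thread, but the benchmark labels below (`sqlite` versus `sqlite-wal`) suggest that write-ahead logging was one of them. As a rough illustration only, this is how WAL mode is switched on for a SQLite database through the standard `PRAGMA journal_mode` statement; the helper name `open_cache_wal` and the file name are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_cache_wal(path, use_wal):
    # WAL needs a file-backed database; it does not apply to ":memory:".
    conn = sqlite3.connect(path)
    if use_wal:
        # The pragma returns the journal mode that is now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    else:
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    return conn, mode


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_cache_wal(os.path.join(tmp, "cache.db"), use_wal=True)
    print("journal mode:", mode)
    conn.close()
```

WAL mainly helps when readers and writers overlap, which may be part of why a mostly read-only workload like `generate-lockfile` shows little difference.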

benchmark details
#!/usr/bin/env bash

pkgs=$(/bin/ls -d */)
benchmark_at="$(date +%s)"
for pkg in ${pkgs[@]}; do
  pkg="${pkg%/}"
  echo "${pkg}"
  pushd "${pkg}"
  offline="--offline"
  # diem and substrate have deps conflicts with `--offline`
  # (`==` compares strings; `-eq` would evaluate both sides as arithmetic and always match)
  if [[ "$pkg" == "diem" ]] || [[ "$pkg" == "substrate" ]]; then
    offline=
  fi
  fn="${pkg}-${benchmark_at}-result"
  hyperfine -L offline "${offline}" -w 2 \
    -p 'mkdir -p ~/.cargo/registry/index/git.luolix.top-1ecc6299db9ec823/.cache' \
    -c 'rm -rf ~/.cargo/registry/index/git.luolix.top-1ecc6299db9ec823/.cache' \
    'sqlite/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite \
    'sqlite-wal/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite-wal \
    --export-markdown   "../${fn}.md"
  popd
done

diem/diem@05bdd16

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 973.0 ± 151.3 | 900.9 | 1400.6 | 🥇 1.00 |
| SQLite+WAL | 1022.0 ± 181.8 | 906.9 | 1411.0 | 1.05 ± 0.25 |

mozilla/gecko-dev@5977b6f

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 14.059 ± 0.447 | 13.529 | 15.134 | 1.00 ± 0.05 |
| SQLite+WAL | 13.996 ± 0.570 | 13.258 | 15.138 | 🥇 1.00 |

rust-lang/rust@5957990

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 3.011 ± 0.351 | 2.568 | 3.750 | 1.00 ± 0.17 |
| SQLite+WAL | 3.006 ± 0.352 | 2.604 | 3.741 | 🥇 1.00 |

servo/servo@d167344

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 4.683 ± 0.186 | 4.491 | 4.995 | 🥇 1.00 |
| SQLite+WAL | 4.811 ± 0.186 | 4.579 | 5.151 | 1.03 ± 0.06 |

paritytech/substrate@be1b8ef

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 497.6 ± 19.1 | 480.4 | 544.4 | 🥇 1.00 |
| SQLite+WAL | 519.7 ± 4.7 | 513.7 | 528.7 | 1.04 ± 0.04 |

tikv/tikv@06c3e76

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| SQLite | 3.621 ± 0.766 | 3.238 | 5.777 | 1.01 ± 0.23 |
| SQLite+WAL | 3.596 ± 0.282 | 3.309 | 4.264 | 🥇 1.00 |

@weihanglo
Member

weihanglo commented Jun 2, 2021

Benchmark on Windows

I ran some benchmarks on Windows, but the machine lacks many dependencies, so only a portion of the test cases is included. We can see that SQLite outperforms the original per-file version by 1.27-1.40x and is faster than sled by 1.04-1.60x.

Great result so far!

expand the benchmark details

Environment information:

  • Windows 10 Home
  • Intel Core i7 6500U 2.50GHz
  • RAM 8GB
  • 512GB SATA 3.0 M.2 SSD

b1684e2 (cargo)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| cargo | 466.3 ± 32.7 | 438.5 | 548.8 | 1.40 ± 0.12 |
| sled | 534.6 ± 33.7 | 494.5 | 598.7 | 1.60 ± 0.12 |
| sqlite | 334.1 ± 15.3 | 321.1 | 367.7 | 🥇 1.00 |

paritytech/substrate@4652f9e

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| cargo | 2.513 ± 0.047 | 2.406 | 2.566 | 1.34 ± 0.04 |
| sled | 1.942 ± 0.089 | 1.867 | 2.176 | 1.04 ± 0.05 |
| sqlite | 1.870 ± 0.047 | 1.809 | 1.976 | 🥇 1.00 |

tikv/tikv@eee35b7

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| cargo | 2.418 ± 0.094 | 2.352 | 2.674 | 1.27 ± 0.05 |
| sled | 2.072 ± 0.051 | 2.020 | 2.177 | 1.09 ± 0.03 |
| sqlite | 1.906 ± 0.034 | 1.846 | 1.948 | 🥇 1.00 |
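
The speedup ranges quoted in this comment can be reproduced from the mean times in the three tables above by dividing each mean by the SQLite mean for the same project. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative, and the numbers are copied from the tables.

```python
# Mean times copied from the Windows tables above
# (milliseconds for cargo, seconds for substrate and tikv).
MEANS = {
    "cargo": {"cargo": 466.3, "sled": 534.6, "sqlite": 334.1},
    "substrate": {"cargo": 2.513, "sled": 1.942, "sqlite": 1.870},
    "tikv": {"cargo": 2.418, "sled": 2.072, "sqlite": 1.906},
}


def relative_to_sqlite(project, variant):
    # Ratio of one variant's mean to the SQLite mean, rounded like the tables.
    row = MEANS[project]
    return round(row[variant] / row["sqlite"], 2)


for project in MEANS:
    print(
        project,
        "per-file/sqlite:", relative_to_sqlite(project, "cargo"),
        "sled/sqlite:", relative_to_sqlite(project, "sled"),
    )
```

Units cancel in each ratio, so the millisecond and second tables can be compared directly.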

@epage
Contributor

epage commented Nov 1, 2023

#12634 is adding sqlite to cargo. With the improvements this made to Windows, @weihanglo should we move this forward? Would a needs-mentor tag make sense for this?

@weihanglo
Member

This is included in the plan: https://hackmd.io/U_k79wk7SkCQ8_dJgIXwJg. And yes, this is something that can likely be done after #12634. I am assuming @ehuss already has something.

@weihanglo weihanglo added the S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing. label Nov 1, 2023
@weihanglo
Member

#13584 is another attempt to integrate SQLite for the index cache. It basically replicates the behavior of the current filesystem cache, which stores entire JSON blobs from the index. This is not ideal, and there are performance losses in cache writes. The official SQLite website has benchmarked external blobs (Cargo's current behavior) against storing all blobs in SQLite. Unsurprisingly, plain file IO performs well on Linux.

I haven't done any benchmarks on Windows, as I have no machine at the moment, though I know Windows is the area that would potentially benefit a lot from this. However, given that cache writes perform poorly with SQLite, I wonder whether the experiment is worth more effort.

Something we could do

  • Design a better, more reasonable SQLite schema to speed up IO on Linux. It might end up becoming a serialize/deserialize battle between serde_json (IndexSummary::parse) and returning whatever is stored in SQLite.
  • If it is proven that everything is good on Windows but not on Unix, is it possible to ship the SQLite index cache only on Windows?
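
As a rough illustration of the second idea, choosing the cache backend per platform could be as small as the sketch below. It is purely hypothetical: the function name and the backend labels are made up for this example, and Cargo itself would make this choice in Rust rather than Python.

```python
import sys


def pick_index_cache_backend(platform=sys.platform):
    # Hypothetical policy: use the single-file SQLite cache only on Windows,
    # where many small files are costly, and keep per-crate files elsewhere.
    return "sqlite" if platform == "win32" else "files"


print(pick_index_cache_backend())
```

The trade-off is that two cache implementations would then need to be maintained and tested side by side.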
