Optimizes Cargo's registry cache format for fewer files #6908

alexcrichton · 2019-05-06T15:41:13Z

First implemented in #6880, Cargo now has an on-disk cache for the registry index, which avoids loading from git and parsing extraneous JSON that never ends up being used.

This format isn't great for Windows, however, since it adds an extra file per crate in the index. @Eh2406 recommended a different implementation strategy that would reduce the number of files in play, and we should likely implement that!
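To make "fewer files" concrete, here is a minimal sketch of one possible single-file layout: every crate's cache blob is appended to one buffer as length-prefixed records, and a single scan builds an in-memory map from crate name to byte range. The record layout and the `pack`/`index` helpers are hypothetical illustrations, not Cargo's actual format or the strategy @Eh2406 proposed.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical single-file layout (illustration only, not Cargo's format):
/// repeated records of [u32 LE name len][name][u32 LE blob len][blob].
fn pack(entries: &[(&str, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, blob) in entries {
        out.extend_from_slice(&(name.len() as u32).to_le_bytes());
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&(blob.len() as u32).to_le_bytes());
        out.extend_from_slice(blob);
    }
    out
}

/// Reads a little-endian u32 length prefix at `*pos` and advances past it.
fn read_len(buf: &[u8], pos: &mut usize) -> Option<usize> {
    let b = buf.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

/// Scans the buffer once and records where each crate's blob lives, so later
/// lookups slice one buffer instead of opening one file per crate.
/// Returns None if the data is truncated or a name is not valid UTF-8.
fn index(buf: &[u8]) -> Option<HashMap<String, Range<usize>>> {
    let mut map = HashMap::new();
    let mut pos = 0;
    while pos < buf.len() {
        let name_len = read_len(buf, &mut pos)?;
        let name = std::str::from_utf8(buf.get(pos..pos + name_len)?).ok()?;
        pos += name_len;
        let blob_len = read_len(buf, &mut pos)?;
        let range = pos..pos + blob_len;
        buf.get(range.clone())?; // bail out if the blob is truncated
        map.insert(name.to_string(), range);
        pos += blob_len;
    }
    Some(map)
}
```

With this shape, a no-op build would open and read one file, then serve each crate lookup as `&buf[index["serde"].clone()]`. The trade-off is that updating a single crate's entry means rewriting or appending to the shared file, which is part of why a ready-made store such as SQLite or sled is attractive.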

Implementation history:

refactor: abstract std::fs away from on-disk index cache #13515


Eh2406 · 2019-08-29T15:10:44Z

An easier way to see how big a difference this makes may be to build a single-file cache using SQLite or sled. Once we know whether it works, we can discuss whether the dependency is worth it or whether we should implement our own format.
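Trying SQLite or sled side by side with the current cache is easier if the cache sits behind a small storage interface. The sketch below shows one possible shape; the trait name, its methods, and both backends are hypothetical (the per-file backend also flattens Cargo's real directory layout), and the `HashMap` backend merely stands in for a single-file database.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical storage abstraction for the index cache (illustrative names).
/// A file-per-crate backend and a single-file backend could both implement it,
/// so they can be benchmarked against each other without touching callers.
trait IndexCacheStore {
    fn get(&self, name: &str) -> Option<Vec<u8>>;
    fn put(&mut self, name: &str, blob: &[u8]) -> io::Result<()>;
}

/// Roughly the current approach: one file per crate under a root directory.
struct DirStore {
    root: PathBuf,
}

impl IndexCacheStore for DirStore {
    fn get(&self, name: &str) -> Option<Vec<u8>> {
        // Any read error (including "not found") is treated as a cache miss.
        fs::read(self.root.join(name)).ok()
    }
    fn put(&mut self, name: &str, blob: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(name), blob)
    }
}

/// Stand-in for a single-file store such as SQLite or sled: one map holds
/// every entry (a real backend would persist it to a single file).
#[derive(Default)]
struct MapStore {
    entries: HashMap<String, Vec<u8>>,
}

impl IndexCacheStore for MapStore {
    fn get(&self, name: &str) -> Option<Vec<u8>> {
        self.entries.get(name).cloned()
    }
    fn put(&mut self, name: &str, blob: &[u8]) -> io::Result<()> {
        self.entries.insert(name.to_string(), blob.to_vec());
        Ok(())
    }
}

/// Returns the cached blob, or computes it with `fetch` and stores it.
/// Callers only see the trait, so the backend can be swapped freely.
fn load_or_fill(
    store: &mut dyn IndexCacheStore,
    name: &str,
    fetch: impl Fn() -> Vec<u8>,
) -> io::Result<Vec<u8>> {
    if let Some(blob) = store.get(name) {
        return Ok(blob);
    }
    let blob = fetch();
    store.put(name, &blob)?;
    Ok(blob)
}
```

Because callers go through `load_or_fill`, swapping backends only changes which store is constructed, which keeps the "is the dependency worth it?" discussion separate from the measurement.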

minimal-copy `deserialize` for `InternedString`

I just learnt that `serde::Deserialize` for `Cow<'a, str>` allocates by default! This negates the intended benefit of ea957da, and it sits in the hot loop for no-op builds (#6908). The docs (https://serde.rs/lifetimes.html#borrowing-data-in-a-derived-impl) say you can fix this with `#[serde(borrow)]`, but in practice this does not work on `Option<Cow<'a, str>>`. Some of these values are just going to be turned into `InternedString`s anyway, so we can tell serde to do that directly, saving an allocation while we are at it!

So is this faster, or does it just reduce the number of `InternedString` <-> `&str` conversions? I ran the benchmark script developed for #7168 (comment). It looks like there is no change for Cargo's lockfile and a ~7% improvement for the 2000-crate stress test.
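The idea of interning straight from a borrowed `&str` can be sketched without serde. Below is a simplified, hypothetical interner in the spirit of Cargo's `InternedString` (not its actual code): each distinct string is leaked once and shared as `&'static str`, so interning a string that was seen before costs a lookup rather than an allocation. A custom `Deserialize` impl could hand the borrowed `&str` it receives (for example in serde's `Visitor::visit_borrowed_str`) to `Interned::new`; that serde glue is omitted here to keep the sketch std-only.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Simplified, hypothetical interned string: a shared `&'static str`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Interned(&'static str);

/// Global set of every string interned so far.
fn interner() -> &'static Mutex<HashSet<&'static str>> {
    static SET: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    SET.get_or_init(|| Mutex::new(HashSet::new()))
}

impl Interned {
    /// Takes a borrowed `&str` (e.g. a slice of the raw index JSON), so no
    /// intermediate `String` or owned `Cow` is needed. Only the first
    /// occurrence of a given string allocates (and leaks) a copy.
    fn new(s: &str) -> Interned {
        let mut set = interner().lock().unwrap();
        if let Some(&existing) = set.get(s) {
            return Interned(existing);
        }
        let leaked: &'static str = Box::leak(s.to_string().into_boxed_str());
        set.insert(leaked);
        Interned(leaked)
    }

    fn as_str(self) -> &'static str {
        self.0
    }
}
```

Leaking is acceptable for an interner like this because the set of crate and feature names seen during one Cargo invocation is bounded, and it lets `Interned` be `Copy`.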

weihanglo · 2021-05-30T04:59:29Z

I've been experimenting with some naive implementations for SQLite and sled, but the results don't seem better than the current in-house cache mechanism.

Both implementations store almost the same format: "package name -> content blob" key-value pairs inside a database. The only slight difference is that I name the database files `<cache-version>-<index-version>-<sha>.db` and remove the prefix from each blob's contents.
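The naming scheme moves per-blob header data into the database file name. The sketch below illustrates that, assuming each cached blob starts with a one-byte cache version, a little-endian `u32` index-format version, and the index revision string terminated by a NUL byte. That layout is roughly how Cargo's cache files begin, but treat it as an assumption; `db_file_name` and `split_header` are hypothetical helpers, not code from the patches.

```rust
/// Builds the database file name from the values that would otherwise be
/// repeated at the start of every blob (hypothetical helper).
fn db_file_name(cache_version: u8, index_version: u32, sha: &str) -> String {
    format!("{}-{}-{}.db", cache_version, index_version, sha)
}

/// Splits an assumed per-blob header off:
/// [u8 cache version][u32 LE index version][revision bytes][0].
/// Returns the header fields plus the remaining payload, or None if malformed.
fn split_header(blob: &[u8]) -> Option<(u8, u32, &str, &[u8])> {
    let (&cache_version, rest) = blob.split_first()?;
    let v = rest.get(..4)?;
    let index_version = u32::from_le_bytes([v[0], v[1], v[2], v[3]]);
    let rest = &rest[4..];
    // The revision string ends at the first NUL byte.
    let nul = rest.iter().position(|&b| b == 0)?;
    let sha = std::str::from_utf8(&rest[..nul]).ok()?;
    Some((cache_version, index_version, sha, &rest[nul + 1..]))
}
```

Once the header lives in the file name, a stale cache is detected by the database file's name alone, and each stored value is just the payload.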

Although the benchmark results aren't great, there is probably some room for improvement on the SQLite side. For sled, I have no idea how to tune it. I would love to investigate further if needed.

Any thoughts or directions are welcome!

Environment information:

macOS Big Sur 11.4
MacBook Pro (15-inch, 2017)
2.9 GHz Quad-Core Intel Core i7
500GB SSD

The patches are listed below:

The benchmarks are run against the Cargo project itself using hyperfine.

Command	master	SQLite	sled
update	🥇317.6 ms ± 12.7 ms	329.8 ms ± 9.2 ms	362.4 ms ± 17.6 ms
update -p git2	🥇295.0 ms ± 9.9 ms	307.7 ms ± 8.7 ms	339.4 ms ± 13.3 ms
generate-lockfile	🥇304.6 ms ± 11.7 ms	320.5 ms ± 8.0 ms	348.5 ms ± 13.7 ms
build	🥇132.3 ms ± 3.4 ms	140.6 ms ± 3.1 ms	180.8 ms ± 9.1 ms

Raw benchmark data from hyperfine

hyperfine -w 5 -m 20 \
'cargo generate-lockfile' \
'cargo-seld generate-lockfile' \
'cargo-sqlite generate-lockfile'

Benchmark #1: cargo generate-lockfile
  Time (mean ± σ):     304.6 ms ±  11.7 ms    [User: 71.9 ms, System: 19.0 ms]
  Range (min … max):   279.2 ms … 321.5 ms    20 runs

Benchmark #2: cargo-seld generate-lockfile
  Time (mean ± σ):     348.5 ms ±  13.7 ms    [User: 88.5 ms, System: 30.9 ms]
  Range (min … max):   329.0 ms … 379.6 ms    20 runs

Benchmark #3: cargo-sqlite generate-lockfile
  Time (mean ± σ):     320.5 ms ±   8.0 ms    [User: 84.3 ms, System: 22.0 ms]
  Range (min … max):   303.1 ms … 330.7 ms    20 runs

Summary
  'cargo generate-lockfile' ran
    1.05 ± 0.05 times faster than 'cargo-sqlite generate-lockfile'
    1.14 ± 0.06 times faster than 'cargo-seld generate-lockfile'

---

hyperfine -w 5 -m 20 \
'cargo build' \
'cargo-seld build' \
'cargo-sqlite build'

Benchmark #1: cargo build
  Time (mean ± σ):     132.3 ms ±   3.4 ms    [User: 81.8 ms, System: 47.0 ms]
  Range (min … max):   127.2 ms … 143.0 ms    20 runs

Benchmark #2: cargo-seld build
  Time (mean ± σ):     180.8 ms ±   9.1 ms    [User: 101.0 ms, System: 61.5 ms]
  Range (min … max):   164.6 ms … 210.7 ms    20 runs

Benchmark #3: cargo-sqlite build
  Time (mean ± σ):     140.6 ms ±   3.1 ms    [User: 88.6 ms, System: 48.3 ms]
  Range (min … max):   134.7 ms … 145.2 ms    20 runs

Summary
  'cargo build' ran
    1.06 ± 0.04 times faster than 'cargo-sqlite build'
    1.37 ± 0.08 times faster than 'cargo-seld build'


---

hyperfine -w 5 -m 20 \
'cargo update' \
'cargo-seld update' \
'cargo-sqlite update'

Benchmark #1: cargo update
  Time (mean ± σ):     317.6 ms ±  12.7 ms    [User: 79.8 ms, System: 20.6 ms]
  Range (min … max):   295.6 ms … 343.7 ms    20 runs

Benchmark #2: cargo-seld update
  Time (mean ± σ):     362.4 ms ±  17.6 ms    [User: 92.2 ms, System: 32.1 ms]
  Range (min … max):   334.6 ms … 397.0 ms    20 runs

Benchmark #3: cargo-sqlite update
  Time (mean ± σ):     329.8 ms ±   9.2 ms    [User: 88.7 ms, System: 22.7 ms]
  Range (min … max):   314.9 ms … 343.0 ms    20 runs

Summary
  'cargo update' ran
    1.04 ± 0.05 times faster than 'cargo-sqlite update'
    1.14 ± 0.07 times faster than 'cargo-seld update'

---

hyperfine -w 5 -m 20 \
'cargo update -p git2' \
'cargo-seld update -p git2' \
'cargo-sqlite update -p git2'

Benchmark #1: cargo update -p git2
  Time (mean ± σ):     295.0 ms ±   9.9 ms    [User: 60.9 ms, System: 18.9 ms]
  Range (min … max):   268.8 ms … 313.6 ms    20 runs

Benchmark #2: cargo-seld update -p git2
  Time (mean ± σ):     339.4 ms ±  13.3 ms    [User: 76.4 ms, System: 30.9 ms]
  Range (min … max):   321.4 ms … 365.8 ms    20 runs

Benchmark #3: cargo-sqlite update -p git2
  Time (mean ± σ):     307.7 ms ±   8.7 ms    [User: 73.4 ms, System: 20.9 ms]
  Range (min … max):   292.3 ms … 320.0 ms    20 runs

Summary
  'cargo update -p git2' ran
    1.04 ± 0.05 times faster than 'cargo-sqlite update -p git2'
    1.15 ± 0.06 times faster than 'cargo-seld update -p git2'
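The "X ± Y times faster" lines in the hyperfine summaries above appear consistent with first-order error propagation for a ratio of two means. For example, 320.5 / 304.6 ≈ 1.05 for `generate-lockfile`, with a combined relative uncertainty of about 0.05. The sketch below reproduces that arithmetic; `relative_speed` is a hypothetical helper reflecting our reading of the numbers, not hyperfine's own code.

```rust
/// Ratio of two benchmark means and its propagated standard deviation:
/// r = slow / fast, sigma_r = r * sqrt((sigma_slow / slow)^2 + (sigma_fast / fast)^2).
/// Inputs are mean and standard deviation in the same unit (e.g. ms).
fn relative_speed(slow_mean: f64, slow_sd: f64, fast_mean: f64, fast_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    let rel_slow = slow_sd / slow_mean;
    let rel_fast = fast_sd / fast_mean;
    let sd = ratio * (rel_slow.powi(2) + rel_fast.powi(2)).sqrt();
    (ratio, sd)
}
```

Read this way, a result such as "1.04 ± 0.05 times faster" for `update` has an uncertainty as large as the difference itself, so master and SQLite are not clearly separated on this machine. The sled build result (about 1.37 ± 0.08) is a real gap.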

Eh2406 · 2021-05-30T13:30:00Z

Thank you so much for doing this work!
This diff moves the existing files into one. If that does not pan out, we need not look into a follow-up, such as having indexes for "package name -> version -> json from index". But one thing at a time.
The big missing question is: what OS and hard drive were you using for the benchmarks? Windows on an HDD is likely where this would help.
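The follow-up index mentioned above ("package name -> version -> json from index") can be pictured as a two-level map. This is a minimal in-memory sketch under that assumption; `VersionIndex` and its methods are hypothetical, and a database-backed version would likely use a composite `(name, version)` key instead.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical two-level index: crate name -> version string -> raw JSON line.
/// Fetching one version touches only that version's JSON, not the whole
/// per-crate blob.
#[derive(Default)]
struct VersionIndex {
    crates: HashMap<String, BTreeMap<String, String>>,
}

impl VersionIndex {
    fn insert(&mut self, name: &str, version: &str, json: &str) {
        self.crates
            .entry(name.to_string())
            .or_default()
            .insert(version.to_string(), json.to_string());
    }

    /// Returns the raw JSON for one version, if present.
    fn get(&self, name: &str, version: &str) -> Option<&str> {
        self.crates.get(name)?.get(version).map(String::as_str)
    }

    /// Lists known versions without parsing any JSON. Note: BTreeMap orders
    /// the strings lexicographically, not by semver precedence.
    fn versions(&self, name: &str) -> Vec<&str> {
        self.crates
            .get(name)
            .map(|v| v.keys().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

The resolver could then enumerate candidate versions cheaply and parse JSON only for the versions it actually considers.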

weihanglo · 2021-05-30T13:35:05Z

Sorry about the missing hardware information. Comment updated. 😅

weihanglo · 2021-05-31T01:15:28Z

Benchmark results for large projects

tl;dr: no significant performance difference among the three implementations. The SQLite version runs at almost the same speed as the in-house cache mechanism, whereas the sled version is slightly slower, though the difference seems negligible to me.

Environment information:

macOS Big Sur 11.4
MacBook Pro (15-inch, 2017)
2.9 GHz Quad-Core Intel Core i7
500GB SSD

The patches:

Space usage in registry/index/.cache after benchmark:

SQLite: 32MB + 25k (journal)
sled: 92MB
master: 33MB

Benchmark data

#!/usr/bin/env bash
benchmark_at="$(date +%s)"
# Each subdirectory of the current directory is one project checkout.
for pkg in */; do
  pkg="${pkg%/}"
  echo "${pkg}"
  pushd "${pkg}"
  offline="--offline"
  # diem and substrate have deps conflicts with `--offline`.
  # Use `==` for string comparison; `-eq` compares integers.
  if [[ "$pkg" == "diem" || "$pkg" == "substrate" ]]; then
    offline=
  fi
  fn="${pkg}-${benchmark_at}-result"
  hyperfine -L offline "${offline}" -w 2 \
    'cargo/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n master \
    'cargo+cache-sled/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sled \
    'cargo+cache-sqlite/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite \
    --export-markdown "../${fn}.md"
  popd
done

diem/diem@`05bdd16`

cargo generate-lockfile -Zno-index-update

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`master`	984.7 ± 124.3	929.1	1337.9	1.03 ± 0.13
`sled`	993.4 ± 25.4	953.8	1045.0	1.04 ± 0.03
`sqlite`	952.0 ± 19.5	924.7	993.7	🥇 1.00

mozilla/gecko-dev@`5977b6f`

cargo generate-lockfile -Zno-index-update --offline

Command	Mean [s]	Min [s]	Max [s]	Relative
`master`	13.479 ± 0.617	12.848	14.956	🥇 1.00
`sled`	13.747 ± 0.448	12.904	14.295	1.02 ± 0.06
`sqlite`	13.688 ± 0.793	12.959	15.175	1.02 ± 0.07

rust-lang/rust@`59579907`

cargo generate-lockfile -Zno-index-update --offline

Command	Mean [s]	Min [s]	Max [s]	Relative
`master`	2.720 ± 0.159	2.514	2.993	1.02 ± 0.12
`sled`	2.873 ± 0.277	2.510	3.441	1.08 ± 0.15
`sqlite`	2.668 ± 0.272	2.473	3.286	🥇 1.00

servo/servo@`d1673446`

cargo generate-lockfile -Zno-index-update --offline

Command	Mean [s]	Min [s]	Max [s]	Relative
`master`	4.814 ± 0.310	4.578	5.518	1.00 ± 0.08
`sled`	4.899 ± 0.218	4.685	5.285	1.02 ± 0.07
`sqlite`	4.812 ± 0.225	4.589	5.256	🥇 1.00

paritytech/substrate@`be1b8ef0`

cargo generate-lockfile -Zno-index-update

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`master`	529.9 ± 10.9	520.5	558.6	🥇 1.00
`sled`	588.1 ± 10.0	581.0	615.8	1.11 ± 0.03
`sqlite`	532.6 ± 5.2	527.8	546.2	1.00 ± 0.02

tikv/tikv@`06c3e76e`

cargo generate-lockfile -Zno-index-update --offline

Command	Mean [s]	Min [s]	Max [s]	Relative
`master`	3.418 ± 0.151	3.306	3.802	🥇 1.00
`sled`	3.495 ± 0.138	3.343	3.759	1.02 ± 0.06
`sqlite`	3.512 ± 0.198	3.309	3.907	1.03 ± 0.07

weihanglo · 2021-05-31T03:11:57Z

I made some changes to the SQLite version based on suggestions on Zulip:

Enable WAL
Use prepare_cached
Create the table WITHOUT ROWID

Still, no significant difference between the two.

SQLite: weihanglo/cargo@4e27712
SQLite+WAL: weihanglo/cargo@5093825
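For reference, the three tweaks correspond to SQL along these lines. This is a hypothetical sketch: the table and column names are illustrative, not taken from the linked commits, and the statements are kept as Rust string constants so the sketch stays dependency-free. With rusqlite, the two query strings are what one would pass to `Connection::prepare_cached`. SQLite's documentation also notes that WITHOUT ROWID works best when rows are small relative to the page size, so storing whole per-crate JSON blobs may limit its benefit here.

```rust
/// Switches the journal to write-ahead logging (persists in the database file).
const PRAGMAS: &str = "PRAGMA journal_mode = WAL;";

/// WITHOUT ROWID stores rows in a B-tree keyed by the primary key itself,
/// so a lookup by crate name skips the separate rowid indirection.
const CREATE_TABLE: &str = "CREATE TABLE IF NOT EXISTS summaries (\
    name TEXT PRIMARY KEY NOT NULL, \
    contents BLOB NOT NULL\
    ) WITHOUT ROWID;";

/// Statements reused for every lookup and write; preparing them once and
/// caching them avoids re-parsing the SQL for each crate.
const SELECT_BLOB: &str = "SELECT contents FROM summaries WHERE name = ?1";
const UPSERT_BLOB: &str = "INSERT OR REPLACE INTO summaries (name, contents) VALUES (?1, ?2)";

/// Setup statements in the order they would be executed when opening the cache.
fn setup_sql() -> [&'static str; 2] {
    [PRAGMAS, CREATE_TABLE]
}
```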

benchmark details

#!/usr/bin/env bash

benchmark_at="$(date +%s)"
# Each subdirectory of the current directory is one project checkout.
for pkg in */; do
  pkg="${pkg%/}"
  echo "${pkg}"
  pushd "${pkg}"
  offline="--offline"
  # diem and substrate have deps conflicts with `--offline`.
  # Use `==` for string comparison; `-eq` compares integers.
  if [[ "$pkg" == "diem" || "$pkg" == "substrate" ]]; then
    offline=
  fi
  fn="${pkg}-${benchmark_at}-result"
  # Recreate the cache directory before each run and delete it afterwards.
  hyperfine -L offline "${offline}" -w 2 \
    -p 'mkdir -p ~/.cargo/registry/index/github.com-1ecc6299db9ec823/.cache' \
    -c 'rm -rf ~/.cargo/registry/index/github.com-1ecc6299db9ec823/.cache' \
    'sqlite/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite \
    'sqlite-wal/target/release/cargo generate-lockfile -Zno-index-update {offline}' -n sqlite-wal \
    --export-markdown "../${fn}.md"
  popd
done

diem/diem@`05bdd16`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
SQLite	973.0 ± 151.3	900.9	1400.6	🥇 1.00
SQLite+WAL	1022.0 ± 181.8	906.9	1411.0	1.05 ± 0.25

mozilla/gecko-dev@`5977b6f`

Command	Mean [s]	Min [s]	Max [s]	Relative
SQLite	14.059 ± 0.447	13.529	15.134	1.00 ± 0.05
SQLite+WAL	13.996 ± 0.570	13.258	15.138	🥇 1.00

rust-lang/rust@`5957990`

Command	Mean [s]	Min [s]	Max [s]	Relative
SQLite	3.011 ± 0.351	2.568	3.750	1.00 ± 0.17
SQLite+WAL	3.006 ± 0.352	2.604	3.741	🥇 1.00

servo/servo@`d167344`

Command	Mean [s]	Min [s]	Max [s]	Relative
SQLite	4.683 ± 0.186	4.491	4.995	🥇 1.00
SQLite+WAL	4.811 ± 0.186	4.579	5.151	1.03 ± 0.06

paritytech/substrate@`be1b8ef`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
SQLite	497.6 ± 19.1	480.4	544.4	🥇 1.00
SQLite+WAL	519.7 ± 4.7	513.7	528.7	1.04 ± 0.04

tikv/tikv@`06c3e76`

Command	Mean [s]	Min [s]	Max [s]	Relative
SQLite	3.621 ± 0.766	3.238	5.777	1.01 ± 0.23
SQLite+WAL	3.596 ± 0.282	3.309	4.264	🥇 1.00

weihanglo · 2021-06-02T13:36:48Z

Benchmark on Windows

I ran some benchmarks on Windows, but the machine lacks many dependencies, so only a portion of the test cases are included. SQLite outperforms the original per-file version by 1.27-1.40x and is faster than sled by 1.04-1.60x.

Great result so far!

Benchmark details

Environment information:

Windows 10 Home
Intel Core i7 6500U 2.50GHz
RAM 8GB
512GB SATA 3.0 M.2 SSD

`b1684e2` (cargo)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`cargo`	466.3 ± 32.7	438.5	548.8	1.40 ± 0.12
`sled`	534.6 ± 33.7	494.5	598.7	1.60 ± 0.12
`sqlite`	334.1 ± 15.3	321.1	367.7	🥇 1.00

paritytech/substrate@`4652f9e`

Command	Mean [s]	Min [s]	Max [s]	Relative
`cargo`	2.513 ± 0.047	2.406	2.566	1.34 ± 0.04
`sled`	1.942 ± 0.089	1.867	2.176	1.04 ± 0.05
`sqlite`	1.870 ± 0.047	1.809	1.976	🥇 1.00

tikv/tikv@`eee35b7`

Command	Mean [s]	Min [s]	Max [s]	Relative
`cargo`	2.418 ± 0.094	2.352	2.674	1.27 ± 0.05
`sled`	2.072 ± 0.051	2.020	2.177	1.09 ± 0.03
`sqlite`	1.906 ± 0.034	1.846	1.948	🥇 1.00

epage · 2023-11-01T13:51:35Z

#12634 is adding SQLite to Cargo. Given the improvements this showed on Windows, @weihanglo, should we move this forward? Would a needs-mentor tag make sense for this?

weihanglo · 2023-11-01T15:32:38Z

This is included in the plan: https://hackmd.io/U_k79wk7SkCQ8_dJgIXwJg. And yes, this is likely something that can be done after #12634. I am assuming @ehuss already has something.

weihanglo · 2024-03-20T01:54:57Z

#13584 is another attempt to use SQLite for the index cache. It basically replicates the behavior of the current filesystem cache, which stores entire JSON blobs from the index. This is not ideal, and cache writes lose performance. The official SQLite website has benchmarked external blobs (Cargo's current behavior) against storing the blobs inside SQLite. Unsurprisingly, on Linux, plain file IO is already performant.

I haven't done any benchmarks on Windows, as I don't have a machine at the moment, though I know Windows is the platform that could benefit the most from this. However, given that cache writes perform poorly with SQLite, I wonder whether the experiment is worth more effort.

Some things we could do:

Design a better SQLite schema to speed up IO on Linux. It might end up becoming a serialization/deserialization battle between serde_json (`IndexSummary::parse`) and returning whatever is stored in SQLite.
If it turns out that SQLite is a win on Windows but not on Unix, could we ship the SQLite index cache on Windows only?
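Shipping the SQLite cache only on Windows could be as simple as a compile-time platform default plus an explicit override. The enum, the function names, and the override strings below are hypothetical illustrations, not an existing Cargo setting.

```rust
/// Hypothetical backend choice for the index cache (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexCacheBackend {
    /// One file per crate, as today.
    Files,
    /// A single SQLite database file.
    Sqlite,
}

/// Uses SQLite only when compiled for Windows, where many small files are
/// costly; every other target keeps the per-file cache.
/// `cfg!(windows)` is evaluated at compile time for the target platform.
fn default_backend() -> IndexCacheBackend {
    if cfg!(windows) {
        IndexCacheBackend::Sqlite
    } else {
        IndexCacheBackend::Files
    }
}

/// An explicit override (e.g. from a hypothetical config value) wins over
/// the platform default; unknown values fall back to the default.
fn select_backend(override_value: Option<&str>) -> IndexCacheBackend {
    match override_value {
        Some("sqlite") => IndexCacheBackend::Sqlite,
        Some("files") => IndexCacheBackend::Files,
        _ => default_backend(),
    }
}
```

Keeping an override in place would also let Unix users opt in, which makes it easier to gather the cross-platform numbers needed before deciding on a default.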

alexcrichton mentioned this issue May 6, 2019

Parse less JSON on null builds #6880

Merged

ehuss added the Performance Gotta go fast! label May 8, 2019

ehuss added the A-caching Area: caching of dependencies, repositories, and build artifacts label May 20, 2019

Eh2406 mentioned this issue Aug 1, 2019

Fast clone graph #7168

Closed

Eh2406 mentioned this issue Aug 29, 2019

minimal-copy deserialize for InternedString #7310

Merged

weihanglo added the S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing. label Nov 1, 2023

weihanglo mentioned this issue Mar 1, 2024

refactor: abstract std::fs away from on-disk index cache #13515

Merged

weihanglo mentioned this issue Mar 14, 2024

feat: index cache in SQLite3 #13584

Closed
