Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache misses when different projects shares same dependencies #196

Closed
CrendKing opened this issue Oct 26, 2017 · 10 comments
Closed

Cache misses when different projects shares same dependencies #196

CrendKing opened this issue Oct 26, 2017 · 10 comments

Comments

@CrendKing
Copy link

CrendKing commented Oct 26, 2017

I have Rust 1.22 nightly installed. I installed sccache with cargo install sccache (sccache 0.2.2). I followed instruction to set up RUSTC_WRAPPER=sccache. I noticed that even multiple projects shares same dependencies, sccache will not have cache hits as expected. My OS is Windows 10.

Steps to reproduce:

  1. Create a snippet project snippet1 with one dependency and no transitive dependency:
[dependencies]
itoa = "0.3.4"

Cargo.lock may look like

[root]
name = "snippet"
version = "0.1.0"
dependencies = [
 "itoa 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "itoa"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"

[metadata]
"checksum itoa 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "8324a32baf01e2ae060e9de58ed0bc2320c9a2833491ee36cd3b4c414de4db8c"
  1. Copy snippet1 to snippet2 with no change.
  2. cargo build the snippet1 project. Then cargo clean and cargo build again. Observe 1 cache hit in sccache -s.
  3. cargo build the snippet2 project and sccache -s.

Expected result:
sccache should now report 2 cache hits.
Actual result:
sccache still reports 1 cache hit, indicate itoa 0.3.4 from snippet2 was not a cache hit.

Now, create .cargo/config file in the parent directory and point a common target-dir for the two projects. Repeat steps 1-4. sccache will report 2 cache hits.

This shows that sccache probably uses the cargo parameters --out-dir or even -L dependency as part of the hash key. I'm not 100% sure if there is any corner case, but this behavior defeats the purpose of sharing common library across projects with sccache.

Related cargo output for snippet1:

   Compiling itoa v0.3.4
     Running `sccache rustc --crate-name itoa D:\Application\Rust\Cargo\registry\src\git.luolix.top-1ecc6299db9ec823\itoa-0.3.4\src\lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=fd1fa8f333ce5a78 -C extra-filename=-fd1fa8f333ce5a78 --out-dir C:\workspace\snippet1\target\debug\deps -L dependency=C:\workspace\snippet1\target\debug\deps --cap-lints allow`

snippet2:

   Compiling itoa v0.3.4
     Running `sccache rustc --crate-name itoa D:\Application\Rust\Cargo\registry\src\git.luolix.top-1ecc6299db9ec823\itoa-0.3.4\src\lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=fd1fa8f333ce5a78 -C extra-filename=-fd1fa8f333ce5a78 --out-dir C:\workspace\snippet2\target\debug\deps -L dependency=C:\workspace\snippet2\target\debug\deps --cap-lints allow`
@CrendKing
Copy link
Author

CrendKing commented Oct 26, 2017

I have some experience with Gradle, a Java build tool. I think Cargo is pretty much equivalent at the same role in Rust as Gradle in Java world. Ideally, I wish Cargo will eventually integrate the same caching and incremental building functionality as Gradle does. So learning how Gradle does these would help Cargo and Rust community.

In Gradle, a build consists of tasks, and the task dependency graph is created at the beginning of the build. However, whether a task will actually execute depends on the state of the task inputs and outputs. As the doc says, if the Gradle found a task's inputs and outputs are unchanged, such task is considered UP-TO-DATE and will not run.

I'm not familiar how Cargo handles task dependency internally, but in long term, instead of relying on command arguments and environment variables, maybe Cargo/sccache could define "inputs" and "outputs" of its "tasks". And the caching will then rely on these more reliable objects. Of course, the "inputs" should not only include the content of source file but also the set of recognized parameters to rustc that may produce different output. And "output" can be considered just the content of the file, instead of its full path. Or can be a hash or last modified time.

Basically, I think the current Rust support is like write code in a language without types. Everything is string. Developing a strong-typed system for it would make it more reliable and maintainable.

@luser
Copy link
Contributor

luser commented Oct 26, 2017

We have definitely discussed making cargo able to do this caching internally. It ought to be able to do even better than sccache by checking the entire dependency graph and caching the output of the entire build, so that if you have previously built the exact same source it should be a quick cache hit. That bug belongs on cargo, though. :)

To your original bug: I have some extra Rust cache debug code on a branch that I've never gotten around to landing: https://github.com/luser/sccache/tree/rust-hash-info

I just rebased it to master and tried your experiment locally. You can try it out yourself by building sccache from that branch and then running your cargo compiles with SCCACHE_RUST_CACHE_DEBUG=/path/to/a/dir. sccache will write a timestamped file for each crate it computes a hash key for, with one line for each bit of data that gets fed into the hash. Running this locally on the snippet sample and then diffing the two files for the itoa crate shows:

--- /tmp/sccache-rust-debug/itoa_2017-10-26_12:13:46	2017-10-26 12:13:46.785861224 -0400
+++ /tmp/sccache-rust-debug/itoa_2017-10-26_12:13:56	2017-10-26 12:13:56.369916844 -0400
@@ -13,8 +13,8 @@
 -C
 extra-filename=-bb9fe302248b7433
 --out-dir
-/build/snippet/target/debug/deps
+/build/snippet2/target/debug/deps
 --cap-lints
 allow
 -L
-dependency=/build/snippet/target/debug/deps
+dependency=/build/snippet2/target/debug/deps

So the issue is that the path to --out-dir is an absolute path, and that same path gets included as a -L dependency= argument to rustc as well. I'm pretty confident that we could simply remove --out-dir from the hash computation, since it shouldn't affect the actual compiler output. I'm less sure about removing the -L dependency= path. rustc is going to look for files there, but I confess I don't quite understand its lookup strategy so I don't know if that would make it possible to wind up returning incorrect cached results! @alexcrichton : what do you think?

@alexcrichton
Copy link
Contributor

@luser yeah removing --out-dir I believe should be safe to do. I think that actually removing -L should be safe to do as well actually. The -L flags themselves don't change the output but rather the output found in those locations. In that sense if sccache is correctly tracking dependencies it'd already be hashing the files at those locations, so hashing -L should be redundant.

I think that sccache is already hashing the relevant files (native libs, rust libs, etc), so should be safe to remove both!

@luser
Copy link
Contributor

luser commented Feb 2, 2018

I looked into this a bit more to see if I could merge #206, but it looks like currently rlibs wind up with the build directory encoded in them in several places.

I have a simple test crate that has only itoa as a dependency. I've cloned it to two different directories, and building it with cargo build -v in both directories shows it running the same rustc command (modulo the options discussed here) to compile itoa in both directories:

 Running `rustc --crate-name itoa /home/luser/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/itoa-0.3.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=69dd40cb32052c2f -C extra-filename=-69dd40cb32052c2f --out-dir /build/snippet/target/debug/deps -L dependency=/build/snippet/target/debug/deps --cap-lints allow`

 Running `rustc --crate-name itoa /home/luser/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/itoa-0.3.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=69dd40cb32052c2f -C extra-filename=-69dd40cb32052c2f --out-dir /build/snippet2/target/debug/deps -L dependency=/build/snippet2/target/debug/deps --cap-lints allow`

The generated rlib files aren't the same, though:

$ sha1sum /build/snippet/target/debug/deps/libitoa-69dd40cb32052c2f.rlib /build/snippet2/target/debug/deps/libitoa-69dd40cb32052c2f.rlib
b0fc792541fb3d97708ade58c70525cd4a1ffd9d  /build/snippet/target/debug/deps/libitoa-69dd40cb32052c2f.rlib
5e011e37eed6de6ec8bef4d8f88b5d73cf08ed1f  /build/snippet2/target/debug/deps/libitoa-69dd40cb32052c2f.rlib

Running them through diffoscope shows that the build directory shows up in various places (.debug_info, naturally, but there are other differences).

I tried using the path prefix remapping available in nightly Rust to see if I could make them match with that, running the same builds like RUSTFLAGS='-Zremap-path-prefix-from=/build/snippet -Zremap-path-prefix-to=/src', and that seems to have made them closer, but they still don't quite match. diffoscope shows that the differences are entirely contained within the rust.metadata.bin:

├── rust.metadata.bin
│ @@ -1481,11 +1481,11 @@
│  00005c80: 5300 00d3 5300 007b 5400 0095 5400 003e  S...S..{T...T..>
│  00005c90: 5500 0058 5500 0001 5600 001b 5600 00c4  U..XU...V...V...
│  00005ca0: 5600 00de 5600 0087 5700 00a1 5700 004a  V...V...W...W..J
│  00005cb0: 5800 0064 5800 000d 5900 0027 5900 00d0  X..dX...Y..'Y...
│  00005cc0: 5900 00ea 5900 0093 5a00 00ad 5a00 0056  Y...Y...Z...Z..V
│  00005cd0: 5b00 0004 6974 6f61 1878 3836 5f36 342d  [...itoa.x86_64-
│  00005ce0: 756e 6b6e 6f77 6e2d 6c69 6e75 782d 676e  unknown-linux-gn
│ -00005cf0: 75b7 8386 a080 baef e6a0 019a 6bb2 55ef  u...........k.U.
│ -00005d00: 5774 aa6f fd4a d840 fa8b ba00 0000 0000  Wt.o.J.@........
│ -00005d10: 0a8d b901 0000 0000 02c0 0184 0302 b212  ................
│ -00005d20: 0a06 5a9a 9f01                           ..Z...
│ +00005cf0: 75d9 89b1 8d8b 92ac c73a 9a6b b255 ef57  u........:.k.U.W
│ +00005d00: 74aa 6ffd 4ad8 40fa 8bba 0000 0000 000a  t.o.J.@.........
│ +00005d10: 8db9 0100 0000 0002 c001 8403 02b2 120a  ................
│ +00005d20: 065a 9a9f 01                             .Z...

(nicer HTML view of the diff, only available for 30 days)

@luser
Copy link
Contributor

luser commented Feb 20, 2018

I filed rust-lang/rust#48019 and mw submitted a potential fix in rust-lang/rust#48162 .

@CrendKing
Copy link
Author

CrendKing commented Apr 10, 2018

I tested with the latest nightly (1.27.0-nightly). It seems the problem is solved. Can anyone verify?

BTW, thanks for the work!

@luser
Copy link
Contributor

luser commented May 10, 2018

OK, I got around to re-testing this with latest nightly Rust and it does seem workable! My simple test script shows that I can build the same crate from two different directories passing --remap-path-prefix=$PWD=/src and the resulting rlibs are identical.

I wrote a patch for sccache that makes this work, but I had to do some local fiddling. Notably, I had to invoke cargo with CARGO_INCREMENTAL=0 RUSTFLAGS="--remap-path-prefix=$PWD=/src". With that in place, I built my snippet crate, then cloned it to a different directory and built it there and the second build got cache hits from the first build. Progress!

I'm not sure that patch is 100% correct yet, because if you don't specify the --remap-path-prefix you won't actually get the same output (because the cwd winds up encoded in the debug info) but sccache doesn't validate that.

@mitchmindtree
Copy link

Ahh @luser I had not yet seen your last comment here before commenting on #206, it sounds like you've made some promising progress! Would you be opposed to us opening a PR to sccache with your patch? Is there anything that you would like to see resolved first?

@luser
Copy link
Contributor

luser commented Jul 23, 2018

@mitchmindtree I forget where I left off on that work exactly. I think we decided that we ought to include the current working directory in the hash for Rust compilation (at least when debug symbols are enabled, I think) and then we could handle the --remap-path-prefix argument to rewrite cwd paths to make them cacheable.

@luser
Copy link
Contributor

luser commented Dec 17, 2018

I rewrote my patch because rebasing it would have been a pain. It seems to work reasonably well in testing:
#345

That should fix this issue as filed: if you have two crates that depend on the same crates.io crate they should be able to get cache hits with that patch. (Obviously you need to be compiling the exact same version of the crate with the exact same versions of its dependencies and the compiler options need to match as well.)

@luser luser closed this as completed Dec 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants