Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Cache is valid if checkout or contents hasn't changed #10507

Merged
merged 3 commits into from
Apr 5, 2022

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Mar 25, 2022

#10482 allow the registry to have cashe keys at a per file level.
On master if the currently checked out commit changes all cached files are regenerated.
After this commit we also check if GITs hash of the particular file has changed.

To avoid thrashing cached files this PR only checks for the presence of both hashes, but does not write them both. This means that nightly after this branch will still be able to share file caches with older versions, specifically current stable.

When sufficient versions of stable know how to read the new format, a one line change can be made to start writing the new format.

@rust-highfive
Copy link

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 25, 2022
@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 25, 2022

cc @arlosi. Thanks for the set up work.

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. I've read through #10064 and #10482 but cannot find the discussion about the new index format <file>:<remote>. I might miss something 😞 Can I have a link to where it was discussed in the PR description?

src/cargo/sources/registry/remote.rs Outdated Show resolved Hide resolved
@Eh2406
Copy link
Contributor Author

Eh2406 commented Apr 4, 2022

Hi. I've read through #10064 and #10482 but cannot find the discussion about the new index format <file>:<remote>. I might miss something 😞 Can I have a link to where it was discussed in the PR description?

Did not miss any discussion. This was an independent thought I had when I looked at this code for the first time after reviewing #10482. I'm not sure how much context to explain. If I end up with two little, I'm sorry, I can add more.

Before #10482 the entire git checkout had one version number. We used for that version number the hash of the commit at HEAD. Once we read the data out of git we saved it too disc in a slightly optimized format. To make sure we didn't read an optimized file that was now out of date, the file started with the version number. If we fetched a new HEAD commit then we would overwrite all cache files.

The big change from #10482 is to think of each file as having its own version number. This will be critical for HTTP based indexes, but was not intended to make a change for git based indexes. To maintain the same behavior, no matter what file is being looked at it's version number is "hash of the commit at HEAD". Thus it has basically the same performance, and maintains the same state on disk.

The idea of this PR is that the file will be considered up to date if either it ends with "hash of the commit at HEAD" or starts with a "hash of the files contents". When enough time has passed we can start writing files with both hashes, and get cash hits more often. But we will continue to use files on disk that were created by older Cargos (like stable) that only have the commit hash.

@weihanglo
Copy link
Member

Wow, thanks! What I guessed is very close to your detailed explanation 😆

There are some places I believe we can improve:

  • We should pass index_version, which is per-file level cache key, into load_helper.
    match load_helper(&self, path, current_version.as_deref()) {
  • Since the RemoteRegistry::load is an ad-hoc implementation, could we have a doc comment to explain the new format of index_version? That would help other contributors a lot. (I know we are not writing any at this time, so feel free to skip this suggestion)

Eh2406 and others added 2 commits April 4, 2022 15:10
Co-authored-by: Weihang Lo <weihanglo@users.noreply.github.com>
@Eh2406
Copy link
Contributor Author

Eh2406 commented Apr 4, 2022

Both excellent points! The code has been corrected, and a comment has been added.

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks really nice! Thank you!

@weihanglo
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Apr 5, 2022

📌 Commit 8058c3d has been approved by weihanglo

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 5, 2022
@bors
Copy link
Contributor

bors commented Apr 5, 2022

⌛ Testing commit 8058c3d with merge 1a052ae...

@bors
Copy link
Contributor

bors commented Apr 5, 2022

☀️ Test successful - checks-actions
Approved by: weihanglo
Pushing 1a052ae to master...

@bors bors merged commit 1a052ae into rust-lang:master Apr 5, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 6, 2022
Update cargo

5 commits in 1ef1e0a12723ce9548d7da2b63119de9002bead8..e2e2dddebe66dfc1403a312653557e332445308b
2022-03-31 00:17:18 +0000 to 2022-04-05 17:04:53 +0000
- Part 2 of RFC2906 -- allow inheriting from a different `Cargo.toml` (rust-lang/cargo#10517)
- File Cache is valid if checkout or contents hasn't changed (rust-lang/cargo#10507)
- Fix how scrape-examples handles proc macros (rust-lang/cargo#10533)
- tools: update checkout action on CI (rust-lang/cargo#10521)
- Don't error if no binaries were installed (rust-lang/cargo#10508)
@ehuss ehuss added this to the 1.62.0 milestone Apr 7, 2022
@Eh2406 Eh2406 deleted the finer_grained_cashe branch April 13, 2022 16:25
bors added a commit that referenced this pull request Sep 8, 2022
Cache index files based on contents hash

Since #10507 Cargo has known how to read registry cached files whose index version starts with the hash of the file contents. Git makes it very cheap to determine the hash of a file. This PR switches cargo to start writing the new format.

Cargoes from before #10507 will not know how to read, and therefore overwrite, cached files written by Cargos after this PR.

Cargos after this PR can still read, and will consider up-to-date cached files written by all older Cargos.

As I'm writing this out I'm thinking that there may not be any point in writing a file that has both. An alternative implementation just writes the file contents hash. 🤔
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants