Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCP: Alternate cargo freshness algorithm, unstable flag to annotate depinfo file with checksums and file sizes #765

Closed
2 of 3 tasks
Xaeroxe opened this issue Jul 1, 2024 · 4 comments
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team

Comments

@Xaeroxe
Copy link

Xaeroxe commented Jul 1, 2024

This is a major change proposal for something I've been working on for about a month. Here's some more context

Original cargo issue I'm trying to solve: rust-lang/cargo#6529

Cargo implementation PR: rust-lang/cargo#14137

Cargo tracking issue: rust-lang/cargo#14136

Rustc implementation PR (blocked on acceptance of this MCP): rust-lang/rust#126930

Proposal

The cargo project would like to experiment with the use of source file checksums to determine if a rust crate is fresh as an alternative to file mtimes. This is particularly valuable on systems with poor mtime accuracy, or otherwise poor mtime implementations. It's also very valuable in CI/CD, where mtimes frequently have no relation to the contents of the build cache.

It is generally not possible to lock files in a systematically reliable manner that can be trusted cross platform. This is especially true on unix systems. So in order to behave correctly cargo must expect that the input source files could be changed at any time, even in the middle of a build. This means that the moment at which you take a checksum is relevant. Otherwise the freshness algorithm experiences time-of-check time-of-use errors. Unfortunately it's not possible for cargo to create a list of which files need to be checked at the beginning of the build. That list is discovered over the duration of the build. Since rustc is the first to discover a file, it also ought to take the checksum and record it to the depinfo file so that cargo can use it later.

I'm not asking for a commitment to use checksums in every build, or even a commitment to make them available indefinitely. I just want to give the ecosystem a chance to experience this and experiment with it. So I'm proposing that rustc receive an unstable flag --checksum-hash-algorithm=<algorithm> that a nightly cargo can use on nightly rustc in order to have rustc write these checksums to comments in the dep-info makefile. Cargo would only use the rustc flag when an unstable cargo flag is present. Additionally, these lines would include the file length at the time which the checksum was recorded so that cargo can sometimes short circuit the freshness check. The format would go something like this

# checksum:{checksum_hash} file_len:{file_len} {path}
# 
# Example:
# checksum:sha256=0fb8db4878... file_len:30 src/main.rs

Checksum algorithm

I have no strong feelings on which checksum algorithm is used and have intentionally authored my code to allow for changes to the checksum algorithm later. Right now I've authored support for xxhash and sha256. The team has determined that a cryptographically secure algorithm is necessary, for correctness and security reasons. This means xxhash is not suitable. One alternative that hasn't yet been tried is the blake3 algorithm, which promises to be cryptographically secure and fast. This is mostly a performance question so I feel it should be informed by real world experience and profiling.

I also believe the algorithm used for these checksums should not be required to be the same as the checksums used for debug files. However, if the algorithm happens to match the one used for debug files, that checksum should not be computed twice.

Anticipated performance impact

Taking a checksum is not as fast as reading an mtime, especially for large files. So for totally fresh builds it is expected this may have a slight performance degradation. That's probably worth exchanging for the improved correctness of the freshness algorithm on systems with poor mtime implementations. It's worth noting this only represents a correctness improvement if the checksum algorithm effectively avoids checksum collisions. The time it takes to compute a checksum pales in comparison to the time it takes to perform a build. So this will probably still be a relatively short check.

Build scripts

This proposal has a known deficiency in relation to build scripts. Build scripts may ingest files from all over the system, and as of this writing won't provide a checksum prior to ingesting these files. Ideally, long term, I want build scripts to also compute and output these checksums. That would be made significantly easier if an official build script API were made available. This is being explored in rust-lang/cargo#12432. In the meantime, cargo will continue using mtimes for build script files.

Mentors or Reviewers

I don't have anyone in mind but I'm happy to accept volunteers.

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@Xaeroxe Xaeroxe added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Jul 1, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 1, 2024

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

@rustbot concern reason-for-concern 
<description of the concern> 

Concerns can be lifted with:

@rustbot resolve reason-for-concern 

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler @rust-lang/compiler-contributors

@rustbot rustbot added the to-announce Announce this issue on triage meeting label Jul 1, 2024
@Xaeroxe Xaeroxe changed the title MCP: Alternate cargo freshness algorithm: Unstable flag to annotate depinfo file with checksums and file sizes MCP: Alternate cargo freshness algorithm, unstable flag to annotate depinfo file with checksums and file sizes Jul 1, 2024
@Noratrieb
Copy link
Member

Details like the hashing algorithm can be ironed out later, but this seems like a good thing to experiment with.
@rustbot second

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Jul 1, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jul 4, 2024
@Xaeroxe
Copy link
Author

Xaeroxe commented Jul 13, 2024

I don't think we ever formally entered FCP. What are our next steps to achieve that?

@Noratrieb
Copy link
Member

@apiraino goes over MCPs every once in a while and marks them as accepted when they've been in FCP for over 10 days, it's not automated. i can do it for you here.
@rustbot label -final-comment-period +major-change-accepted

@rustbot rustbot added major-change-accepted A major change proposal that was accepted to-announce Announce this issue on triage meeting and removed final-comment-period The FCP has started, most (if not all) team members are in agreement labels Jul 13, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team
Projects
None yet
Development

No branches or pull requests

4 participants