-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: use stable hash from rustc-stable-hash #14116
Conversation
This is blocked on releasing rustc-stable-hash to crates.io |
From a <1 minute reading of "-Ztrim-paths build a stable cross-platform path for the registry and git sources." My understanding is that the intent here is to use a hash function to create a stable path to a particular set of source files or artifacts. If there is a hash collision then potentially hash(malicous-sources) == hash(trusted-sources) and so malicous-sources could be used instead of trusted-sources, silently. |
src/cargo/util/hasher.rs
Outdated
fn write(&mut self, bytes: &[u8]) { | ||
self.0.write(bytes) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure only forwarding Hasher::write
is enough, since the endian-ness handling is done on the individual write_{u,i}{8,16,32,64,128}
methods and not forwarding those will bypass that endian-ness handling 1.
I think it's also going to bypass the {u,i}size
handling.
Footnotes
-
the default implementation use native endian-ness, instead of a fixed one ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rust-lang/rustc-stable-hash#6 and rust-lang/rustc-stable-hash#8, I think StableSipHasher128
should be a drop-in replacement that we can use directly.
I am also thinking of blake3 as an alternative, but haven't figured out how to make it play nice with ExtendedHasher
.
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
// The hash value depends on endianness and bit-width, so we only run this test on | ||
// little-endian 64-bit CPUs (such as x86-64 and ARM64) where it matches the | ||
// well-known value. | ||
// The hash value should be stable across platforms, and doesn't depend on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something we need to fix, if the goal is a fully cross-platform.
For information, |
Status update: Had some chats with Urgau, here is data from benchmark blake3/blake2s on rustc
While in Cargo hash is not a dominant factor of performance it still plays a role. A Will need more real world benchmarks for like |
☔ The latest upstream changes (presumably #14334) made this pull request unmergeable. Please resolve the merge conflicts. |
What does this PR try to resolve?
This helps
-Ztrim-paths
build a stable cross-platform path for theregistry and git sources. Sources files then can be found from the same
path when debugging.
See #13171 (comment)
How should we test and review this PR?
There are a few caveats, and we should do an FCP before merge:
Need to put this in the Cargo CHANGELOG.
SourceId
is hashed, the global cachetracker is also affected because Cargo writes source identifiers (e.g.
index.crates.io-6f17d22bba15001f
) to SQLite.SipHasher in std on short things like
SourceId
, but for long stufflike fingerprint. See Additional information.
StableHasher is used in several places. We should consider if there is a need for cryptographyic hash (see #13171 (comment)).
rustc -vV
.-C metadata
UnitInner::dep_hash
cargo package
to verify if files were modified before and after the build.SourceId
identifier likeregistry/src
path,and
-Zscript
target directories.Additional information
Benchmark on x86_64-unknown-linux-gnu
Benchmark on aarch64-apple-darwin
Criterion benchmark script