-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audit handling of hashes in the lockfile #4924
Comments
I can answer all of these questions (regarding what we do today) if helpful. |
Do you expect that we have to change the lockfile (schema) for this? |
I think we need to make hashes optional for registry-based distributions. I can't remember if it's optional today. |
If we make them optional in the lock file, then we won't be able to do hash checking. Are we okay with that? |
We can still hash-check the distributions for which we do have hashes. |
I think we should follow the strategy we already use today: if from the registry, we use the registry-provided hashes; if from a direct URL, we compute the hash ourselves. If the registry lacks hashes, we don't include any hashes (and warn the user).
I think we should just store the "best" one to save space. I'll verify that we're doing this today.
We aren't. I think we should add "verify any hashes that exist, but don't require them" (so that we can still interoperate with registries that lack hashes). |
## Summary We only need to store one hash -- it should be the "strongest" hash. In practice, most registries (like PyPI) only serve one, and we only compute a SHA256 hash for direct URLs. Part of: #4924 ## Test Plan I verified that changing: ```diff diff --git a/crates/distribution-types/src/hash.rs b/crates/distribution-types/src/hash.rs index 553a74f55..d36c62286 100644 --- a/crates/distribution-types/src/hash.rs +++ b/crates/distribution-types/src/hash.rs @@ -31,7 +31,7 @@ impl<'a> HashPolicy<'a> { pub fn algorithms(&self) -> Vec<HashAlgorithm> { match self { Self::None => vec![], - Self::Generate => vec![HashAlgorithm::Sha256], + Self::Generate => vec![HashAlgorithm::Sha256, HashAlgorithm::Sha512], Self::Validate(hashes) => { let mut algorithms = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>(); algorithms.sort(); ``` Then running `uv lock` with a URL gave me: ```toml [[distribution]] name = "iniconfig" version = "2.0.0" source = { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl" } wheels = [ { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha512:44cc53a6c8dd7cf4d6d52bded308bcc4b4f85fff2ed081f60f7d4beaa86a7cde6d099e3976331232d4cbd472ad5d1781064725b0999c7cd3a2a4d42df687ee81" }, ] ```
Audit our handling of hashes. Some hashes are generated. Some are retrieved from registries. Sometimes hashes should not be used (path dependencies) where as sometimes they should (registry dependencies).
I think there are three main issues here:
First is that hashes come from two places: one place is the registry itself. Another place is by hashes we compute ourselves. Which hashes do we use in the lock file? Does it matter? Which should we use?
Second is that we often have multiple hashes available for any given artifact. Do we need to store all of them in the lock file? Or can we just pick the "best" one?
Third is whether we are doing any hash checking. I don't think we are today. But we probably should be.
The text was updated successfully, but these errors were encountered: