Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit handling of hashes in the lockfile #4924

Closed
Tracked by #3611 ...
konstin opened this issue Jul 9, 2024 · 6 comments
Closed
Tracked by #3611 ...

Audit handling of hashes in the lockfile #4924

konstin opened this issue Jul 9, 2024 · 6 comments
Assignees
Labels
preview Experimental behavior

Comments

@konstin
Copy link
Member

konstin commented Jul 9, 2024

Audit our handling of hashes. Some hashes are generated. Some are retrieved from registries. Sometimes hashes should not be used (path dependencies) where as sometimes they should (registry dependencies).

I think there are three main issues here:

First is that hashes come from two places: one place is the registry itself. Another place is by hashes we compute ourselves. Which hashes do we use in the lock file? Does it matter? Which should we use?

Second is that we often have multiple hashes available for any given artifact. Do we need to store all of them in the lock file? Or can we just pick the "best" one?

Third is whether we are doing any hash checking. I don't think we are today. But we probably should be.

@konstin konstin changed the title Audit our handling of hashes. Some hashes are generated. Some are retrieved from registries. Sometimes hashes should not be used (path dependencies) where as sometimes they should (registry dependencies). Audit handling of hashes in the lockfile Jul 9, 2024
@konstin konstin added the preview Experimental behavior label Jul 9, 2024
@charliermarsh
Copy link
Member

I can answer all of these questions (regarding what we do today) if helpful.

@konstin
Copy link
Member Author

konstin commented Jul 16, 2024

Do you expect that we have to change the lockfile (schema) for this?

@charliermarsh
Copy link
Member

I think we need to make hashes optional for registry-based distributions. I can't remember if it's optional today.

@BurntSushi
Copy link
Member

If we make them optional in the lock file, then we won't be able to do hash checking. Are we okay with that?

@charliermarsh
Copy link
Member

We can still hash-check the distributions for which we do have hashes.

@charliermarsh charliermarsh self-assigned this Jul 17, 2024
@charliermarsh
Copy link
Member

First is that hashes come from two places: one place is the registry itself. Another place is by hashes we compute ourselves. Which hashes do we use in the lock file? Does it matter? Which should we use?

I think we should follow the strategy we already use today: if from the registry, we use the registry-provided hashes; if from a direct URL, we compute the hash ourselves. If the registry lacks hashes, we don't include any hashes (and warn the user).

Second is that we often have multiple hashes available for any given artifact. Do we need to store all of them in the lock file? Or can we just pick the "best" one?

I think we should just store the "best" one to save space. I'll verify that we're doing this today.

Third is whether we are doing any hash checking. I don't think we are today. But we probably should be.

We aren't. I think we should add "verify any hashes that exist, but don't require them" (so that we can still interoperate with registries that lack hashes).

charliermarsh added a commit that referenced this issue Jul 17, 2024
## Summary

If a registry doesn't include hashes, then we won't include them in the
lockfile either.

Part of: #4924.

Closes: #5120.
charliermarsh added a commit that referenced this issue Jul 17, 2024
## Summary

We only need to store one hash -- it should be the "strongest" hash. In
practice, most registries (like PyPI) only serve one, and we only
compute a SHA256 hash for direct URLs.

Part of: #4924

## Test Plan

I verified that changing:

```diff
diff --git a/crates/distribution-types/src/hash.rs b/crates/distribution-types/src/hash.rs
index 553a74f55..d36c62286 100644
--- a/crates/distribution-types/src/hash.rs
+++ b/crates/distribution-types/src/hash.rs
@@ -31,7 +31,7 @@ impl<'a> HashPolicy<'a> {
     pub fn algorithms(&self) -> Vec<HashAlgorithm> {
         match self {
             Self::None => vec![],
-            Self::Generate => vec![HashAlgorithm::Sha256],
+            Self::Generate => vec![HashAlgorithm::Sha256, HashAlgorithm::Sha512],
             Self::Validate(hashes) => {
                 let mut algorithms = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
                 algorithms.sort();
```

Then running `uv lock` with a URL gave me:

```toml
[[distribution]]
name = "iniconfig"
version = "2.0.0"
source = { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl" }
wheels = [
    { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha512:44cc53a6c8dd7cf4d6d52bded308bcc4b4f85fff2ed081f60f7d4beaa86a7cde6d099e3976331232d4cbd472ad5d1781064725b0999c7cd3a2a4d42df687ee81" },
]
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Experimental behavior
Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants