Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hashes and versions to all distributions #3589

Merged
merged 1 commit into from
May 14, 2024

Conversation

charliermarsh
Copy link
Member

Summary

In ResolutionGraph::from_state, we have mechanisms to grab the hashes and metadata for all distributions -- but we then throw that information away. This PR preserves it on a new AnnotatedDist (yikes, open to suggestions) that wraps ResolvedDist and includes (1) the hashes (computed or from the registry) and (2) the Metadata23, which lets us extract the version.

Closes #3356.

Closes #3357.

@charliermarsh charliermarsh marked this pull request as ready for review May 14, 2024 19:26
@charliermarsh charliermarsh added the preview Experimental behavior label May 14, 2024
}
}

fn from_source_dist(
source_dist: &distribution_types::SourceDist,
hashes: &[HashDigest],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't love making hashes an argument everywhere but seemed like the most straightforward option (since it only exists on AnnotatedDist).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah hmmm, I don't like it either.

I think when I was writing this, I was assuming the hashes would be added to types like PathSourceDist directly. Out of curiosity, why didn't you take that approach? AIUI, hashes are on some of the distribution types but not all of them. I was thinking we would make that a bit more consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're not quite the same. The hash on the registry distribution is just the hash reported by the registry, not the hash we compute locally. And the distribution exists well before we compute its hashes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't compute hashes for registry distributions during resolution. We only validate them during installation.

@charliermarsh charliermarsh force-pushed the charlie/package-resolve branch 3 times, most recently from 0458e62 to 780dbc3 Compare May 14, 2024 20:01
Copy link
Member

@ibraheemdev ibraheemdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, don't have strong opinions about the naming. Does LockedDist make sense because it's locked to a specific version..?

@charliermarsh
Copy link
Member Author

👍 I'm gonna defer to @BurntSushi on the names since he has the bigger picture in his head w/r/t the other structs in lock.rs etc.

Base automatically changed from charlie/ex to main May 14, 2024 20:41
@charliermarsh charliermarsh force-pushed the charlie/package-resolve branch from 780dbc3 to 52fcf94 Compare May 14, 2024 20:41
Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this coming in as-is. I don't think I see anything wrong, although I don't love the code structure/organization. I elaborated a bit in the comments. It is possible that I might have some gaps in my understanding though, so I could have misguided opinions here.

}
}

fn from_source_dist(
source_dist: &distribution_types::SourceDist,
hashes: &[HashDigest],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah hmmm, I don't like it either.

I think when I was writing this, I was assuming the hashes would be added to types like PathSourceDist directly. Out of curiosity, why didn't you take that approach? AIUI, hashes are on some of the distribution types but not all of them. I was thinking we would make that a bit more consistent.

.or(hashes.first())
.cloned()
.map(Hash::from);
Ok(SourceDist { url, hash })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference between the hash on the annotated dist versus the hash that's already on the RegistrySourceDist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think we should just ignore the hashes from the annotated dist here actually.

The hash on the RegistrySourceDist is the hash reported by the registry for that specific distribution (file). The hashes on the annotated dist will be the collection of all hashes for all distributions in the registry.

"Every package should have metadata: {:?}",
dist.version_id()
)
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love these panics. (I think some like this were already in this code though.) I think this circles back to my idea of putting this data on the distribution types. Maybe that would lead to increased memory usage though? Not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe put it on ResolvedDist? I'd have to look, but it'd be responsible to do so separately IMO since this pattern already exists here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no, because we need to be able to go from compatible dist to resolved dist... At some point we need to merge the metadata with the distribution, and I guess that's here (for now, at least).

"Every package should have metadata: {:?}",
dist.version_id()
)
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily asking for any changes here, but just a "shout out loud": I feel like I've seen this pattern a lot where the code for processing distributions repeats itself. This looks almost identical to the metadata extraction above for example. (I don't yet have opinions on how to resolve this or even if it's worth resolving.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's nearly identical. I'm gonna look at DRYing it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, to be clear: it's exactly the same. They're different arms in a match.

.first()
.or(hashes.first())
.cloned()
.map(Hash::from);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... IIUC, hashes can be two different things. If the distribution comes from a registry, it's the list of hashes served by the registry. If the distribution comes from elsewhere, it's the hashes we computed.

@charliermarsh charliermarsh force-pushed the charlie/package-resolve branch from 52fcf94 to 4ff08ea Compare May 14, 2024 23:01
@charliermarsh charliermarsh force-pushed the charlie/package-resolve branch from 4ff08ea to 0f72845 Compare May 14, 2024 23:02
@charliermarsh charliermarsh enabled auto-merge (squash) May 14, 2024 23:03
@charliermarsh charliermarsh merged commit 4a42730 into main May 14, 2024
44 checks passed
@charliermarsh charliermarsh deleted the charlie/package-resolve branch May 14, 2024 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Experimental behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants