Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache downloaded wheel when range requests aren't supported #5089

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Jul 16, 2024

Summary

When range requests aren't supported, we fall back to streaming the wheel, stopping as soon as we hit a METADATA file. This is a small optimization, but the downside is that we don't get to cache the resulting wheel...

We don't know whether METADATA will be at the beginning or end of the wheel, but it seems like a better tradeoff to download and cache the entire wheel?

Closes: #5088.

Sort of a revert of: #1792.

@charliermarsh charliermarsh force-pushed the charlie/stream branch 2 times, most recently from de63976 to a5855d1 Compare July 16, 2024 02:19
@charliermarsh charliermarsh added the performance Potential performance improvement label Jul 16, 2024
@charliermarsh charliermarsh marked this pull request as ready for review July 16, 2024 02:21
@charliermarsh
Copy link
Member Author

I'm sort of torn on this.

@zanieb
Copy link
Member

zanieb commented Jul 16, 2024

Would it be a pain to use a heuristic like... only finish downloading if it's greater than some percentage into the file? If we find a metadata entry 10% into the file it seems excessive to unconditionally download the whole wheel when we're trying many versions.

@morotti
Copy link

morotti commented Jul 16, 2024

We don't know whether METADATA will be at the beginning or end of the wheel, but it seems like a better tradeoff to download and cache the entire wheel?

METADATA are at end of the wheel, as per PEP 407. https://peps.python.org/pep-0427/#recommended-archiver-features

I vaguely recall there was a similar discussion in pip to try to only download metadata (I can't find the thread anymore). They checked many packages and only found one exception to confirm the rule, some google packages had the metadata at the start (around tensorflow if I recall well), because google used their own build system that did its own thing. I think they fixed it a while back.

@charliermarsh
Copy link
Member Author

charliermarsh commented Jul 16, 2024

@zanieb - Perhaps... It would be harder to implement for sure. Given that this is not the happy path and PEP 407 recommends putting the archive at the end anyway, I'm inclined to just move forward with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming metadata fallback should cache wheel
4 participants