Investigate failure to load caches above 2^32 - 1 bytes #336

Open
woodruffw opened this issue Jun 12, 2024 · 4 comments

@woodruffw
Member

Opening this as a reminder to myself.

This is likely related to #238 and #200: some recent torch wheels are >= 2.5 GB, and pip appears to download them repeatedly without hitting the cache. My only SWAG so far is that this is because the body itself overflows msgpack's signed 32-bit limit on binary objects, per the spec.
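
For reference, the msgpack spec's bin family stores the payload length as an unsigned 8-, 16-, or 32-bit big-endian integer, so the largest single binary object it can describe is 2^32 - 1 bytes. Below is a minimal, standard-library-only sketch of that header selection. The helper name `bin_header` is hypothetical and is not part of msgpack-python or any other library; it only illustrates where the length ceiling comes from.

```python
import struct

MAX_BIN32_LEN = 2**32 - 1  # largest length a bin 32 header can express


def bin_header(length: int) -> bytes:
    """Return the msgpack header for a binary object of `length` bytes.

    Hypothetical helper for illustration; real encoders do this internally.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 0xFF:
        return struct.pack(">BB", 0xC4, length)  # bin 8: 1-byte length
    if length <= 0xFFFF:
        return struct.pack(">BH", 0xC5, length)  # bin 16: 2-byte length
    if length <= MAX_BIN32_LEN:
        return struct.pack(">BI", 0xC6, length)  # bin 32: 4-byte length
    raise OverflowError(f"{length} bytes does not fit in a single bin object")
```

Under this reading of the spec, a 2.5 GB body (2,500,000,000 bytes) exceeds 2^31 - 1 but still fits within 2^32 - 1, so `bin_header(2_500_000_000)` returns a bin 32 header rather than raising.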

Haven't fully diagnosed yet.

See: https://news.ycombinator.com/item?id=40659973

@woodruffw self-assigned this on Jun 12, 2024
@woodruffw
Member Author

Looked some more into this: the person who reported this said that torch was serving 2 GB+ wheels, but I can't see any: https://pypi.org/project/torch/#files

That being said, I suspect this is still causing unnecessary cache misses due to #200: we end up storing large downloads (such as 700 MB torch wheels) that never get "hit", since the default msgpack load behavior is to limit binary bodies to ~100 MB: https://msgpack-python.readthedocs.io/en/latest/api.html

@woodruffw
Member Author

Hmm, I've still been unable to trigger this: it looks like msgpack.loads(payload) sets its maximum limits based on len(payload), so we should never really hit a binary object limit in practice.
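
A rough way to check this observation locally is sketched below. It assumes the third-party msgpack package is installed; `try_load` is a hypothetical helper written for this note, not something from pip or CacheControl. With no explicit limits the round trip should succeed, while passing a deliberately small `max_bin_len` shows what a rejected body would look like.

```python
try:
    import msgpack  # third-party package (`pip install msgpack`)
except ImportError:  # lets the sketch load even without the dependency
    msgpack = None


def try_load(payload: bytes, **limits):
    """Unpack `payload`, returning (value, None) or (None, error).

    Hypothetical helper; `limits` is forwarded to msgpack.loads,
    e.g. max_bin_len=16.
    """
    try:
        return msgpack.loads(payload, **limits), None
    except ValueError as exc:
        return None, exc


if msgpack is not None:
    body = b"\x00" * 1024
    payload = msgpack.packb(body, use_bin_type=True)

    # No explicit limit: expected to load, per the observation above.
    value, err = try_load(payload)
    print("default limits:", "ok" if err is None else err)

    # Explicit limit smaller than the body: expected to be rejected.
    value, err = try_load(payload, max_bin_len=16)
    print("max_bin_len=16:", "ok" if err is None else err)
```

Repeating the first call with a body near the size of a real wheel would be the more direct test, at the cost of holding the whole payload in memory.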

@pradyunsg
Contributor

> 2^32 - 1 GB

You spooked me -- you probably want to say bytes here.

@woodruffw
Member Author

Indeed, sorry!

@woodruffw changed the title from "Investigate failure to load caches above 2^32 - 1 GB" to "Investigate failure to load caches above 2^32 - 1 bytes" on Jul 13, 2024