Use CacheControl instead of custom cache code #1748
"1.8", | ||
"--download-cache has been deprecated and will be removed in " | ||
" the future. Pip now has a generic HTTP Cache which will be " | ||
"used automatically." |
I think it would be cleaner if the second sentence was "Pip now automatically uses and configures its cache"
(or something like that; my point is that I don't think the HTTP component is important)
(awesome stuff)
Currently this branch is adding two new options. What do people think?
Doesn't pip already have CLI flags/env vars for configuring the cache? Why does this require adding new flags?
So two primary reasons:
And a secondary reason:
Oh, and also:
@pypa/pip-developers Thoughts on
I don't dislike having … Having said that, I don't object to …
Ok, I went with
@pfmoore If you could try this out on Windows, that'd be awesome. All you need to do is run this branch and try to pip install Django, or something else big, twice in a row. The first time should be slow; the second time should be instant. Also, you should see a
Test failures fixed in psf/cachecontrol#30
@dstufft Worked fine. There's a definite improvement in speed, but the second install of Django is still far from instant (57 s with the cache, 66 s cold). All the time is in the actual install, not the download, though, so that's not relevant. So yes, looks good to me.
Err, yeah, I meant the download itself should be nearly instant; sorry for not being clear. Thanks a lot for testing that out for me!
Like all options, :ref:`--download-cache <install_--download-cache>` can also
be set as an environment variable or placed in the pip config file. See the
:ref:`Configuration` section.
It's always been common for people to be surprised that pip still crawls PyPI for packages, even after they are fully cached to fulfill their requirements. Paragraphs 2-4 explain that, and explain how to achieve a local-only lookup. We should keep that portion and apply it to the docs on the new general cache.
Hmm, it doesn't really explain how to achieve local lookup, it just directs you to the cookbook area which does explain how to achieve local lookup.
Fair point on the local lookup bit, however I'm not sure where this would go right now. At the moment we don't have any specific docs on the cache other than what is generated automatically. The idea is that this cache should just work and people shouldn't generally need to mess with it or even know about it.
It disabuses people of the notion that the download cache was a cache keyed by package name, and then provides the link on how to achieve the speedy install they really wanted.
Right, but there is no more download cache with this PR, it's just a generic HTTP cache now like a browser has.
If the cache has expired, it'll just do a conditional GET instead of a full GET.
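A conditional GET reuses the validators stored with the expired response. As a rough illustration (not CacheControl's code), a minimal sketch of how an ETag becomes If-None-Match and a Last-Modified date becomes If-Modified-Since; conditional_headers is a hypothetical helper.

```python
from typing import Dict, Mapping


def conditional_headers(cached: Mapping[str, str]) -> Dict[str, str]:
    """Request headers for revalidating a stale cached response.

    With neither validator present, the result is empty and the client
    has to fall back to a full GET.
    """
    # HTTP header names are case-insensitive, so normalise them for lookup.
    lowered = {name.lower(): value for name, value in cached.items()}
    headers: Dict[str, str] = {}
    if "etag" in lowered:
        headers["If-None-Match"] = lowered["etag"]
    if "last-modified" in lowered:
        headers["If-Modified-Since"] = lowered["last-modified"]
    return headers


stale = {"ETag": '"abc123"', "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(conditional_headers(stale))
```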
There's no more "download cache", but there's still a URL-based cache that walks PyPI regardless of how full your cache is, right? That's what I want people to still understand somewhere in the docs.
It might walk PyPI, or it might not touch the network at all; it depends.
It uses the same cache semantics as a browser. If PyPI sends a Cache-Control: max-age=600 header in the response to a URL, then for the next 600 seconds requesting that same URL again will not touch PyPI. After 600 seconds have gone by, the item is still in the cache, but instead of using it unconditionally we make a conditional GET, which hits PyPI and says "I want this URL, but only if it doesn't match this ETag" (an If-None-Match header). PyPI compares the ETag we sent with its current one and either returns a 304 with an empty body if they match, or sends a 200 with the content as normal. If we get a 304, we refresh the timer on the cached item for another 600 seconds (or whatever the max-age is on that response) and return the data stored in the cache.
Does that make sense?
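The fresh/stale/revalidate flow just described can be sketched as a toy in-memory cache. This is a stdlib-only illustration under simplifying assumptions (only max-age and ETag are handled; the clock and the server are injected so it runs offline); it is not CacheControl's implementation, and BrowserLikeCache, fake_pypi and the example.invalid URL are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# A fetch function takes request headers and returns (status, headers, body).
Fetch = Callable[[Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class CacheEntry:
    body: bytes
    etag: Optional[str]
    max_age: int
    stored_at: float


def parse_max_age(cache_control: str) -> int:
    """Return N from 'max-age=N' in a Cache-Control value, or 0 if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return 0


class BrowserLikeCache:
    """Toy HTTP cache: serve fresh entries, revalidate stale ones by ETag."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.entries: Dict[str, CacheEntry] = {}
        self.clock = clock
        self.network_calls = 0

    def get(self, url: str, fetch: Fetch) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry.stored_at < entry.max_age:
            return entry.body  # Still fresh: no network access at all.
        request_headers: Dict[str, str] = {}
        if entry is not None and entry.etag:
            # Stale: ask for the URL only if the ETag no longer matches.
            request_headers["If-None-Match"] = entry.etag
        self.network_calls += 1
        status, headers, body = fetch(request_headers)
        max_age = parse_max_age(headers.get("Cache-Control", ""))
        if status == 304 and entry is not None:
            # Not modified: restart the timer and reuse the stored body.
            entry.stored_at = now
            entry.max_age = max_age or entry.max_age
            return entry.body
        self.entries[url] = CacheEntry(body, headers.get("ETag"), max_age, now)
        return body


# Usage with a fake clock and a fake server whose ETag never changes.
now = [0.0]


def fake_pypi(request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    headers = {"Cache-Control": "max-age=600", "ETag": '"v1"'}
    if request_headers.get("If-None-Match") == '"v1"':
        return 304, headers, b""  # ETags match: empty 304 response.
    return 200, headers, b"<simple index>"


cache = BrowserLikeCache(clock=lambda: now[0])
url = "https://example.invalid/simple/django/"  # hypothetical URL
first = cache.get(url, fake_pypi)   # t=0: full GET
now[0] = 300.0
second = cache.get(url, fake_pypi)  # t=300: fresh, served from the cache
now[0] = 700.0
third = cache.get(url, fake_pypi)   # t=700: stale, conditional GET gets a 304
print(cache.network_calls)          # 2
```

Three requests cost only two network round trips, and the second of those carries no body back because the 304 lets the stored copy be reused.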
I added documentation about caching.
Cool, thanks.
Ok, so @qwcode was worried about a server misconfiguration causing us to cache a response for an unreasonably long time. I've been thinking about this and looking at options, and I realized we can specify a maximum age we're willing to accept for a cached response by adding a Cache-Control request header, for example session.get(url, headers={"Cache-Control": "max-age=60"}) if we want to reject any cached items older than 60 seconds, even if the server says we can keep them longer. With that in mind, here's what I'm thinking:
@dstufft Ok, nice, that sounds good to me.
Ok, I went with 10 minutes instead of 5 for … I'm going to adjust PyPI and then merge this later tonight.
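The client-side cap on cache age discussed in this thread combines with the server's max-age as two independent limits. A simplified sketch following the HTTP caching rules (a response is fresh while its age is below the server max-age; a request max-age refuses anything older than the client allows); parse_max_age and usable_from_cache are hypothetical helpers, and the one-year server value is an invented example of a misconfiguration.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return N from a 'max-age=N' directive in a Cache-Control value."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def usable_from_cache(age: float, response_cc: str, request_cc: str) -> bool:
    """Decide whether a cached response `age` seconds old can be reused as-is.

    response_cc is the Cache-Control header the server sent with the stored
    response; request_cc is the Cache-Control header sent with the request.
    """
    server_max_age = parse_max_age(response_cc)
    # Simplification: without a server max-age, treat the entry as stale.
    fresh = server_max_age is not None and age < server_max_age
    client_max_age = parse_max_age(request_cc)
    # The request max-age caps how old a response the client will accept,
    # even if the server says it is still fresh.
    acceptable = client_max_age is None or age <= client_max_age
    return fresh and acceptable


# A misconfigured server allowing a year of caching, capped at 10 minutes.
year, ten_minutes = "max-age=31536000", "max-age=600"
print(usable_from_cache(120, year, ten_minutes))   # True: 2 minutes old
print(usable_from_cache(3600, year, ten_minutes))  # False: 1 hour > 10 minutes
print(usable_from_cache(3600, year, ""))           # True: no client cap
```

When this check fails, a cache would typically fall back to a conditional GET rather than a full download.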
* Deprecates the --download-cache option and removes the download cache code.
* Removes the in-memory page cache on the index.
* Uses CacheControl to cache all cacheable HTTP requests to the filesystem.
* Properly handles Cache-Control headers for unconditional caching.
* Will use ETag and Last-Modified headers to attempt a conditional HTTP request, speeding up cache misses and turning them into cache hits.
* Removes some concurrency-unsafe code in the download cache accesses.
* Uses a Cache-Control request header to limit the maximum length of time a cache entry is valid for.
* Adds pip.appdirs to handle platform-specific application directories such as cache, config, data, etc.
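The commit adds pip.appdirs for platform-specific directories. A minimal sketch of how a per-user cache directory can be chosen, assuming the common conventions (LOCALAPPDATA on Windows, ~/Library/Caches on macOS, XDG_CACHE_HOME or ~/.cache elsewhere); user_cache_dir is a hypothetical stand-in, not the actual pip.appdirs API, and pip's real paths may differ in detail.

```python
import os
import sys
from typing import Mapping, Optional


def user_cache_dir(appname: str, platform: Optional[str] = None,
                   environ: Optional[Mapping[str, str]] = None) -> str:
    """Per-user cache directory for `appname` on the given platform."""
    platform = platform or sys.platform
    env = os.environ if environ is None else environ
    home = os.path.expanduser("~")
    if platform.startswith("win"):
        # Windows: non-roaming app data, e.g. %LOCALAPPDATA%\pip\Cache
        base = env.get("LOCALAPPDATA") or os.path.join(home, "AppData", "Local")
        return os.path.join(base, appname, "Cache")
    if platform == "darwin":
        # macOS: ~/Library/Caches/<appname>
        return os.path.join(home, "Library", "Caches", appname)
    # Linux and other Unix: honour XDG_CACHE_HOME, defaulting to ~/.cache
    base = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(base, appname)


print(user_cache_dir("pip"))
```

Passing platform and environ explicitly keeps the function testable on any host; by default it reads the running interpreter's platform and environment.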
Okies, stuff got deployed.
Use CacheControl instead of custom cache code
Closes #1732
Closes #1734
Closes #1634
Closes #1673
Closes #1141
Closes #1287
Closes #685