Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirect cache #2095

Merged
merged 2 commits into from
Jun 12, 2014
Merged

Redirect cache #2095

merged 2 commits into from
Jun 12, 2014

Conversation

ericfrederich
Copy link
Contributor

When web clients like Firefox get a redirect they will not go after the old version any more even if you type it into the address bar.
This patch makes the requests framework work in the same way.
This helps performance when creating a request proxy that delegates to other servers based on URL.
This patch has me going from 245 requests per second to 470 requests per second against a server returning 308 response codes.

-- example server
https://gist.github.com/ericfrederich/a004862e1da1fb6916ef
-- example client
https://gist.github.com/ericfrederich/e1225e2d07e3ee923b27
-- example server output of 301 vs 302 redireects
https://gist.github.com/ericfrederich/b77bd4852a3cf9b968f0

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

Thanks for this!

I have mixed feelings. In general, this seems really helpful, but it also seems like it could introduce difficult-to-trace bugs when working with unhelpful upstream websites. Altogether I'm tempted to err in favour of the change though.

It'll need a unit test though: mind writing one?

In the meantime, @kennethreitz, you interested in this change?

@sigmavirus24
Copy link
Contributor

I'm 👎 on this change if only because I can imagine consumers like CloudFlare and Runscope relying on the fact that we follow the redirects each time. Sure this provides a speed-up and performance enhancement but I'm not convinced this won't make some people's lives much harder and less pleasant. Before we move on we should at least gather a lot more feedback from the people who rely so heavily on requests.

@dolph, @johnsheehan, @bryanhelmig, @dstufft, @mtourne, @dknecht can any of you weigh in or have someone more appropriate way in please?

@dstufft
Copy link
Contributor

dstufft commented Jun 12, 2014

Seems like something that is better left to CacheControl or whatever. Seems odd to have requests suddenly start handling caching but only in specific circumstances.

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

@sigmavirus24 @dstufft Both good points that I hadn't considered. I'm moving back down to -0, waiting to hear from some of the other big guns. =)

@dolph
Copy link

dolph commented Jun 12, 2014

+1: I'd expect session objects to cache the results of permanent redirects.

@johnsheehan
Copy link

I'm also down on this change. Not because of how it would work with our service, but because I think it is better handled in an adapter so that it's an explicit opt in.

@dknecht
Copy link

dknecht commented Jun 12, 2014

We (CloudFlare) would agree with John. Seems like something that should be put in an adapter that people can opt into.

On Jun 12, 2014, at 11:19 AM, John Sheehan notifications@github.com wrote:

I'm also down on this change. Not because of how it would work with our service, but because I think it is better handled in an adapter so that it's an explicit opt in.


Reply to this email directly or view it on GitHub.

@ericfrederich
Copy link
Contributor Author

It looks like there is already a session option for "allow_redirects".
This looks like it controls whether to attach a "history" attribute to the response (not actually control whether resolve_redirects gets called or not).
Could this cache be an option to the session rather than an adapter?

I'm not opposed to an adapter (though I would need help with the approach), but again, this is how both chrome and firefox operate with permanent redirects; they don't make requests up the whole chain each time. That is why I thought the cache should be a first class citizen, even if it is made to be an opt-in.

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

@ericfrederich It does control whether we follow redirects. resolve_redirects returns a generator, so if allow_redirects is false that generator doesn't get consumed and no redirects get followed.

The key thing to note is that requests is not, in fact, a browser, and there are times users are going to want more control. With that said, the bigger problem is that we're strongly resistant to adding more fields to the Session. That's why I wanted @kennethreitz's insight, just to see if he thinks this is a reasonable exception.

@dolph
Copy link

dolph commented Jun 12, 2014

requests is obviously not a "browser," but it is an HTTP client, and HTTP/1.1 10.3.2 reads:

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs.

If you're opposed to requests following recommendations in the HTTP specification by default, please justify why the recommendation should not guide the default behavior of requests.

@ericfrederich: caching is a first class citizen in HTTP, so I agree it should be a first class citizen in a requests session.

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

@dolph We've had this discussion many times. The specifications apply to user agents, and I've argued that requests by itself is not a whole user agent. There are some actions we delegate to the user of requests.

Now, to be clear, I have not taken a strong position on this one way or another, aside from to suggest that we should err on the side of back-compatibility and respect the requests feature freeze. At this stage I'm keeping an open mind as to both arguments.

Note, finally, that the normative language in that section is SHOULD, not MUST. =)

@ericfrederich
Copy link
Contributor Author

Interesting @dolph .
@Lukasa , I caught the "SHOULD" as well, but @dolph did state it is only recommendation.

It seems this comes down to whether requests is just a http library or a http client.
Perhaps a http library shouldn't do caching but a http client should?
So, the question of the day: is a requests.Session object a client?... or is it just some percentage of a client?

@dolph
Copy link

dolph commented Jun 12, 2014

@Lukasa My apologies for not being aware of prior discussions. Is there a documented guideline on where requests draws the line between the actions delegated to users of the library and requests' own responsibilities?

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

@dolph No need to apologise. =) Just wanted to make you aware of the context.

The short answer is no, we tend to do it by feel. Makes deciding these issues interesting. =P

@kennethreitz
Copy link
Contributor

Fascinating. I'll dwell on it.

@Lukasa
Copy link
Member

Lukasa commented Jun 12, 2014

Hooray! =) The tiebreaker arrives.

kennethreitz added a commit that referenced this pull request Jun 12, 2014
@kennethreitz kennethreitz merged commit 6b21e5c into psf:master Jun 12, 2014
@kennethreitz
Copy link
Contributor

✨ 🍰 ✨

@ericfrederich ericfrederich deleted the redirect_cache branch June 12, 2014 19:15
@johnsheehan
Copy link

This is a bummer. I just realized it will have far more impact on our use of Requests than I initially thought, but I realize we're a special case. Unfortunately, without an explicit opt out from this behavior this change will break how our test runner works so we'll probably have to find another solution besides Requests. I understand why people want this change, but not making it configurable seems like an oversight to me. I know our customers will struggle with it (we see them fight with similar client cleverness all the time).

@ericfrederich
Copy link
Contributor Author

@johnsheehan you're ready to jump ship already?
I hope your solution isn't going after the head of some github repo.
I'm all for having an opt-in / opt-out option.
You could do it yourself or I might be able to get to it sometime.
I just contributed 2 patches out of the blue while experimenting with a new design.

@sigmavirus24
Copy link
Contributor

@johnsheehan You can pin requests to 2.3.0 until we refine this feature.

@johnsheehan
Copy link

I'm not taking it out immediately as there are mitigations like pinning and this isn't shipped yet, but I've made a note to our team to plan accordingly since this would have a direct impact on our customers' expected product experience. My opinion is that this shouldn't have been merged without an explicit opt in/out but I respect the  decision of the maintainers recognizing that our situation is likely unique.On Thu, Jun 12, 2014 at 5:41 PM -0700, "Ian Cordasco" notifications@github.com wrote:

@johnsheehan You can pin requests to 2.3.0 until we refine this feature.

—Reply to this email directly or view it on GitHub.

@Lukasa
Copy link
Member

Lukasa commented Jun 13, 2014

@johnsheehan This is very easily disabled.

class NullDict(dict):
    def __setitem__(self, key, value):
        return

s = requests.Session()
s.redirect_cache = NullDict()

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants