Parsing Content-Length #10548
@MattMenke2 @mnot I suspect you have opinions/thoughts here (on the bits that do work properly).
My impression is that browsers have historically been reluctant to hard-fail requests when they see weird responses, as other browsers don't either, so any browser that does will probably get bug reports about being the only one that breaks broken sites. Many of Chrome's old weird behaviors at the net layer are based on Firefox's corresponding weird behaviors, so it's not unexpected that they behave the same, except when one or the other has made a conscious effort to be less weird. I'm not blaming Firefox for this, just explaining why they behave so similarly.

Chrome fails requests when the response has multiple Content-Length headers with different values, but proceeds to ignore Content-Length headers with invalid values. So these will work:

```
Content-Length: Happy potato
```

But these will fail (and close the connection):

```
Content-Length: Happy potato
```

Since Chrome started rejecting mismatching content-lengths 4 or 5 years ago, the already small number of servers that did this has dropped precipitously, though we did get a number of bugs filed when I implemented that behavior. I'm pretty sure both of the following would be interpreted by Chrome as having no content length:

```
Content-Length: -1
```
So back to the actual question of what we should be doing: I'd be fine with tightening up handling here and rejecting responses with invalid Content-Lengths entirely. I don't have numbers on how common this is, but it's a small enough change that I'd be willing to push on trying it and hoping for the best, rather than histogramming, waiting months for numbers, and then disabling.

I'd be more concerned about the breakage from not allowing multiple identical Content-Length headers, just because I suspect there would be a lot more fallout from doing that, and I'm not sure that a simpler spec is worth the fallout, though if other browser vendors show interest in tightening up there, Chrome should certainly follow suit.

I assume we should continue to ignore bad Content-Length values for chunked responses? I believe the spec says to ignore the header in that case.
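The behavior described above can be sketched roughly like this. This is my own minimal approximation, not Chrome's actual code: the request fails only when multiple Content-Length headers carry different values, while a single (or consistently repeated) unparsable value is silently ignored.

```python
import re

def effective_content_length(headers):
    """Return the body length to use, or None to ignore the header.

    Raises ValueError when multiple Content-Length headers disagree,
    mirroring the "fail the request and close the connection" case.
    """
    values = [v.strip() for k, v in headers if k.lower() == "content-length"]
    if len(set(values)) > 1:
        raise ValueError("mismatching Content-Length headers")
    if not values or not re.fullmatch(r"[0-9]+", values[0]):
        return None  # e.g. "Happy potato" or "-1": invalid, silently ignored
    return int(values[0])
```

Note that with this shape, duplicated *identical* invalid values are also ignored rather than rejected, which matches the "only if all other content-length headers have the same invalid value" quirk discussed below.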
I think multiple with the same value is okay, as long as it's consistent (filed https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 on Firefox not being consistent). If we could reject invalid values that'd be great, but I would also be happy to specify ignoring and move on. It seems good to finally write this down somewhere so new engines don't have to find things out the hard way (and we all get to refactor).
I can live with maintaining current behavior, though I think it has potential for weirdness. It certainly seems closer to what browsers do, which makes migration less effort and means fewer broken sites.

That having been said, Chrome's current behavior of allowing invalid values, but only if all other Content-Length headers have the same invalid value, seems a bit unexpected to me, and I'm not sure we'd want to standardize it. The only other option that occurs to me (without rejecting responses with bad values) is to ignore all invalid values and only compare the valid ones, which wouldn't be hard to do; I'm just not sure it's any better.

I also suspect it means all browsers will end up treating a Content-Length that doesn't fit in a 64-bit integer the same as an invalid value (i.e., silently ignoring it, since it fails the int64 parser). While a file that long will lead to other problems anyway, and this is a bit of an extreme corner case, ignoring the Content-Length in that case seems weird as well. Admittedly, it's possible to differentiate this from other Content-Length values browsers don't like; I just don't think browsers are generally going to be that careful about it.
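The alternative mentioned above, dropping invalid values first and only comparing the ones that parse, might look like the following sketch. The int64 bound is an assumption based on the "fails the int64 parser" remark, not a requirement any spec states:

```python
import re

INT64_MAX = 2**63 - 1

def parse_int64(value):
    """Parse a non-negative decimal integer that fits in int64, else None."""
    if not re.fullmatch(r"[0-9]+", value):
        return None
    n = int(value)
    return n if n <= INT64_MAX else None  # overflow lands in the "invalid" bucket

def content_length_valid_only(values):
    """Ignore unparsable values; require the remaining valid ones to agree."""
    parsed = {parse_int64(v.strip()) for v in values}
    parsed.discard(None)
    if len(parsed) > 1:
        raise ValueError("conflicting valid Content-Length values")
    return parsed.pop() if parsed else None
```

Under this scheme a header larger than 2^63 - 1 is quietly treated like "Happy potato", which is exactly the sloppiness-around-huge-values concern raised in the next comments.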
We treat limits a bit differently in standards. Sometimes they are standardized, but often we just leave them open for implementations to compete on. Here it seems reasonable to not have a strict limit, though perhaps a lower bound should at some point be set somewhere (probably better for that to be in HTTP proper, though, if they don't want to adopt all of this).
I'm not suggesting some limit be standardized, just saying that accepting but ignoring invalid values is likely to result in more sloppiness around values that are too large, in a way that I think is sub-optimal.
I see, sorry. Well, if we can push to be stricter, that does seem better. If that doesn't work, we can always make sure to test the scenarios you mention and ensure browsers implement the desired result. (That's part of the reason why I set up this small testing framework.)
Six years ago, Firefox encountered a problem with a router due to rejecting -1, by the way: https://bugzilla.mozilla.org/show_bug.cgi?id=704227.
This looks good, although we could expand coverage to highlight another point of contention:

```diff
  {
    "input": "Content-Length: 42,30",
    "output": null
  },
+ {
+   "input": "Content-Length: 30,42",
+   "output": null
+ },
```

Chrome passes this while Firefox reports "30". They both reject, but it might be good to add that for symmetry.
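One guess at why both orderings are worth testing: a lenient "first value wins" parser that reads leading digits and stops at the first non-digit returns a different number for each ordering, so the symmetric case catches that failure mode in both directions. This is a sketch of that hypothetical lenient behavior, not any browser's real code:

```python
import re

def first_value_wins(value):
    """Read leading decimal digits and stop at the first non-digit character."""
    m = re.match(r"[0-9]+", value)
    return int(m.group()) if m else None

print(first_value_wins("30,42"))  # 30, consistent with what Firefox reportedly returns
print(first_value_wins("42,30"))  # 42
```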
The test cases look reasonable, but I wonder if the number of bytes exposed as the response is always going to be what the browser parses `Content-Length` as...
What kind of thing do you have in mind?
In my experience, browsers often read more off the wire than Content-Length says.
Do you have a test in mind other than the one I submitted already?
Sorry for the delay. The tests look good to me; a few other suggestions:

```
Content-Length: 0x20
```
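My reading of why `0x20` is a useful edge case: a strict digit-only parser must reject it, while an implementation leaning on a permissive strtol/parseInt-style routine could read it as hexadecimal 32, or as 0 by stopping at the "x". A sketch of the contrast:

```python
import re

def strict_parse(value):
    """Accept only ASCII decimal digits, as a strict Content-Length parser would."""
    return int(value) if re.fullmatch(r"[0-9]+", value) else None

print(strict_parse("0x20"))  # None: rejected outright
print(int("0x20", 16))       # 32: what a hex-tolerant routine would produce
```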
@MattMenke2 I forgot why I dropped the ball on this; probably in part to wait and see if HTTP would define this. It seems that meanwhile Chrome fixed all these tests! However, it does still fail your suggested tests with quotes, which I had not gotten around to adding just yet. If you could take another look, that would be great. I hope that we can define obtaining the value of this header in Fetch, largely by deferring to HTTP now, and then use that operation in XMLHttpRequest for the information needed by download progress events.
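The "obtain the value of this header in Fetch" operation being discussed might look roughly like this. The shape below is my own reading of the thread (split the combined header value on commas, fail on mismatching members, ignore a non-numeric value), not the final spec text:

```python
import re

def extract_content_length(combined_value):
    """Sketch: derive a length from the combined Content-Length header value.

    Returns the length, or None when the header is absent or invalid.
    Raises ValueError for mismatching members (a network-error-style failure).
    """
    if combined_value is None:
        return None  # no Content-Length header at all
    values = [v.strip() for v in combined_value.split(",")]
    candidate = values[0]
    if any(v != candidate for v in values[1:]):
        raise ValueError("mismatching Content-Length members")
    if not re.fullmatch(r"[0-9]+", candidate):
        return None  # invalid value is ignored
    return int(candidate)
```

A consumer such as XMLHttpRequest's progress events would then use the returned length (when not None) as the `total` it reports.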
I also created #27837 to test this a bit with synthetic responses and to see how XMLHttpRequest uses it.
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156. Tests: web-platform-tests/wpt#10548 & web-platform-tests/wpt#27837.
Fetch side of this is now up at whatwg/fetch#1183 and whatwg/fetch#1184. I'll prolly do the XHR side tomorrow. Review of all these things is appreciated, even if it's just high-level. |
Still looks good.
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156. Tests: web-platform-tests/wpt#10548.