Parsing Content-Length #10548

annevk · 2018-04-20T13:24:32Z

No description provided.

annevk · 2018-04-20T13:53:26Z

@MattMenke2 @mnot I suspect you have opinions/thoughts here (on the bits that do work properly).

MattMenke2 · 2018-04-20T14:13:42Z

My impression is that browsers have been historically reluctant to hard-fail requests when it sees weird responses, as other browsers don't, either, so any browser that does will probably get bug reports about being the only one that breaks broken sites. Many of Chrome's old weird behaviors at the net layer are based on FireFox's corresponding weird behaviors, so it's not unexpected that they behave the same, except when one or the other has made a conscious effort to be less weird. I'm not blaming FireFox for this, just explaining why they behave so similarly.

Chrome fails requests when the response has multiple Content-Length headers with different values, but then proceeds to ignore Content-Length headers with invalid values.

So these will work:
Content-Length: 1
Content-Length: 1
(Experimentally, a fair number of servers and/or proxies seem to do this)

Content-Length: Happy potato
Content-Length: Happy potato
(Treated as having no Content-Length field)

But these will fail (And close the connection):
Content-Length: 1
Content-Length: 2

Content-Length: Happy potato
Content-Length: Grapefruit

Since Chrome started rejecting mismatching content-lengths 4 or 5 years ago, the already small number of servers that did this has dropped precipitously, though we did get a number of bugs filed when I implemented that behavior.

I'm pretty sure both of the following would be interpreted by Chrome as having no content length:
Content-Length: 1 1

Content-Length: -1

MattMenke2 · 2018-04-20T16:16:42Z

So back to the actual question of what we should be doing: I'd be fine with tightening up handling here - rejecting responses with invalid Content-Lengths entirely. I don't have numbers on how common this is, but it's a small enough change that I'd be willing to push on trying and hoping for the best, rather than histograming, waiting months for numbers, and then disabling.

I'd be more concerned about the breakage from not allowing multiple identical Content-Length headers, just because I suspect there would be a lot more fallout from doing that, and I'm not sure that a simpler spec is worth the fallout, though if other browser vendors show interest in tightening up there, Chrome should certainly follow suit.

I assume we should we continue to ignore bad Content-Length values for chunked responses? I believe the spec says to ignore the header in that case.

annevk · 2018-04-20T16:24:02Z

I think multiple with the same value is okay, as long as it's consistent (filed https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 on Firefox not being consistent).

If we could reject invalid values that'd be great, but I would also be happy to specify ignoring and move on. It seems good to finally write this down somewhere so new engines don't have to find things out the hard way (and we all get to refactor).

MattMenke2 · 2018-04-20T16:54:16Z

I can live with maintaining current behavior, though I think it has potential for weirdness. It certainly seems closer to what browsers do, which makes migration less effort, and means fewer broken sites.

That having been said, Chrome's current behavior of allowing invalid values, but only if all other content-length headers have the same invalid value, seems a bit unexpected to me, and I'm not sure we'd want to standardize that, though the only other option that occurs to me (Without rejecting responses with bad values) is just ignoring all invalid values and only comparing valid ones, which wouldn't be hard to do, just not sure it's any better.

I also suspect it means all browsers will end up treating a Content-Length that doesn't fit in a 64-bit integer the same as an invalid value (i.e., silently ignoring it, since it fails the int64 parser). While a file that long will lead to other problems, anyways, and this is a bit of an extreme corner case, ignoring the Content-Length in that case seems weird as well. Admittedly, it's possible to differentiate this from other Content-Length values browsers don't like, I just don't think browsers are generally going to be that careful about it.

annevk · 2018-04-20T17:05:16Z

We treat limits a bit differently in standards. Sometimes they are standardized, but often we just leave them open for implementations to compete on. Here it seems reasonable to not have a strict limit, though perhaps a lower bound should at some point be set somewhere (probably better for that to be in HTTP proper though, if they don't want to adopt all of this).

MattMenke2 · 2018-04-20T17:08:44Z

I'm not suggesting some limit be standardized, just saying that accepting but ignoring invalid values is likely to result in more sloppiness around values that are too large in a way that I think is sub-optimal.

annevk · 2018-04-20T17:49:57Z

I see, sorry. Well, if we can push to be stricter that does seem better. If that doesn't work, we can always make sure to test the scenarios you mention and ensure browsers implement the desired result. (That's part of the reason why I set up this small testing framework.)

annevk · 2018-04-23T09:20:45Z

Six years ago Firefox encountered a problem with a router due to rejecting -1 by the way: https://bugzilla.mozilla.org/show_bug.cgi?id=704227.

jugglinmike · 2018-04-25T18:10:40Z

This looks good, although we could expand coverage to highlight another point of contention:

   {
     "input": "Content-Length: 42,30",
     "output": null
   },
+  {
+    "input": "Content-Length: 30,42",
+    "output": null
+  },

Chrome passes this while Firefox reports "30". They both reject

Content-Length: 30
Content-Length: 42

but it might be good to add that for symmetry.

fetch/content-length/resources/content-lengths.json

mnot · 2018-05-01T06:16:31Z

The test cases look reasonable, but I wonder if the number of bytes exposed as the response is always going to be what the browser parses 'content-length' as...

annevk · 2018-05-01T06:29:58Z

What kind of thing do you have in mind?

mnot · 2018-05-01T07:23:10Z

IME browsers often read more off the wire than C-L says.

annevk · 2018-05-01T07:55:14Z

Do you have a test in mind other than the one I submitted already (Content-Length: 30 with a 42 byte body expecting 30 bytes)?

fetch/content-length/resources/content-lengths.json

MattMenke2 · 2018-05-02T16:16:18Z

Sorry for the delay. The tests look good to me, a few other suggestions:

Content-Length: 0x20
Content-Length: 030
Content-Length: 030, 30 (If the above one should be 30, this is relevant. If the above one is 42, this is much less interesting - I assume it should be 30)

annevk · 2021-02-26T16:37:30Z

@MattMenke2 I forgot why I dropped the ball on this. Probably in part to wait and see if HTTP would define this. It seems that meanwhile Chrome fixed all these tests! However, it does still fail your suggested tests with quotes that I had not got around to adding just yet.

If you could take another look that would be great. I hope that we can define obtaining the value of this header in Fetch, largely by deferring to HTTP now, and then use that operation in XMLHttpRequest for information needed by download progress events.

fetch/content-length/resources/content-lengths.json

fetch/content-length/parsing.window.js

fetch/content-length/resources/content-lengths.json

annevk · 2021-03-01T15:54:28Z

I also created #27837 to test this a bit with synthetic responses and to see how XMLHttpRequest uses Content-Length for its events.

As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156. Tests: web-platform-tests/wpt#10548 & web-platform-tests/wpt#27837.

annevk · 2021-03-01T16:56:46Z

Fetch side of this is now up at whatwg/fetch#1183 and whatwg/fetch#1184. I'll prolly do the XHR side tomorrow. Review of all these things is appreciated, even if it's just high-level.

MattMenke2 · 2021-03-01T17:43:12Z

Still looks good

As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156. Tests: web-platform-tests/wpt#10548.

ghost

No spig

wpt-pr-bot added the fetch label Apr 20, 2018

wpt-pr-bot requested review from jdm, mnot, youennf and yutakahirano April 20, 2018 13:24

annevk mentioned this pull request Apr 20, 2018

Multiple Content-Length values that mismatch httpwg/http-core#59

Closed

This comment has been minimized.

Sign in to view

annevk mentioned this pull request Apr 20, 2018

Preventing some CRLF header injection attacks whatwg/fetch#375

Open

This comment has been minimized.

Sign in to view

mnot reviewed May 1, 2018

View reviewed changes

fetch/content-length/resources/content-lengths.json Show resolved Hide resolved

MattMenke2 reviewed May 2, 2018

View reviewed changes

fetch/content-length/resources/content-lengths.json Show resolved Hide resolved

annevk added 5 commits February 26, 2021 17:19

Parsing Content-Length

66d5a0a

correct script and add more tests

48c084d

more tests

3ca3837

test more with 30

4837ae3

no hex or oct digits

36c7bd6

annevk force-pushed the annevk/content-length branch from 858dccf to 36c7bd6 Compare February 26, 2021 16:21

wpt-pr-bot assigned MattMenke2 Feb 26, 2021

github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 16:21 Inactive

add two more tests

cb1bb1d

github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 16:33 Inactive

annevk requested review from mnot and MattMenke2 February 26, 2021 16:37

MattMenke2 reviewed Feb 26, 2021

View reviewed changes

fetch/content-length/resources/content-lengths.json Outdated Show resolved Hide resolved

single header bogus value is ignored

9521560

github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 17:18 Inactive

MattMenke2 reviewed Feb 26, 2021

View reviewed changes

fetch/content-length/parsing.window.js Outdated Show resolved Hide resolved

fetch/content-length/resources/content-lengths.json Show resolved Hide resolved

more tests \o/

2b763a0

github-actions bot deployed to wpt-preview-10548 March 1, 2021 15:59 View deployment

This was referenced Mar 1, 2021

Define Content-Length parser whatwg/fetch#1183

Merged

Content codings (Content-Encoding) whatwg/fetch#716

Open

annevk removed the do not merge yet label Mar 1, 2021

annevk merged commit 3a48a22 into master Mar 2, 2021

annevk deleted the annevk/content-length branch March 2, 2021 14:58

ghost reviewed Jul 18, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing Content-Length #10548

Parsing Content-Length #10548

annevk commented Apr 20, 2018 •

edited by wpt-pr-bot

Loading

This comment has been minimized.

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

annevk commented Apr 23, 2018

jugglinmike commented Apr 25, 2018

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

mnot commented May 1, 2018

annevk commented May 1, 2018

mnot commented May 1, 2018

annevk commented May 1, 2018

MattMenke2 commented May 2, 2018

annevk commented Feb 26, 2021

annevk commented Mar 1, 2021

annevk commented Mar 1, 2021

MattMenke2 commented Mar 1, 2021

ghost left a comment

Parsing Content-Length #10548

Parsing Content-Length #10548

Conversation

annevk commented Apr 20, 2018 • edited by wpt-pr-bot Loading

This comment has been minimized.

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

MattMenke2 commented Apr 20, 2018

annevk commented Apr 20, 2018

annevk commented Apr 23, 2018

jugglinmike commented Apr 25, 2018

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

mnot commented May 1, 2018

annevk commented May 1, 2018

mnot commented May 1, 2018

annevk commented May 1, 2018

MattMenke2 commented May 2, 2018

annevk commented Feb 26, 2021

annevk commented Mar 1, 2021

annevk commented Mar 1, 2021

MattMenke2 commented Mar 1, 2021

ghost left a comment

Choose a reason for hiding this comment

annevk commented Apr 20, 2018 •

edited by wpt-pr-bot

Loading