Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing Content-Length #10548

Merged
merged 8 commits into from
Mar 2, 2021
Merged

Parsing Content-Length #10548

merged 8 commits into from
Mar 2, 2021

Conversation

annevk
Copy link
Member

@annevk annevk commented Apr 20, 2018

No description provided.

@annevk

This comment has been minimized.

@annevk
Copy link
Member Author

annevk commented Apr 20, 2018

@MattMenke2 @mnot I suspect you have opinions/thoughts here (on the bits that do work properly).

@MattMenke2
Copy link
Member

My impression is that browsers have been historically reluctant to hard-fail requests when it sees weird responses, as other browsers don't, either, so any browser that does will probably get bug reports about being the only one that breaks broken sites. Many of Chrome's old weird behaviors at the net layer are based on FireFox's corresponding weird behaviors, so it's not unexpected that they behave the same, except when one or the other has made a conscious effort to be less weird. I'm not blaming FireFox for this, just explaining why they behave so similarly.

Chrome fails requests when the response has multiple Content-Length headers with different values, but then proceeds to ignore Content-Length headers with invalid values.

So these will work:
Content-Length: 1
Content-Length: 1
(Experimentally, a fair number of servers and/or proxies seem to do this)

Content-Length: Happy potato
Content-Length: Happy potato
(Treated as having no Content-Length field)

But these will fail (And close the connection):
Content-Length: 1
Content-Length: 2

Content-Length: Happy potato
Content-Length: Grapefruit

Since Chrome started rejecting mismatching content-lengths 4 or 5 years ago, the already small number of servers that did this has dropped precipitously, though we did get a number of bugs filed when I implemented that behavior.

I'm pretty sure both of the following would be interpreted by Chrome as having no content length:
Content-Length: 1 1

Content-Length: -1

@MattMenke2
Copy link
Member

So back to the actual question of what we should be doing: I'd be fine with tightening up handling here - rejecting responses with invalid Content-Lengths entirely. I don't have numbers on how common this is, but it's a small enough change that I'd be willing to push on trying and hoping for the best, rather than histograming, waiting months for numbers, and then disabling.

I'd be more concerned about the breakage from not allowing multiple identical Content-Length headers, just because I suspect there would be a lot more fallout from doing that, and I'm not sure that a simpler spec is worth the fallout, though if other browser vendors show interest in tightening up there, Chrome should certainly follow suit.

I assume we should we continue to ignore bad Content-Length values for chunked responses? I believe the spec says to ignore the header in that case.

@annevk
Copy link
Member Author

annevk commented Apr 20, 2018

I think multiple with the same value is okay, as long as it's consistent (filed https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 on Firefox not being consistent).

If we could reject invalid values that'd be great, but I would also be happy to specify ignoring and move on. It seems good to finally write this down somewhere so new engines don't have to find things out the hard way (and we all get to refactor).

@MattMenke2
Copy link
Member

I can live with maintaining current behavior, though I think it has potential for weirdness. It certainly seems closer to what browsers do, which makes migration less effort, and means fewer broken sites.

That having been said, Chrome's current behavior of allowing invalid values, but only if all other content-length headers have the same invalid value, seems a bit unexpected to me, and I'm not sure we'd want to standardize that, though the only other option that occurs to me (Without rejecting responses with bad values) is just ignoring all invalid values and only comparing valid ones, which wouldn't be hard to do, just not sure it's any better.

I also suspect it means all browsers will end up treating a Content-Length that doesn't fit in a 64-bit integer the same as an invalid value (i.e., silently ignoring it, since it fails the int64 parser). While a file that long will lead to other problems, anyways, and this is a bit of an extreme corner case, ignoring the Content-Length in that case seems weird as well. Admittedly, it's possible to differentiate this from other Content-Length values browsers don't like, I just don't think browsers are generally going to be that careful about it.

@annevk
Copy link
Member Author

annevk commented Apr 20, 2018

We treat limits a bit differently in standards. Sometimes they are standardized, but often we just leave them open for implementations to compete on. Here it seems reasonable to not have a strict limit, though perhaps a lower bound should at some point be set somewhere (probably better for that to be in HTTP proper though, if they don't want to adopt all of this).

@MattMenke2
Copy link
Member

I'm not suggesting some limit be standardized, just saying that accepting but ignoring invalid values is likely to result in more sloppiness around values that are too large in a way that I think is sub-optimal.

@annevk
Copy link
Member Author

annevk commented Apr 20, 2018

I see, sorry. Well, if we can push to be stricter that does seem better. If that doesn't work, we can always make sure to test the scenarios you mention and ensure browsers implement the desired result. (That's part of the reason why I set up this small testing framework.)

@annevk
Copy link
Member Author

annevk commented Apr 23, 2018

Six years ago Firefox encountered a problem with a router due to rejecting -1 by the way: https://bugzilla.mozilla.org/show_bug.cgi?id=704227.

@jugglinmike
Copy link
Contributor

This looks good, although we could expand coverage to highlight another point of contention:

   {
     "input": "Content-Length: 42,30",
     "output": null
   },
+  {
+    "input": "Content-Length: 30,42",
+    "output": null
+  },

Chrome passes this while Firefox reports "30". They both reject

Content-Length: 30
Content-Length: 42

but it might be good to add that for symmetry.

@annevk

This comment has been minimized.

@jugglinmike

This comment has been minimized.

@annevk

This comment has been minimized.

@jgraham

This comment has been minimized.

@annevk

This comment has been minimized.

@mnot
Copy link
Member

mnot commented May 1, 2018

The test cases look reasonable, but I wonder if the number of bytes exposed as the response is always going to be what the browser parses 'content-length' as...

@annevk
Copy link
Member Author

annevk commented May 1, 2018

What kind of thing do you have in mind?

@mnot
Copy link
Member

mnot commented May 1, 2018

IME browsers often read more off the wire than C-L says.

@annevk
Copy link
Member Author

annevk commented May 1, 2018

Do you have a test in mind other than the one I submitted already (Content-Length: 30 with a 42 byte body expecting 30 bytes)?

@MattMenke2
Copy link
Member

Sorry for the delay. The tests look good to me, a few other suggestions:

Content-Length: 0x20
Content-Length: 030
Content-Length: 030, 30 (If the above one should be 30, this is relevant. If the above one is 42, this is much less interesting - I assume it should be 30)

@annevk annevk force-pushed the annevk/content-length branch from 858dccf to 36c7bd6 Compare February 26, 2021 16:21
@github-actions github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 16:21 Inactive
@github-actions github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 16:33 Inactive
@annevk
Copy link
Member Author

annevk commented Feb 26, 2021

@MattMenke2 I forgot why I dropped the ball on this. Probably in part to wait and see if HTTP would define this. It seems that meanwhile Chrome fixed all these tests! However, it does still fail your suggested tests with quotes that I had not got around to adding just yet.

If you could take another look that would be great. I hope that we can define obtaining the value of this header in Fetch, largely by deferring to HTTP now, and then use that operation in XMLHttpRequest for information needed by download progress events.

@annevk annevk requested review from mnot and MattMenke2 February 26, 2021 16:37
@github-actions github-actions bot temporarily deployed to wpt-preview-10548 February 26, 2021 17:18 Inactive
@annevk
Copy link
Member Author

annevk commented Mar 1, 2021

I also created #27837 to test this a bit with synthetic responses and to see how XMLHttpRequest uses Content-Length for its events.

annevk added a commit to whatwg/fetch that referenced this pull request Mar 1, 2021
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156.

Tests: web-platform-tests/wpt#10548 & web-platform-tests/wpt#27837.
@annevk
Copy link
Member Author

annevk commented Mar 1, 2021

Fetch side of this is now up at whatwg/fetch#1183 and whatwg/fetch#1184. I'll prolly do the XHR side tomorrow. Review of all these things is appreciated, even if it's just high-level.

@MattMenke2
Copy link
Member

Still looks good

@annevk annevk merged commit 3a48a22 into master Mar 2, 2021
@annevk annevk deleted the annevk/content-length branch March 2, 2021 14:58
annevk added a commit to whatwg/fetch that referenced this pull request Mar 3, 2021
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156.

Tests: web-platform-tests/wpt#10548.
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No spig

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants