Multiple Content-Length values that mismatch #59

annevk · 2018-04-20T13:38:02Z

The text that stipulates reject or replace at https://tools.ietf.org/html/rfc7230#section-3.3.2 should probably say that if the values differ than browsers should definitely reject. Only Chrome currently does so consistently per web-platform-tests/wpt#10548, but I'm going to try to get the others on board.

reschke · 2018-04-21T04:34:37Z

What about https://bugzilla.mozilla.org/show_bug.cgi?id=597706?

reschke · 2018-04-21T04:37:40Z

https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.3.3 has:

If a message is received without Transfer-Encoding and with either multiple Content-Length header fields having differing field-values or a single Content-Length header field having an invalid value, then the message framing is invalid and the recipient MUST treat it as an unrecoverable error. ...

annevk · 2018-04-21T04:42:00Z

https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 is the case I was concerned about.

reschke · 2018-04-21T04:44:11Z

I see. Well, RFC 7230 already requires rejecting it.

annevk · 2018-04-21T05:05:04Z

Yeah, I hadn't considered the requirements would span multiple sections.

Why does it say a single Content-Length header field having an invalid value and not one or more Content-Length header fields having an invalid value?

mnot · 2018-04-21T05:34:40Z

There's some fuzziness about terminology around the differences between:

a header's complete value
a header line
an individual value in a list-based header field

This could do with some improvement.

wtarreau · 2018-04-21T06:17:12Z

The fuzziness often comes from header folding, because most parsers will only look at one occurrence of the header and will not expect to find commas in it. Those running "atoi()" will parse "21345,1789" as "21345". But I remember we added these stricter requirements a bit late in the document redesign and now think we should really consider that any header field may have experienced undesired folding and contain funny stuff. This could allow us to avoid repeating ourselves along all the text of each sensitive header field.

wtarreau · 2018-04-21T06:21:33Z

In fact I'm thinking that as crazy as it could seem, we should probably define Content-Length as a list. It will encourage implementations in checking all possible values around commas instead of getting trapped. That's what haproxy does by the way.

annevk · 2018-04-21T07:04:21Z

whatwg/mimesniff#30 (comment) might be of interest. I was thinking of encouraging eager combining and then parsing for all headers, but @MattMenke2 pushed back on that a bit. An alternative would be to only do that for new headers (as @mnot also advocates in structured headers) and a couple of existing ones for which the difference cannot be observed (such as Content-Length), and then have a couple of legacy exceptions (along with Set-Cookie) for which we do make a distinction between multiple headers and a header with multiple values.

wtarreau · 2018-04-21T08:11:28Z

Making special cases is always problematic because there are intermediaries who apply the same rule everywhere without knowing about the header which is processed. I remember that Apache always folds headers except set-cookie. It's permitted because multiple occurrences may only appear if the header is defined as a comma-delimited list. Haproxy is able to extract header values always using the comma as a delimiter, though it can also extract them as full-line value (eg: for date/user-agent etc).
Such components cannot know which ones are supposed to be handled differently because they only transport something which is implicitly agreed between a client and a server, and they must be extremely careful not to molest the messages. So in practice they rely on the sender to detect if some folding operations are permitted.

Probably we need to refine the parsing algorithm. Maybe it could be said that folding is only permitted when there is no unmatched quotes for example, and that recipients must look for multiple headers occurrences when processing a value even if it's not defined as a list. This would not fix what's already being done in field but could improve the situation.

Duplicate headers are something we constantly have to deal with. These ones can come from caching components which accidently add a value to the one already in the cache, and from application frameworks which unconditionally add headers on top of what the application already did. That's why maybe in the end we should declare that any header is a list, except those already containing commas (how many are they?) which must not be folded and should be handled like set-cookie.

reschke · 2018-04-21T08:15:24Z

@wtarreau - now that sounds like a breaking change. Header folding has been allowed since HTTP/1.0, so I don't see how you can disallow it now.

wtarreau · 2018-04-21T08:39:08Z

I'm not suggesting to disallow it. I'm saying that it has always been said that only headers defined as a comma-delimited list may be folded, and thus by definition those containing commas in the value not being defined as a list cannot be folded. We already know a number of them, thus we can probably state that this number of headers must not be folded, so as to ease processing on the recipient. For example if a cache adds an extra Expires header, I'd rather see it on its own line than folded with the previous one. That doesn't mean it will not happen anymore, it only means we'll be able to more gracefully handle these occasional duplicates when they happen than what we currently do.

For the unmatched quote it could probably be roughly similar. Right now folding such headers results in random behaviour that is better dealt with without folding. By stating that an intermediary should not fold them, we increase the chances that the recipient properly parses its contents and we don't break existing implementations since folding has never been a requirement.

reschke · 2018-04-21T08:52:50Z

Well.

"A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below)."

So if there are multiple instances of a header field that does not use list syntax, the message is already malformed. So a component that always does folding is on the safe side, spec-wise (garbage in, garbage out).

We really need to distinguish between what we can say about the core protocol, advice for people defining new header fields, and advice for implementations.

wtarreau · 2018-04-21T09:44:18Z

I know that some such messages are already malformed, but this is what we have to deal with from time to time (such as the duplicate C-L). My point is that issues raise from the "it never happens" claim, which I'd hope we can turn into "it must never happen, but if it happens here's the safest way to deal with it without risking your software being accused of the trouble since it used to work before you plugged it".

One way to do this could be to encourage limiting the automatic folding. Because the "garbage in garbage out" mode currently is worse, it's more "manageable garbage in, total garbage out" sometimes due to commas or quotes in some header values.

Regarding the distinction between what to say for these 3 separate worlds, I tend to agree with you. But many of us have been hit a few times due to real world match implementations more than the core protocol (e.g. "the content-length value" when you end up with several of them). I tend to think that the core protocol could (and I'm saying could and not should) try to address these real world issues in an elegant and reasonably safe way. For instance, by stating that whenever a header field is mentioned in the spec and assumed to be unique, refer to section X to see how to deal with non-uniqueness of the occurrence or the value (or possibly must have been made unique before processing). It's also critical that we never put a single MUST nor SHOULD there or it will encourage bad players to continue to be bad by relying on recipients to fix their crap. But a couple of MAY could definitely improve interoperability.

royfielding · 2020-02-02T09:36:21Z

https://httpwg.org/http-core/draft-ietf-httpbis-messaging-latest.html#message.body.length

bullet (4)

mnot · 2020-02-02T09:38:03Z

Waiting for resolution of #111.

mnot · 2020-02-02T09:41:15Z

Discussed in Basel; intent is to make sure that Content-Length: 4,4 and those values on two header lines are handled in the same way.

... to account for terminology changes, and to clarify that it doesn't matter whether the values are on different lines. Fixes #59.

annevk mentioned this issue Apr 20, 2018

Preventing some CRLF header injection attacks whatwg/fetch#375

Open

mnot added h1-messaging semantics labels Jun 29, 2018

mnot mentioned this issue Jun 29, 2018

Header terminology #111

Closed

annevk mentioned this issue Jan 22, 2019

Audit: single value header field error handling #193

Closed

25 tasks

mnot added the waiting label Feb 2, 2020

mnot removed the waiting label Feb 3, 2020

mnot self-assigned this Mar 19, 2020

mnot added a commit that referenced this issue Mar 20, 2020

Adjust multiple Content-Length handling

876d53a

... to account for terminology changes, and to clarify that it doesn't matter whether the values are on different lines. Fixes #59.

mnot mentioned this issue Mar 20, 2020

Adjust multiple Content-Length handling #343

Merged

mnot added the has-proposal label Mar 20, 2020

mnot closed this as completed in #343 Apr 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple Content-Length values that mismatch #59

Multiple Content-Length values that mismatch #59

annevk commented Apr 20, 2018

reschke commented Apr 21, 2018

reschke commented Apr 21, 2018

annevk commented Apr 21, 2018

reschke commented Apr 21, 2018 •

edited

Loading

annevk commented Apr 21, 2018

mnot commented Apr 21, 2018

wtarreau commented Apr 21, 2018

wtarreau commented Apr 21, 2018

annevk commented Apr 21, 2018

wtarreau commented Apr 21, 2018

reschke commented Apr 21, 2018

wtarreau commented Apr 21, 2018

reschke commented Apr 21, 2018

wtarreau commented Apr 21, 2018

royfielding commented Feb 2, 2020

mnot commented Feb 2, 2020

mnot commented Feb 2, 2020

Multiple Content-Length values that mismatch #59

Multiple Content-Length values that mismatch #59

Comments

annevk commented Apr 20, 2018

reschke commented Apr 21, 2018

reschke commented Apr 21, 2018

annevk commented Apr 21, 2018

reschke commented Apr 21, 2018 • edited Loading

annevk commented Apr 21, 2018

mnot commented Apr 21, 2018

wtarreau commented Apr 21, 2018

wtarreau commented Apr 21, 2018

annevk commented Apr 21, 2018

wtarreau commented Apr 21, 2018

reschke commented Apr 21, 2018

wtarreau commented Apr 21, 2018

reschke commented Apr 21, 2018

wtarreau commented Apr 21, 2018

royfielding commented Feb 2, 2020

mnot commented Feb 2, 2020

mnot commented Feb 2, 2020

reschke commented Apr 21, 2018 •

edited

Loading