Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Content-Length values that mismatch #59

Closed
annevk opened this issue Apr 20, 2018 · 17 comments · Fixed by #343
Closed

Multiple Content-Length values that mismatch #59

annevk opened this issue Apr 20, 2018 · 17 comments · Fixed by #343

Comments

@annevk
Copy link
Contributor

annevk commented Apr 20, 2018

The text that stipulates reject or replace at https://tools.ietf.org/html/rfc7230#section-3.3.2 should probably say that if the values differ than browsers should definitely reject. Only Chrome currently does so consistently per web-platform-tests/wpt#10548, but I'm going to try to get the others on board.

@reschke
Copy link
Contributor

reschke commented Apr 21, 2018

@reschke
Copy link
Contributor

reschke commented Apr 21, 2018

https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.3.3 has:

  1. If a message is received without Transfer-Encoding and with either multiple Content-Length header fields having differing field-values or a single Content-Length header field having an invalid value, then the message framing is invalid and the recipient MUST treat it as an unrecoverable error. ...

@annevk
Copy link
Contributor Author

annevk commented Apr 21, 2018

https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 is the case I was concerned about.

@reschke
Copy link
Contributor

reschke commented Apr 21, 2018

I see. Well, RFC 7230 already requires rejecting it.

@annevk
Copy link
Contributor Author

annevk commented Apr 21, 2018

Yeah, I hadn't considered the requirements would span multiple sections.

Why does it say a single Content-Length header field having an invalid value and not one or more Content-Length header fields having an invalid value?

@mnot
Copy link
Member

mnot commented Apr 21, 2018

There's some fuzziness about terminology around the differences between:

  • a header's complete value
  • a header line
  • an individual value in a list-based header field

This could do with some improvement.

@wtarreau
Copy link

The fuzziness often comes from header folding, because most parsers will only look at one occurrence of the header and will not expect to find commas in it. Those running "atoi()" will parse "21345,1789" as "21345". But I remember we added these stricter requirements a bit late in the document redesign and now think we should really consider that any header field may have experienced undesired folding and contain funny stuff. This could allow us to avoid repeating ourselves along all the text of each sensitive header field.

@wtarreau
Copy link

In fact I'm thinking that as crazy as it could seem, we should probably define Content-Length as a list. It will encourage implementations in checking all possible values around commas instead of getting trapped. That's what haproxy does by the way.

@annevk
Copy link
Contributor Author

annevk commented Apr 21, 2018

whatwg/mimesniff#30 (comment) might be of interest. I was thinking of encouraging eager combining and then parsing for all headers, but @MattMenke2 pushed back on that a bit. An alternative would be to only do that for new headers (as @mnot also advocates in structured headers) and a couple of existing ones for which the difference cannot be observed (such as Content-Length), and then have a couple of legacy exceptions (along with Set-Cookie) for which we do make a distinction between multiple headers and a header with multiple values.

@wtarreau
Copy link

Making special cases is always problematic because there are intermediaries who apply the same rule everywhere without knowing about the header which is processed. I remember that Apache always folds headers except set-cookie. It's permitted because multiple occurrences may only appear if the header is defined as a comma-delimited list. Haproxy is able to extract header values always using the comma as a delimiter, though it can also extract them as full-line value (eg: for date/user-agent etc).
Such components cannot know which ones are supposed to be handled differently because they only transport something which is implicitly agreed between a client and a server, and they must be extremely careful not to molest the messages. So in practice they rely on the sender to detect if some folding operations are permitted.

Probably we need to refine the parsing algorithm. Maybe it could be said that folding is only permitted when there is no unmatched quotes for example, and that recipients must look for multiple headers occurrences when processing a value even if it's not defined as a list. This would not fix what's already being done in field but could improve the situation.

Duplicate headers are something we constantly have to deal with. These ones can come from caching components which accidently add a value to the one already in the cache, and from application frameworks which unconditionally add headers on top of what the application already did. That's why maybe in the end we should declare that any header is a list, except those already containing commas (how many are they?) which must not be folded and should be handled like set-cookie.

@reschke
Copy link
Contributor

reschke commented Apr 21, 2018

@wtarreau - now that sounds like a breaking change. Header folding has been allowed since HTTP/1.0, so I don't see how you can disallow it now.

@wtarreau
Copy link

I'm not suggesting to disallow it. I'm saying that it has always been said that only headers defined as a comma-delimited list may be folded, and thus by definition those containing commas in the value not being defined as a list cannot be folded. We already know a number of them, thus we can probably state that this number of headers must not be folded, so as to ease processing on the recipient. For example if a cache adds an extra Expires header, I'd rather see it on its own line than folded with the previous one. That doesn't mean it will not happen anymore, it only means we'll be able to more gracefully handle these occasional duplicates when they happen than what we currently do.

For the unmatched quote it could probably be roughly similar. Right now folding such headers results in random behaviour that is better dealt with without folding. By stating that an intermediary should not fold them, we increase the chances that the recipient properly parses its contents and we don't break existing implementations since folding has never been a requirement.

@reschke
Copy link
Contributor

reschke commented Apr 21, 2018

Well.

"A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below)."

So if there are multiple instances of a header field that does not use list syntax, the message is already malformed. So a component that always does folding is on the safe side, spec-wise (garbage in, garbage out).

We really need to distinguish between what we can say about the core protocol, advice for people defining new header fields, and advice for implementations.

@wtarreau
Copy link

I know that some such messages are already malformed, but this is what we have to deal with from time to time (such as the duplicate C-L). My point is that issues raise from the "it never happens" claim, which I'd hope we can turn into "it must never happen, but if it happens here's the safest way to deal with it without risking your software being accused of the trouble since it used to work before you plugged it".

One way to do this could be to encourage limiting the automatic folding. Because the "garbage in garbage out" mode currently is worse, it's more "manageable garbage in, total garbage out" sometimes due to commas or quotes in some header values.

Regarding the distinction between what to say for these 3 separate worlds, I tend to agree with you. But many of us have been hit a few times due to real world match implementations more than the core protocol (e.g. "the content-length value" when you end up with several of them). I tend to think that the core protocol could (and I'm saying could and not should) try to address these real world issues in an elegant and reasonably safe way. For instance, by stating that whenever a header field is mentioned in the spec and assumed to be unique, refer to section X to see how to deal with non-uniqueness of the occurrence or the value (or possibly must have been made unique before processing). It's also critical that we never put a single MUST nor SHOULD there or it will encourage bad players to continue to be bad by relying on recipients to fix their crap. But a couple of MAY could definitely improve interoperability.

@royfielding
Copy link
Member

@mnot
Copy link
Member

mnot commented Feb 2, 2020

Waiting for resolution of #111.

@mnot mnot added the waiting label Feb 2, 2020
@mnot
Copy link
Member

mnot commented Feb 2, 2020

Discussed in Basel; intent is to make sure that Content-Length: 4,4 and those values on two header lines are handled in the same way.

@mnot mnot removed the waiting label Feb 3, 2020
@mnot mnot self-assigned this Mar 19, 2020
mnot added a commit that referenced this issue Mar 20, 2020
... to account for terminology changes, and to clarify that
it doesn't matter whether the values are on different lines.

Fixes #59.
@mnot mnot closed this as completed in #343 Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants