-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple Content-Length values that mismatch #59
Comments
https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.3.3 has:
|
https://bugzilla.mozilla.org/show_bug.cgi?id=1455614 is the case I was concerned about. |
I see. Well, RFC 7230 already requires rejecting it. |
Yeah, I hadn't considered the requirements would span multiple sections. Why does it say a single Content-Length header field having an invalid value and not one or more Content-Length header fields having an invalid value? |
There's some fuzziness about terminology around the differences between:
This could do with some improvement. |
The fuzziness often comes from header folding, because most parsers will only look at one occurrence of the header and will not expect to find commas in it. Those running "atoi()" will parse "21345,1789" as "21345". But I remember we added these stricter requirements a bit late in the document redesign and now think we should really consider that any header field may have experienced undesired folding and contain funny stuff. This could allow us to avoid repeating ourselves along all the text of each sensitive header field. |
In fact I'm thinking that as crazy as it could seem, we should probably define Content-Length as a list. It will encourage implementations in checking all possible values around commas instead of getting trapped. That's what haproxy does by the way. |
whatwg/mimesniff#30 (comment) might be of interest. I was thinking of encouraging eager combining and then parsing for all headers, but @MattMenke2 pushed back on that a bit. An alternative would be to only do that for new headers (as @mnot also advocates in structured headers) and a couple of existing ones for which the difference cannot be observed (such as |
Making special cases is always problematic because there are intermediaries who apply the same rule everywhere without knowing about the header which is processed. I remember that Apache always folds headers except set-cookie. It's permitted because multiple occurrences may only appear if the header is defined as a comma-delimited list. Haproxy is able to extract header values always using the comma as a delimiter, though it can also extract them as full-line value (eg: for date/user-agent etc). Probably we need to refine the parsing algorithm. Maybe it could be said that folding is only permitted when there is no unmatched quotes for example, and that recipients must look for multiple headers occurrences when processing a value even if it's not defined as a list. This would not fix what's already being done in field but could improve the situation. Duplicate headers are something we constantly have to deal with. These ones can come from caching components which accidently add a value to the one already in the cache, and from application frameworks which unconditionally add headers on top of what the application already did. That's why maybe in the end we should declare that any header is a list, except those already containing commas (how many are they?) which must not be folded and should be handled like set-cookie. |
@wtarreau - now that sounds like a breaking change. Header folding has been allowed since HTTP/1.0, so I don't see how you can disallow it now. |
I'm not suggesting to disallow it. I'm saying that it has always been said that only headers defined as a comma-delimited list may be folded, and thus by definition those containing commas in the value not being defined as a list cannot be folded. We already know a number of them, thus we can probably state that this number of headers must not be folded, so as to ease processing on the recipient. For example if a cache adds an extra Expires header, I'd rather see it on its own line than folded with the previous one. That doesn't mean it will not happen anymore, it only means we'll be able to more gracefully handle these occasional duplicates when they happen than what we currently do. For the unmatched quote it could probably be roughly similar. Right now folding such headers results in random behaviour that is better dealt with without folding. By stating that an intermediary should not fold them, we increase the chances that the recipient properly parses its contents and we don't break existing implementations since folding has never been a requirement. |
Well. "A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below)." So if there are multiple instances of a header field that does not use list syntax, the message is already malformed. So a component that always does folding is on the safe side, spec-wise (garbage in, garbage out). We really need to distinguish between what we can say about the core protocol, advice for people defining new header fields, and advice for implementations. |
I know that some such messages are already malformed, but this is what we have to deal with from time to time (such as the duplicate C-L). My point is that issues raise from the "it never happens" claim, which I'd hope we can turn into "it must never happen, but if it happens here's the safest way to deal with it without risking your software being accused of the trouble since it used to work before you plugged it". One way to do this could be to encourage limiting the automatic folding. Because the "garbage in garbage out" mode currently is worse, it's more "manageable garbage in, total garbage out" sometimes due to commas or quotes in some header values. Regarding the distinction between what to say for these 3 separate worlds, I tend to agree with you. But many of us have been hit a few times due to real world match implementations more than the core protocol (e.g. "the content-length value" when you end up with several of them). I tend to think that the core protocol could (and I'm saying could and not should) try to address these real world issues in an elegant and reasonably safe way. For instance, by stating that whenever a header field is mentioned in the spec and assumed to be unique, refer to section X to see how to deal with non-uniqueness of the occurrence or the value (or possibly must have been made unique before processing). It's also critical that we never put a single MUST nor SHOULD there or it will encourage bad players to continue to be bad by relying on recipients to fix their crap. But a couple of MAY could definitely improve interoperability. |
Waiting for resolution of #111. |
Discussed in Basel; intent is to make sure that |
... to account for terminology changes, and to clarify that it doesn't matter whether the values are on different lines. Fixes #59.
The text that stipulates reject or replace at https://tools.ietf.org/html/rfc7230#section-3.3.2 should probably say that if the values differ than browsers should definitely reject. Only Chrome currently does so consistently per web-platform-tests/wpt#10548, but I'm going to try to get the others on board.
The text was updated successfully, but these errors were encountered: