Consider using RFC3986 encode form-urlencoded #1778
Comments
I believe the only difference is that |
Yes |
Does anyone have a reference to the BNF/grammar for the "key=value" pairs? I wrote this:
but is there an official IETF document (or addendum) that states this? |
@vinniefalco there is no IETF document. There was a draft, but it was abandoned. WHATWG addresses it in their URL "living standard", although I assume like all of their specs it's defined in terms of browser implementation state machines as opposed to ABNF or anything like that. |
Thanks! (I find the WHATWG "spec" exceptionally frustrating) |
It's not clear to me whether we can change an RFC reference in a patch release, so for now I'm putting this in 3.2. |
FWIW, RFC3986 is what JSON Schema's |
@karenetheridge yeah I think the chances that anyone else even realizes the difference, much less implements it, are pretty slim. Arguably, since RFC3986 is formally listed as updating RFC 1738 (the URL encoding that RFC1866 references), it should be automatically understood that 3986 supersedes the outdated requirements. I'm not sure whether we should do anything here; it's really getting into the weeds, and I think we can rely on the IETF's Update/Obsolete linkage. Maybe. I hope? |
I noticed (with the discussion around urlencoding |
@OAI/tsc review request: Is it worth doing anything here? To summarize:
Do we need to specifically note that our reference to RFC 1866 should incorporate RFC 3986 rules rather than RFC 1738 rules, or is that to be expected automatically because RFC 3986 formally updates RFC 1738 already? In terms of practical impact, I need to know for OASComply whether to validate using the RFC 1738 rules or the RFC 3986 rules. |
The WhatWG URL spec requires Note The |
I would be in favor of updating the RFC reference in 3.2. My meager understanding of this topic is that the RFC switch is not simply a "clarification", so I don't think it could be done in a patch release of an earlier version. |
@ralfhandl Ugh. The previous line in the WHATWG spec is probably the better one to quote here:
However, that is only relevant when running the WHATWG My feeling is that if people want to use WHATWG's serialization rules, that's on them. It's not our responsibility to keep up with their "living standard" or sort out the grammatical (in the ABNF sense) implications of their pseudocode. They have, historically, been aggressively hostile to accommodating anyone else's use cases or concerns, for example refusing to support relative URI-references because there aren't enough "browser-related use cases". As we are an API specification, I feel comfortable ignoring people who have made it clear that they have no interest in supporting APIs as used by non-browsers. |
@mikekistler if it's really a change, I agree with you. My question is, "is this even a change?" I can't find a clear explanation from the IETF as to whether an RFC that formally obsoletes another RFC automatically takes effect, or whether it requires an updated citation. We don't even cite RFC 1738 directly, so there's no citation to replace. We do normatively cite RFC 3986, but the way we get to 1738 is:
All the OAS citations are many years after 3986, so it was already clear that those rules were doubly-obsolete when we cited 1866. For that matter, it's not entirely clear to me that RFC 1866 really requires percent-encoding those extra characters. §8.2.1 The form-urlencoded Media Type states:
Oddly enough, this mentions percent-encoding reserved characters, rather than unsafe characters. There's no errata for this RFC, so this is the text in question. §2.2 URL Character Encoding Issues includes this text in a paragraph with the sub-heading "Unsafe":
In that same section, under the sub-heading "Reserved":
The BNF in §5 also does not place the characters in question in the

safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
      "a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex
unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape

The term "national" is not used anywhere else in the RFC, nor do any other BNF rules depend on it. Which means the But RFC 1866 only mentions "reserved" characters. Not "unsafe" characters, or "characters outside of the legal set." So... my feeling, after spending way too much time reading specs for this, is that anyone who is going to be legalistic enough to complain about RFC 1738 being cited by RFC 1866 can be countered by the fact that the most direct reading of RFC 1866 doesn't require encoding the |
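For anyone who wants to check the character classes mechanically, here is a minimal Python sketch that rebuilds the RFC 1738 sets from the BNF quoted above and compares them with the `unreserved` rule from RFC 3986 §2.3. The variable names are made up for illustration, and the sketch says nothing about which reading of RFC 1866 is correct; it only lists where the two `unreserved` sets differ.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# Character classes copied from the RFC 1738 BNF quoted above.
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
NATIONAL = set("{}|\\^~[]`")
UNRESERVED_1738 = ALNUM | SAFE | EXTRA

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED_3986 = ALNUM | set("-._~")

# Characters that are unreserved in one grammar but not in the other.
only_1738 = sorted(UNRESERVED_1738 - UNRESERVED_3986)
only_3986 = sorted(UNRESERVED_3986 - UNRESERVED_1738)

print("unreserved only in RFC 1738:", only_1738)
print("unreserved only in RFC 3986:", only_3986)
print("listed under 'national' in RFC 1738:", [c for c in only_3986 if c in NATIONAL])
```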
I'm now working on a PR describing the URL-encoding processes for query strings and request bodies in detail, and one way or another this will get addressed. So I'm taking the |
PR merged for 3.0.4 and ported to 3.1.1 via PR #3921! |
3.0.2 spec section Support for x-www-form-urlencoded Request Bodies:
And, RFC1866 section 8.2.1 The form-urlencoded Media Type:
This indicates that the encoding specified in RFC1738 is used, with special treatment of the space character.
While according to HTML 5.2 section 4.10.21.6 URL-encoded form data:
Which says:
So, please consider using RFC3986 instead of RFC1738; RFC1738 is too old (December 1994).
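To make the practical difference concrete, here is a rough Python sketch of a form-urlencoded value encoder that takes the set of unreserved characters as a parameter. `form_encode` and the two constants are hypothetical names invented for this example, not part of any spec or library, and the RFC 1738 variant assumes a strict reading in which everything outside that RFC's `unreserved` rule is escaped. Python's own `urllib.parse.quote_plus` is shown for comparison; since Python 3.7 it follows RFC 3986 and leaves `~` alone.

```python
import string
from urllib.parse import quote_plus

ALNUM = string.ascii_letters + string.digits
# Assumption: a strict reading of the RFC 1738 BNF (alpha | digit | safe | extra).
RFC1738_UNRESERVED = ALNUM + "$-_.+!*'(),"
# RFC 3986 section 2.3 unreserved characters.
RFC3986_UNRESERVED = ALNUM + "-._~"


def form_encode(value: str, unreserved: str) -> str:
    """Hypothetical helper: encode one form value, space as '+', others as %HH."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("+")
        elif ch in unreserved and ch != "+":
            # A literal '+' is always escaped, because '+' stands for a space.
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


value = "a b~c"
as_1738 = form_encode(value, RFC1738_UNRESERVED)
as_3986 = form_encode(value, RFC3986_UNRESERVED)
stdlib = quote_plus(value)  # RFC 3986 behaviour on Python 3.7+

print(as_1738, as_3986, stdlib)
```

Under these assumptions the only visible difference for this input is whether `~` is sent literally or as `%7E`; a decoder accepts both forms.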