The text of this standard appears vulnerable to mismatching other standards #74
Comments
This standard supersedes the RFC. What exactly is the problem you are trying to identify here?
When we change standards because browsers do something (as in #17), and change browsers because standards say so, we risk eroding backward compatibility without adding usability. I wish the URL standard included parsing and composing algorithms that could easily be shown to derive from a common-sense syntax. For now, it would be nice to spell out the API for encoding (name value) pairs into the query string and decoding the string back, as in the above prototypes. I guess the other components already have their composing and parsing algorithms spelled out in the standard. Another pair of pseudocode functions could write and read Unicode or UTF-16 presentations of URLs. This would need more investigation into which special characters would be safe to reveal from their percent-encoding. So far I see no conflict if a subset of the non-query, non-delimiter characters could be displayed raw in an enhanced-usability form and read back into a byte-array, query-characters-only URL (I would avoid displaying the message-context delimiter characters raw in the URL, as the presentation URL may be cut and pasted into messages). Also,
I don't really understand what you're asking for.
Closing since the issue is unclear. Please let me know if you want to further discuss this.
RFC 3986 suggests relying only on the smallest possible set of reserved characters necessary to split the URL into its 5 components (Section 5.2.1, Pre-parse the Base URI). Assuming that the RFC implies left-to-right parsing, that would mean encoding only the terminator expected by the parser in each component. The query component has the hash mark as its terminator.
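To make the five-component split concrete, here is a small Python sketch built on the regular expression given in RFC 3986, Appendix B. The helper name `split_uri` and the example URL are illustrative assumptions, not part of the RFC. It shows that once the query has started, only `#` ends it, so `/`, `?`, `:` and `@` can appear raw inside the query without confusing this kind of splitter.

```python
import re

# Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference
# with a Regular Expression").
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return (scheme, authority, path, query, fragment).

    `split_uri` is a hypothetical helper name; components that are absent
    come back as None (the path is always a string, possibly empty).
    """
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


if __name__ == "__main__":
    # The query runs up to the first '#', whatever else it contains.
    print(split_uri("http://example.com/a/b?x=1/2?3:4@5#frag"))
    # A percent-encoded hash mark stays inside the query.
    print(split_uri("http://example.com/?q=a%23b#c"))
```

Under this reading, the only character that strictly must be percent-encoded inside a query for the splitter's sake is `#`; anything further is a matter of how the query itself is subdivided.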
The RFC goes as far as to recommend keeping raw as many characters as possible in section 3.4 Query:
On the other hand, the following part of the RFC implies encoding of many characters.
https://www.ietf.org/rfc/rfc3986.txt
`:`, `@`, `/`, `?` in (name value) pairs when generating the query component, allowing their use as delimiters in the resulting query string. (I found only a special interpretation of one sub-delim, `=`, as the name=value separator in the RFC. The RFC remains silent about the role of other special characters, even `&`, as delimiters of the resulting query string.)
Moreover, Appendix C, Delimiting a URI in Context, seems to imply that double quotes (`22`, `"`), whitespace (`20`, SP), hyphens (`2D`, `-`) and angle brackets (`3C`, `<` and `3E`, `>`) need encoding when the URLs are further embedded in the context of a text message directed at a human reader. It would be nice to remain strict about the parsers that are external to the URL parser and let additional encoders protect against specific external parsers. On the other hand, not every message reader applies a parser to line breaks, so protecting the Appendix C characters, using the percent-encoder for the URL's own hyphens, seems a reasonable option when URLs are split with hyphens at line breaks. The RFC requirement mentioned in (b) above already protects double quotes (`22`, `"`), whitespace (`20`, SP) and angle brackets (`3C`, `<` and `3E`, `>`) with the percent-encoding algorithm.
So far I see the following algorithms for encoding and decoding (name value) pairs as satisfying the RFC's musts and following its shoulds. I guess this should agree with https://github.com/tkem/uritools. (The RFC does not mention the vestige of the isindex HTML tag, which submits a request with words separated by plus characters: the plus character in the query part of the URL decodes to the space character.)
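As an illustration only, an encoder/decoder pair of this kind could be sketched in Python with the standard `urllib.parse` module. The function names `encode_pairs` and `decode_pairs`, the `extra_raw` parameter, and the choice of which characters stay raw are assumptions made for this sketch; they are not taken from the RFC, from the URL standard, or from uritools.

```python
from urllib.parse import quote, unquote_plus


def encode_pairs(pairs, extra_raw=""):
    """Encode (name, value) string pairs into a query string.

    By default only the RFC 3986 unreserved characters stay raw.
    `extra_raw` (a hypothetical parameter) lists further characters to
    leave unencoded, e.g. "/?:@". It must not contain '&', '=', '+',
    '#' or '%', or decoding would no longer round-trip.
    """
    return "&".join(
        quote(name, safe=extra_raw) + "=" + quote(value, safe=extra_raw)
        for name, value in pairs
    )


def decode_pairs(query):
    """Decode a query string back into a list of (name, value) pairs."""
    pairs = []
    for field in query.split("&"):
        if not field:
            continue  # tolerate empty fields such as "a=1&&b=2"
        name, _, value = field.partition("=")  # split on the first '=' only
        # unquote_plus maps '+' to a space before percent-decoding,
        # matching the isindex-era convention mentioned above.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


if __name__ == "__main__":
    data = [("a b", "1+1=2"), ("q", 'x&y#z "<>"'), ("path", "/a?b:c@d")]
    strict = encode_pairs(data)
    relaxed = encode_pairs(data, extra_raw="/?:@")
    print(strict)
    print(relaxed)
    assert decode_pairs(strict) == data
    assert decode_pairs(relaxed) == data
```

With the default settings the double quote, space and angle brackets from Appendix C are percent-encoded as a side effect; hyphens are unreserved and stay raw, so protecting them against line-break splitting would need a separate step.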