-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Percent-decode more stuff? #87
Comments
I think it's a special case in most browsers from when I tested this last time around. |
@achristensen07 what are your ideas here? I believe @valenting wanted to do more decoding in Gecko as well: https://bugzilla.mozilla.org/show_bug.cgi?id=1197123. It's basically a choice between this document, Safari, and Firefox and Chrome and Edge. Arguably what Chrome and Edge do is somewhat nicer since there is more canonicalization, but it could lead to subtle issues too. |
I think I'm opposed to this change. alert(new URL("http://host/p%61th")); should show the %61, not 'a'. There are a lot of URL implementations based on old RFCs, and we should try to match their behavior as closely as possible. People sometimes want '%' in the path and often in the query, and we don't want people to have to put things like "%2525" in their URL for some user agents and not others. |
@achristensen07 note that the old RFCs encourage normalizing these in the second paragraph of https://tools.ietf.org/html/rfc3986#section-2.3. |
To be clear, not %25, but a lot of other ranges, including %61. |
Sure enough, Chrome and Edge percent decode more things in the path, but not the query. This can bee seen with alert(new URL("http://host/%45%25?%45%25")); We definitely shouldn't touch the query and all browsers seem to agree on this. |
Paging @foolip to see if we can get Chrome's perspective on this. |
@valenting I suggest that before changing Gecko here we try to bring this thread to a conclusion first. |
Is it common on the internet to see percent-encoded dots or other characters in the path? Why was section 2.3 added to rfc3986? Safari has not percent-decoded anything in the path, and I'm not aware of any compatibility issues, but it's possible I've missed something. A solution to this problem would be to just not percent-decode anything in the path, including %2e and %2E |
Question: to which of the following should "http://host/%%36%31" be equivalent?
|
Safari and Firefox both leave "http://host/%%36%31" untouched when canonicalizing. They also agree on the behavior of URLs like "http://host/%2e" and I suggest we change the spec to match this. What is the rationale behind Chrome's percent-encoding only the first percent, and percent decoding some characters but not others? It leads to some confusing behavior. |
We actually tried to change it in Firefox so that we decode %2e but it got backed out due to an implementation bug. :) |
I still think we should change the spec to not percent-decode anything in the path. Percent-decoding %2E makes it so there is only one thing that cannot be in the path. Percent-decoding more leads to strange reparsing behavior. Simpler is better, unless there's a good reason we need this. |
https://bugs.webkit.org/show_bug.cgi?id=163929 Reviewed by Alexey Proskuryakov. LayoutTests/imported/w3c: * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: Source/WebCore: Covered by updated API tests, which show that URLParser now matches URL::parse in these cases. Also covered by newly failing web platform tests, which were failing before URLParser was enabled. If whatwg/url#87 is resolved we can change behavior to match. * platform/URLParser.cpp: (WebCore::URLParser::isSingleDotPathSegment): (WebCore::URLParser::isDoubleDotPathSegment): (WebCore::URLParser::consumeSingleDotPathSegment): (WebCore::URLParser::consumeDoubleDotPathSegment): (WebCore::URLParser::parse): (WebCore::URLParser::isPercentEncodedDot): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): LayoutTests: * fast/url/path-expected.txt: * fast/url/standard-url-expected.txt: * fetch/fetch-url-serialization-expected.txt: git-svn-id: http://svn.webkit.org/repository/webkit/trunk@207795 268f45cc-cd09-0410-ab3c-d52691b4dbfc
https://bugs.webkit.org/show_bug.cgi?id=163929 Reviewed by Alexey Proskuryakov. LayoutTests/imported/w3c: * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: Source/WebCore: Covered by updated API tests, which show that URLParser now matches URL::parse in these cases. Also covered by newly failing web platform tests, which were failing before URLParser was enabled. If whatwg/url#87 is resolved we can change behavior to match. * platform/URLParser.cpp: (WebCore::URLParser::isSingleDotPathSegment): (WebCore::URLParser::isDoubleDotPathSegment): (WebCore::URLParser::consumeSingleDotPathSegment): (WebCore::URLParser::consumeDoubleDotPathSegment): (WebCore::URLParser::parse): (WebCore::URLParser::isPercentEncodedDot): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): LayoutTests: * fast/url/path-expected.txt: * fast/url/standard-url-expected.txt: git-svn-id: http://svn.webkit.org/repository/webkit/trunk@207805 268f45cc-cd09-0410-ab3c-d52691b4dbfc
I don't understand. When I try |
And not normalizing that seems dangerous for servers, since you might be able to trick the server to go outside the root of the site. |
The bug was about %2e%2e%2e - or %2eX we don't normalize that. |
What if we kept the definition of double-dot path segments to protect servers and maybe single-dot path segments, but removed this: |
I don't know Blink's KURL or what lies beneath well enough to judge what that change might mean. @sleevi? Tests that exercise the difference might be useful, if they'd go from pass to fail or vice versa in Chrome it should be easier to understand where to look. https://github.com/w3c/web-platform-tests/tree/master/url is the right place I guess? |
@foolip yes. |
And therefore equivalent to "http://host/%25%2536%2531", right? |
No, none of those percents are encoded or decoded. Also, I don't think it is adequate to protect servers by relying on the client getting rid of %2e%2e in the path. A server should have to protect itself from requests for ../../../etc. even from malicious clients that do not obey the URL spec. I don't see how parsing %2e%2e as .. protects servers. |
@achristensen07 you cannot craft malicious requests like that from a browser (and therefore with the user's credentials). |
@annevk Sorry for the delay. The rationale for Chrome's current behaviour is, I believe, captured here in this comment block - https://cs.chromium.org/chromium/src/url/url_canon_path.cc?rcl=0&l=167 The reason for that change is explained in https://chromium.googlesource.com/chromium/src/+/6982036dc15ba85cdbf7339265f030ff7582ff3b - and the related crash. So there are definitely implications to changing Chrome that make it harder to be an easy flip, but it's not to say it's unreasonable to explore changing. |
This project claims an explicit goal of replacing RFC 3986 and RFC 3987 by aligning with contemporary implementations, presumably affecting both clients and servers as those documents do. And since percent-decoding is already specified, "http://host/%36%31" is equivalent to "http://host/61". The question is what equivalence class (if any) "http://host/%%36%31" falls into (e.g., Chrome treats it as equivalent to "http://host/%2561", one option of at least four). |
@gibson042 percent-decoding is not specified by the URL Standard (at least not to a significant extent). Replacing those RFCs does not mean that we'll prescribe what they prescribe. |
It seems we avoid most issues here by simply not percent-decoding (other than host, where a one-time decode pass followed by IDNA is fine as it will return failure if anything fails). My conclusion here is to embrace the suggestion in #87 (comment) to only work with double-dot path segments and single-dot path segments and not decode %2e generally (or anything else). We need to make sure that is covered by the testsuite. Gecko will need to WONTFIX https://bugzilla.mozilla.org/show_bug.cgi?id=1197123 (action item for @valenting I think). Chrome and Edge will have to change their behavior. I think in the end that will turn out to be a drastic code simplification for them since they no longer have to worry about cases such as |
@sleevi I'm not sure I follow. If there's no normalization during parsing, there's no normalization in the model, and there's no normalization exposed in the API. Where do you think normalization would happen? |
@annevk I'm suggesting that normalization is an intrinsic necessity of Chrome, at present, and as such, ends up reflected in the API we surface. While it's possible to suggest that's changable, it's not currently a high priority - that is, the normalization that occurs is an intrinsic dependency of many of Chromium's URL-handling subsystems. If we change the parsing phases to remove normalization, this creates more issues than it solves. Put differently, if normalization is removed, it's unlikely Chromium will be able to align on this point in any reasonable timeframe. |
@sleevi could you perhaps explain why it's an intrinsic necessity in some detail? And I guess also, what exactly the normalization entails? |
@annevk It's important to understand, as mentioned before, that our URL code is shared across a variety of Google client platforms, and, even within Chrome, shared through a variety of layers - IPC, networking, UI (and, of that, security UI as a distinct subset), and of course, the web-visible exposure provided by the URL API. While the URL API itself has no intrinsic needs that we're immediately aware of (short of checking compat issues), what is essential for a variety of checks and comparisons is that we have a consistent set of behaviours and presentations for URLs, and that inherently necessitates normalization, such that any two URLs, if they represent the same underlying URL, are treated equivalently. This behaviour has been extensively integrated into our code, because it was one of the motivations for the library itself. Again, that's not to say we can't change, but it's certainly not a priority, given how significant an undertaking it would be. I think @gibson042's concerns are reasonable, and I suspect that many web developers would be surprised. I have trouble accepting @achristensen07's arguments against it. To put the counter-argument forward: Why do you feel that normalization would be bad? |
I'm not necessarily opposed, if we can find a reasonable processing model that everyone can agree on and doesn't lead to subtle issues with input such as And note that the URL API is not the only bit that's web-visible. HTTP requests, |
I think for now I'm proceeding with the plan on aligning the standard with what Firefox and Safari implement. We don't have a credible alternative proposal and the current document is aligned with nobody. And while RFC3986 does allow for additional normalization, it doesn't prescribe it, so I think either way we're roughly aligned with that. We could mention explicitly that servers are allowed to perform additional normalization and maybe should do that at some point. Or even expose an API to clients. |
This reverts commit 47d2089. See whatwg/url#87 for discussion.
This basically reverts bee5ad8 as it did not end up shipping in Firefox and the tentative conclusion is that percent-decoding in URLs while parsing creates more trouble than it’s worth. We’ll still handle %2e when it’s part of a single-dot or double-dot path segment. Tests: web-platform-tests/wpt#4104. Closes #87.
This reverts commit 47d2089 and also updates setter tests that were added later. See whatwg/url#87 for discussion and whatwg/url#156 for the URL Standard PR.
This basically reverts bee5ad8 as it did not end up shipping in Firefox and the tentative conclusion is that percent-decoding in URLs while parsing creates more trouble than it’s worth. We’ll still handle %2e when it’s part of a single-dot or double-dot path segment. Tests: web-platform-tests/wpt#4104. Closes #87.
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: nodejs#10598
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: #10598 PR-URL: #10602 Reviewed-By: Michal Zasso <targos@protonmail.com> Reviewed-By: Italo A. Casas <me@italoacasas.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: nodejs#10598 PR-URL: nodejs#10602 Reviewed-By: Michal Zasso <targos@protonmail.com> Reviewed-By: Italo A. Casas <me@italoacasas.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: nodejs#10598 PR-URL: nodejs#10602 Reviewed-By: Michal Zasso <targos@protonmail.com> Reviewed-By: Italo A. Casas <me@italoacasas.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: nodejs#10598 PR-URL: nodejs#10602 Reviewed-By: Michal Zasso <targos@protonmail.com> Reviewed-By: Italo A. Casas <me@italoacasas.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Per a recent change to the URL spec, arbitrary %2e sequences in URL paths that are not single or double dot segments are not to be decoded. Refs: whatwg/url#87 Refs: whatwg/url#156 Refs: web-platform-tests/wpt@d93247d Fixes: nodejs#10598 PR-URL: nodejs#10602 Reviewed-By: Michal Zasso <targos@protonmail.com> Reviewed-By: Italo A. Casas <me@italoacasas.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
https://bugs.webkit.org/show_bug.cgi?id=163929 Reviewed by Alexey Proskuryakov. LayoutTests/imported/w3c: * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: Source/WebCore: Covered by updated API tests, which show that URLParser now matches URL::parse in these cases. Also covered by newly failing web platform tests, which were failing before URLParser was enabled. If whatwg/url#87 is resolved we can change behavior to match. * platform/URLParser.cpp: (WebCore::URLParser::isSingleDotPathSegment): (WebCore::URLParser::isDoubleDotPathSegment): (WebCore::URLParser::consumeSingleDotPathSegment): (WebCore::URLParser::consumeDoubleDotPathSegment): (WebCore::URLParser::parse): (WebCore::URLParser::isPercentEncodedDot): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): LayoutTests: * fast/url/path-expected.txt: * fast/url/standard-url-expected.txt: * fetch/fetch-url-serialization-expected.txt: Canonical link: https://commits.webkit.org/181656@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@207795 268f45cc-cd09-0410-ab3c-d52691b4dbfc
https://bugs.webkit.org/show_bug.cgi?id=163929 Reviewed by Alexey Proskuryakov. LayoutTests/imported/w3c: * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: Source/WebCore: Covered by updated API tests, which show that URLParser now matches URL::parse in these cases. Also covered by newly failing web platform tests, which were failing before URLParser was enabled. If whatwg/url#87 is resolved we can change behavior to match. * platform/URLParser.cpp: (WebCore::URLParser::isSingleDotPathSegment): (WebCore::URLParser::isDoubleDotPathSegment): (WebCore::URLParser::consumeSingleDotPathSegment): (WebCore::URLParser::consumeDoubleDotPathSegment): (WebCore::URLParser::parse): (WebCore::URLParser::isPercentEncodedDot): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): LayoutTests: * fast/url/path-expected.txt: * fast/url/standard-url-expected.txt: Canonical link: https://commits.webkit.org/181662@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@207805 268f45cc-cd09-0410-ab3c-d52691b4dbfc
In #77, I was surprised that
.
was such a special case. It is indeed in Firefox 44, but in Chromium 48 other characters are also percent-decoded:-
,_
,~
, and ASCII alpha-numerics. (I don’t have other browsers to test right now.)http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=3878
It may be worth testing other URL components too.
The text was updated successfully, but these errors were encountered: