Forbidden host code points #214

Closed
annevk opened this issue Jan 23, 2017 · 4 comments

@annevk
Member

annevk commented Jan 23, 2017

In #185 (comment) @achristensen07 complained about the host code point restrictions I was adding for opaque hosts. I figured I'd explain the rules for code point restrictions in hosts in general once here and then we can either decide to agree or fiddle with the specifics.

Currently we have the following restrictions for non-opaque hosts. I listed the justification for each on the right:

  • U+0000 (generally problematic)
  • U+0009 (stripped when parsing URLs, would create reparsing issues)
  • U+000A (stripped when parsing URLs, would create reparsing issues)
  • U+000D (stripped when parsing URLs, would create reparsing issues)
  • U+0020 (creates copy-and-paste issues, see "Consider always escaping U+0020", #125)
  • "#" (would create reparsing issues)
  • "%" (would create reparsing issues due to host percent-decoding)
  • "/" (would create reparsing issues)
  • ":" (would create reparsing issues)
  • "?" (would create reparsing issues)
  • "@" (would create reparsing issues)
  • "[" (would create reparsing issues)
  • "\" (would create reparsing issues)
  • "]" (would create reparsing issues)
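
To make the list above concrete, here is a minimal sketch of a membership check over exactly those code points. The names `FORBIDDEN_HOST_CODE_POINTS` and `first_forbidden_code_point` are hypothetical and are not taken from the specification or from any implementation; the set reflects only the list in this comment, so the current URL Standard may differ.

```python
from typing import Optional

# Code points from the list in this comment (non-opaque hosts, January 2017).
# Illustrative only: the set name and contents are an assumption based on this
# thread, not a copy of the specification text.
FORBIDDEN_HOST_CODE_POINTS = frozenset([
    "\u0000",  # generally problematic
    "\u0009",  # tab, stripped when parsing URLs
    "\u000A",  # line feed, stripped when parsing URLs
    "\u000D",  # carriage return, stripped when parsing URLs
    "\u0020",  # space, copy-and-paste issues
    "#", "%", "/", ":", "?", "@", "[", "\\", "]",  # reparsing issues
])


def first_forbidden_code_point(host: str) -> Optional[str]:
    """Return the first listed code point found in host, or None if there is none."""
    for ch in host:
        if ch in FORBIDDEN_HOST_CODE_POINTS:
            return ch
    return None


if __name__ == "__main__":
    for candidate in ["example.com", "not[allowed", "strange%host"]:
        print(repr(candidate), "->", repr(first_forbidden_code_point(candidate)))
```

A real parser would presumably report a validation error and fail at the point where this sketch returns a code point; that wiring is left out here.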

Now for opaque hosts I took this list and removed the code points that were no longer problematic. Those are "%" (opaque hosts have no percent-decoding), "[" (no IPv6), "\" (no special backslash handling), and "]" (no IPv6).
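
The derivation described in this paragraph amounts to a set difference. The sketch below uses hypothetical names (`NON_OPAQUE_FORBIDDEN`, `NO_LONGER_PROBLEMATIC`, `OPAQUE_FORBIDDEN`) and captures only the proposal as stated at this point in the thread; later comments discuss keeping the two lists identical instead.

```python
# Restrictions for non-opaque hosts, as listed earlier in this comment.
# All names here are illustrative assumptions, not specification terms.
NON_OPAQUE_FORBIDDEN = frozenset("\u0000\u0009\u000A\u000D\u0020#%/:?@[\\]")

# Code points that stop being problematic for opaque hosts:
# "%" (no percent-decoding), "[" and "]" (no IPv6),
# "\" (no special backslash handling).
NO_LONGER_PROBLEMATIC = frozenset("%[\\]")

# Proposed restrictions for opaque hosts at this point in the discussion.
OPAQUE_FORBIDDEN = NON_OPAQUE_FORBIDDEN - NO_LONGER_PROBLEMATIC

if __name__ == "__main__":
    # Print as U+XXXX so that control characters stay readable.
    print(sorted("U+%04X" % ord(ch) for ch in OPAQUE_FORBIDDEN))
```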

The reason to be maximally liberal is given in #159 (comment).

@achristensen07 does this help or do you have the same concerns still? And if so, what would you do?

@achristensen07
Collaborator

This makes sense. I just didn't see a documented reason for the seemingly-arbitrary list, but now it makes sense.

@annevk
Member Author

annevk commented Jan 24, 2017

Thank you, that should allow us to close a number of issues this week.

@achristensen07
Collaborator

For simplicity's sake, since this is so similar to the list of invalid domain characters, it might be nice to keep them the same. That would prevent issues if people switch schemes, etc.

@annevk
Member Author

annevk commented Jan 24, 2017

Currently the specification disallows scheme switching for this reason (and also because non-special and special parse so differently), but making them the same seems reasonable nonetheless. I'll refactor the PR so that the list of code points is a reference rather than duplicated.

@annevk annevk closed this as completed in 3036255 Jan 24, 2017
hubot pushed a commit to WebKit/WebKit-http that referenced this issue Feb 3, 2017
https://bugs.webkit.org/show_bug.cgi?id=167779

Reviewed by Chris Dumez.

LayoutTests/imported/w3c:

* web-platform-tests/url/a-element-expected.txt:
* web-platform-tests/url/a-element-xhtml-expected.txt:
* web-platform-tests/url/url-constructor-expected.txt:
* web-platform-tests/url/url-setters-expected.txt:

Source/WebCore:

Covered by newly passing web platform tests.

* platform/URLParser.cpp:
(WebCore::isC0Control):
(WebCore::isForbiddenHostCodePoint):
(WebCore::URLParser::parseHostAndPort):
In non-special URL hosts such as customprotocol://strange%host
don't accept characters that are part of the URL grammar and would be forbidden
in a special URL host, like https://not[allowed
This was recently added to the spec in whatwg/url#214



git-svn-id: http://svn.webkit.org/repository/webkit/trunk@211638 268f45cc-cd09-0410-ab3c-d52691b4dbfc