-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid < and > in hosts #458
Comments
It seems @domenic is on board. Should we have someone from Chrome weigh in on this? @domenic will you update tests as part of updating whatwg-url or will someone else update that? @achristensen07 given your commit message for adding |
Despite all of Chrome’s divergences, I think this should be an easy one to
fix with limited compatibility risk, since it should largely just be
updating
https://cs.chromium.org/chromium/src/url/url_canon_host.cc?rcl=f3a71afd920572d61a2b3d4aa4aaa8f007e3f729
That said, I’m not sure if changes to URL parsing in Chromium still require
large-scale analysis and syncing parsing across non-Chrome Google products
:/
|
If this is definitely going to happen I can take on updating the tests. |
I have three concerns:
|
On Thu, Dec 12, 2019 at 4:10 PM achristensen07 ***@***.***> wrote:
I have three concerns:
0. Are there any real registered domains with '*', '^', '|', or '"' in
them? I imagine there are rules somewhere in ICANN preventing this, but it
would be good to reference them.
RFC1034 sets out the LDH rule for preferred name syntax, and the ICANN
Registry Agreement (specifically, Spec 6 - Interoperability and Continuity)
restricts registerable domains to that.
Of course, the world knows that beyond that, madness lies - because DNS
wire-form is 8-bit, it can have any form, and even though A/AAAA are
“supposed” to follow preferred name syntax (as host records), buggy servers
combined with generic client libraries (that support other non-DNS
resolution paths) can let anything through. The most obvious case is
underscores.
So these URLs would appear either in private networks, non-DNS host
schemes, or as subdomains of registered names taking advantage of lax
client behavior. While you can’t issue TLS certificates to these names
directly, you can sneak by with wildcards, sadly.
1. What led to these characters being forbidden in Gecko? Will we want
to change this set of forbidden characters again after this?
2. Are there any URLs with custom schemes with those in their host?
This is harder to find out. I hope the compatibility risk is minimal but I
don't have a good way to find out except changing it and seeing which
things break.
Yeah, this is the analysis I mentioned we’d have to do for Chrome. URL
parsing changes are generally accompanied by analyzing corpora like the
entire Google search index to see what compatibility risk might be had, and
that’s a Lot Of Work compared to changing a few characters in a lookup
table to zero. 😕
I think it’s worth doing, and I think it’s worth aligning on.
1. This is also a nudge to Chrome and Firefox to implement hosts in
URLs with custom schemes according to the spec.
Yes. It’s well deserved and the biggest issue with our URL parsing 😔
|
Basically we use hostnames to compute "origin attributes" which is basically the origin + some other info we use to enforce security properties, and all of these characters were interfering with how we parse the origin attributes. Since they weren't supposed to appear in hostnames anyway, we decided to restrict them from appearing in the hostname.
I'm making progress in Firefox on that front. Tracking that work in: |
I'm not opposed to forbidding ^ given that Chrome and Firefox already do. I'd like to avoid having different character sets for special and non-special schemes until we find evidence that that is needed to mitigate significant compatibility issues, so let's just give it a try and I'll let you know if we find any problems in our third party apps or websites. |
You can surround code you want un-formatted with backticks like so: |
Okay, so for both Given the various comments above it seems reasonable to me to continue with adding |
@annevk Updated, having added |
Tests: web-platform-tests/wpt#23572. Closes #458.
https://url.spec.whatwg.org/#forbidden-host-code-point currently lists a number of code points as forbidden in hosts. However, some implementations have more:
*<>^|"
(^
was recently added)<>
(added in 2018)Considering
<>
seem to be forbidden in two browsers, we should consider making them forbidden too. The list is testable throughnew URL('http://</').href
.See also: nodejs/node#30223
The text was updated successfully, but these errors were encountered: