Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid < and > in hosts #458

Closed
TimothyGu opened this issue Dec 11, 2019 · 10 comments · Fixed by #459
Closed

Forbid < and > in hosts #458

TimothyGu opened this issue Dec 11, 2019 · 10 comments · Fixed by #459
Labels
compat Standard is not web compatible or proprietary feature needs standardizing needs tests Moving the issue forward requires someone to write tests topic: parser

Comments

@TimothyGu
Copy link
Member

https://url.spec.whatwg.org/#forbidden-host-code-point currently lists a number of code points as forbidden in hosts. However, some implementations have more:

Considering <> seem to be forbidden in two browsers, we should consider making them forbidden too. The list is testable through new URL('http://</').href.

See also: nodejs/node#30223

@annevk
Copy link
Member

annevk commented Dec 12, 2019

It seems @domenic is on board. Should we have someone from Chrome weigh in on this? @domenic will you update tests as part of updating whatwg-url or will someone else update that?

@achristensen07 given your commit message for adding < and > to WebKit, what are your thoughts on *, ^, |, and "?

@annevk annevk added compat Standard is not web compatible or proprietary feature needs standardizing needs tests Moving the issue forward requires someone to write tests topic: parser labels Dec 12, 2019
@sleevi
Copy link

sleevi commented Dec 12, 2019 via email

@domenic
Copy link
Member

domenic commented Dec 12, 2019

If this is definitely going to happen I can take on updating the tests.

@achristensen07
Copy link
Collaborator

I have three concerns:
0. Are there any real registered domains with '*', '^', '|', or '"' in them? I imagine there are rules somewhere in ICANN preventing this, but it would be good to reference them.

  1. What led to these characters being forbidden in Gecko? Will we want to change this set of forbidden characters again after this?
  2. Are there any URLs with custom schemes with those in their host? This is harder to find out. I hope the compatibility risk is minimal but I don't have a good way to find out except changing it and seeing which things break. This is also a nudge to Chrome and Firefox to implement hosts in URLs with custom schemes according to the spec.

@sleevi
Copy link

sleevi commented Dec 13, 2019 via email

@valenting
Copy link
Collaborator

1. What led to these characters being forbidden in Gecko?  Will we want to change this set of forbidden characters again after this?

Basically we use hostnames to compute "origin attributes" which is basically the origin + some other info we use to enforce security properties, and all of these characters were interfering with how we parse the origin attributes. Since they weren't supposed to appear in hostnames anyway, we decided to restrict them from appearing in the hostname.
https://bugzilla.mozilla.org/show_bug.cgi?id=1337629#c0
https://bugzilla.mozilla.org/show_bug.cgi?id=1548306

2. Are there any URLs with custom schemes with those in their host?  This is harder to find out. I hope the compatibility risk is minimal but I don't have a good way to find out except changing it and seeing which things break.  This is also a nudge to Chrome and Firefox to implement hosts in URLs with custom schemes according to the spec.

I'm making progress in Firefox on that front. Tracking that work in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1603699

@achristensen07
Copy link
Collaborator

achristensen07 commented Dec 13, 2019

I'm not opposed to forbidding ^ given that Chrome and Firefox already do. I'd like to avoid having different character sets for special and non-special schemes until we find evidence that that is needed to mitigate significant compatibility issues, so let's just give it a try and I'll let you know if we find any problems in our third party apps or websites.
With all the characters *<>^|"^ we should definitely either fail to parse or succeed. Percent encoding should not be done in URLs like http://*/ or http://</ and we should add web platform tests to verify behavior alignment.

@domenic
Copy link
Member

domenic commented Dec 13, 2019

You can surround code you want un-formatted with backticks like so: `http://*/`

@annevk
Copy link
Member

annevk commented May 7, 2020

Okay, so for both web++: and https: WebKit fails on < and >. Chrome and Firefox do not fail for web++:. For the remaining case Chrome only fails for ^ and Firefox for <>*|"^.

Given the various comments above it seems reasonable to me to continue with adding <, >, and ^ to the forbidden host code points, but no more, and test all of these code points (including those that fail in Fx but ought to succeed). Do you still want to write tests @domenic?

@ZYSzys are you interested in updating #459 to match?

@ZYSzys
Copy link
Contributor

ZYSzys commented May 7, 2020

Given the various comments above it seems reasonable to me to continue with adding <, >, and ^ to the forbidden host code points, but no more, and test all of these code points (including those that fail in Fx but ought to succeed).

@annevk Updated, having added ^ too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat Standard is not web compatible or proprietary feature needs standardizing needs tests Moving the issue forward requires someone to write tests topic: parser
Development

Successfully merging a pull request may close this issue.

7 participants