Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Punycode domains are not properly validated #239

Closed
karwa opened this issue May 2, 2022 · 6 comments
Closed

Punycode domains are not properly validated #239

karwa opened this issue May 2, 2022 · 6 comments

Comments

@karwa
Copy link
Contributor

karwa commented May 2, 2022

Start with the input:

http://xn--ls8h/

It works, as it should, and the result is the same. It's the poop emoji, in case you were wondering. It's in the WPT tests.

Anyway, we can just start adding text to the end of this. For example:

(Input) -> (JSDOM output)

"http://xn--ls8h=/" -> "http://xn--js8hea/"
"http://xn--ls8h==/" -> "http://xn--hs8hdh/"
"http://xn--ls8h===/" -> "http://xn--gs8hcfj/"

"http://xn--ls8h===helloworldhowareyoutoday/" -> "http://xn--gs8hcaceekclworldhowareyoutoday/"
                   ^^^^^^^^^^^^^^^^^^^^^^^^                              ^^^^^^^^^^^^^^^^^^^

All of these inputs are invalid. Safari refuses to parse them, and my own IDNA implementation agrees. The tail part of the Punycode domain (after the xn--) should only consist of ASCII alphanumerics; having "=" signs in there (in the input) is clearly invalid.

@domenic
Copy link
Member

domenic commented May 3, 2022

Any help diagnosing where in whatwg-url (or perhaps https://github.com/jsdom/tr46/) this is breaking would be much appreciated. As usual, the goal is to match specs as 1:1 as possible so this kind of diagnostic would probably look like: "here is the spec, it says X. But your code is doing Y."

@annevk
Copy link
Contributor

annevk commented Jan 16, 2023

I also noticed https://ي./ failing, but succeeding in browsers. I would write a WPT, but I think that currently gives UB due to UTS46 handling of CheckBidi so it would require some kind of tentative test I suppose.

Edit: web-platform-tests/wpt#37993 now tracks this. Will likely not merge until the specification issue is done.

@domenic
Copy link
Member

domenic commented Jan 23, 2023

The cases in the OP seem to get fixed by mathiasbynens/punycode.js#124 .

https://ي./ still fails.

A few hundred other cases from web-platform-tests/wpt#38080 still fail. Will comment over there.

domenic added a commit to jsdom/tr46 that referenced this issue Jan 23, 2023
@TimothyGu
Copy link
Member

The issue with https://ي./ is that the Bidi Rule in RFC 5893 seems to just assume that any label is nonempty (see e.g., rule 1). The bug can thus be reproduced with any domain name with both a RTL character and an empty label.

Ideally, UTS 46 should clarify this point, excluding empty labels from Bidi Rule consideration. We should add this item to whatwg/url#744 if there's still time.

@annevk
Copy link
Contributor

annevk commented Jan 24, 2023

That's the second comment there, right? Already submitted.

@TimothyGu
Copy link
Member

Ah, right.

@domenic domenic closed this as completed in ed78a8f Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants