Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idna v2 #250

Merged
merged 5 commits into from
Mar 8, 2023
Merged

Idna v2 #250

merged 5 commits into from
Mar 8, 2023

Conversation

domenic
Copy link
Member

@domenic domenic commented Jan 23, 2023

This does not currently work; /cc @annevk @karwa to see the failures. I suspect they're more bugs in the tr46 or punycode JS modules, but it's hard to be sure... tr46 is supposedly passing all the IdnaTestV2.txt tests, but maybe check out our test harness at https://github.com/jsdom/tr46/blob/master/test/unicode.js to see if something's going wrong in either of the two test harnesses?

@annevk
Copy link
Contributor

annevk commented Jan 23, 2023

You need to percent-encode the input (I pushed a change for this to my web-platform-tests PR). That will reduce errors due to ? being used. That will still leave some errors however that I think will be hard to filter out due to the source data not having annotations as detailed as we'd like. I could probably improve upon it some more though. Perhaps with more effort and some minimal special casing we could use all tests as-is and with a future update remove some of the special casing.

@domenic
Copy link
Member Author

domenic commented Jan 24, 2023

Now most (all?) of the failures seem to come from empty labels, e.g. xn--4-0bd15808a. or .xn--ye6h or xn--09e4694e..xn--ye6h. Are these all the BiDi case you mentioned? I can't find any spec text that prohibits empty labels, but maybe I'm not looking in the right place?

@annevk
Copy link
Contributor

annevk commented Jan 24, 2023

It's not prohibited, but UTS46 runs https://www.rfc-editor.org/rfc/rfc5893#section-2 on each label and the first subrule there assumes the label to be non-empty. From @TimothyGu's comment it sounds like the JS Punycode library runs into that.

@domenic
Copy link
Member Author

domenic commented Jan 25, 2023

So is the intent of merging web-platform-tests/wpt#38080 to make it a de-facto requirement to prohibit such empty labels, i.e., any implementation which wants to pass the WPT suite must add some prohibition on them?

Here's our implementation, FWIW: https://github.com/jsdom/tr46/blob/13d02e630ad6da5ab974f4adfc7c4afb6b44c619/index.js#L132

@annevk
Copy link
Contributor

annevk commented Jan 25, 2023

No, xn--4-0bd15808a. parses in WebKit and WebKit passes all tests. I suspect what your code is tripping on is the fundamental bug in UTS46 around empty labels. I suppose we could try to remove those tests as well, but those don't appear to be a problem for browsers.

@domenic
Copy link
Member Author

domenic commented Jan 25, 2023

Hmm OK. So do we have feedback lined up for this "fundamental bug"? And, are we just saying that the correct fix for that bug, is to allow such empty labels?

@annevk
Copy link
Contributor

annevk commented Jan 25, 2023

See the second item in whatwg/url#744.

Because we don't pass VerifyDnsLength browsers allow empty labels. It might be worth revisiting that, but that's the status quo. E.g., https://./ and https://../ are confusingly correct. (I was very surprised by this a couple days ago too, but since browsers don't exclusively talk over DNS networks it might be reasonable. Not entirely convinced though.)

domenic added a commit to jsdom/tr46 that referenced this pull request Mar 4, 2023
As discovered in jsdom/whatwg-url#250, the library currently does not handle empty labels in the same way that browsers seem to do. Although the standard is unclear, we should align to browser handling.

To test this, we pull in the new IdnaTestV2.json file from WPT. This is somewhat redundant with our existing IdnaTestV2.txt, but it also represents a significant curation effort to ensure we only have URL-applicable tests, so we should make use of that.
@domenic
Copy link
Member Author

domenic commented Mar 4, 2023

After jsdom/tr46#39, there is one remaining failure case. I believe it is caused by mathiasbynens/punycode.js#129.

domenic added a commit to jsdom/tr46 that referenced this pull request Mar 8, 2023
As discovered in jsdom/whatwg-url#250, the library currently does not handle empty labels in the same way that browsers seem to do. Although the standard is unclear, we should align to browser handling.

To test this, we pull in the new IdnaTestV2.json file from WPT. This is somewhat redundant with our existing IdnaTestV2.txt, but it also represents a significant curation effort to ensure we only have URL-applicable tests, so we should make use of that.
@domenic domenic marked this pull request as ready for review March 8, 2023 07:02
@domenic domenic merged commit ed78a8f into master Mar 8, 2023
@domenic domenic deleted the idna-v2 branch March 8, 2023 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants