Accept A-labels #31

jsha · 2016-10-24T18:33:50Z

In letsencrypt/boulder#2278 we have an issue where TLDs that are IDNs are not recognized by Boulder as ending in a public suffix. Boulder currently calls publicsuffix.DefaultList.Find with the A-label form of the domain.

We have a few options:

Boulder could convert to U-label for querying publicsuffix
publicsuffix-go could switch to looking up by A-label by default
publicsuffix-go could offer a different set of methods for looking up by A-label.

I think (2) is my ideal solution. What do you think?

The text was updated successfully, but these errors were encountered:

pfigel · 2016-10-24T19:57:56Z

FWIW, Go's x/net/publicsuffix accepts both U- and A-labels, but does not work accurately for suffixes with multiple U-labels:

EffectiveTLDPlusOne("foo.example.xn--o1ach.xn--90a3ac") == "example.xn--o1ach.xn--90a3ac"
EffectiveTLDPlusOne("foo.example.упр.срб") == "упр.срб"

I don't think that's quite intentional, probably just a side-effect of how it's been implemented (and should probably be made more consistent, or at least documented).

Anyway, if compatibility with x/net/publicsuffix is a goal, the easiest way to replicate this behaviour would probably be to run idna.ToASCII when generating the rules (which is pretty much what x/net/publicsuffix does).

weppos · 2016-10-24T20:38:02Z

Thanks for the ticket @jsha @PatF

Historically, the PSL never included Punycode entries. Instead, only Unicode values are stored (althought I frankly don't recall by heard the reason of the decision, assuming there was one).

So far, I've refrained from adding U/A conversion in most of my libraries, primarily because of the extra-overhead (and lack of support for good IDNA libraries).

However, Go provides a pretty good IDNA library and I may reconsider providing native support for both A/U directly in the library.

If I understood correctly, Let's Encrypt is currently storing and handling names as A labels. Is it correct?

@jsha I'm not thrilled by 2), mostly because it's exactly the opposite behavior of what the PSL is doing. 3) is a potentially solution, although I see a value in (potentially) not having to determine whether the input is A or U (and leave it up to the library). Of course, assuming the overhead is reasonable.

On other words, I'm inclined to consider having:

a default version that works with U as today
a potential helper methods that performs the conversion and/or
a single convenient method (in addition to Find) that will automagically determine if A/U and deal with that

What do you think?

weppos · 2016-10-24T20:40:39Z

Anyway, if compatibility with x/net/publicsuffix is a goal, the easiest way to replicate this behaviour would probably be to run idna.ToASCII when generating the rules (which is pretty much what x/net/publicsuffix does).

It is, as long as it's not a bug. That seems more a bug than a feature to me. I have to investigate it, but if that's the case I will likely not support the buggy version.

In fact, the x/net/publicsuffix comes essentially as an handy transition mechanism (and I bet most of the tools transitioning are not fully supporting IDN).

jsha · 2016-10-25T16:50:19Z

Historically, the PSL never included Punycode entries. Instead, only Unicode values are stored (althought I frankly don't recall by heard the reason of the decision, assuming there was one).

I was aware of this, but thinking about it some more it's surprising to me. The IDN RFCs indicate that U-labels are intended for presentation, while A-labels should be used for under-the-hood processing. To me, this feels like a historic accident. But perhaps it was for a good reason!

If I understood correctly, Let's Encrypt is currently storing and handling names as A labels. Is it correct?

Correct.

I'm inclined to consider having:
a default version that works with U as today
a potential helper methods that performs the conversion and/or
a single convenient method (in addition to Find) that will automagically determine if A/U and deal with that

I'm in favor of (1) and (2). (3) seems unnecessarily risky. Callers should know whether they are handling A or U, and choose the appropriate function.

weppos · 2016-10-25T17:12:18Z

I was aware of this, but thinking about it some more it's surprising to me. The IDN RFCs indicate that U-labels are intended for presentation, while A-labels should be used for under-the-hood processing. To me, this feels like a historic accident. But perhaps it was for a good reason!

That's a reasonable feedback, and in fact I must say that we do the same at DNSimple and in pretty much most of the environments where I worked with names.

In the light of that, switching to A name by default may be logical, although the list is Unicode. It's also true that the internal list representation is irrelevant, as I pre-process it in any case.

I need to take a couple of days to experiment a little bit on which internal representation may be more appropriate. I'd also have a quick check with @sleevi and @gerv (which have an extensive experience on how Chromium/Firefox browsers handles the A/U transformation) to see if they have some specific hint (in addition to a possible explanation of why we store Unicode values).

@jsha I see the issue is currently causing you a bug in the rate-limiting. Is it if I take a few days to research the issue, or do you need a super quick fix? In case, you can merge letsencrypt/boulder#2278 and I'll happily provide a patch once publicsuffix-go is updated. Does it work for you?

jsha · 2016-10-25T17:40:17Z

I'm willing to a wait a few days while you research the issue. The rate limiting issue makes things too strict rather than too lax, and I haven't heard from people who are having problems with it. Thanks!

gerv · 2016-10-26T14:43:42Z

Why we chose U-label as the default (the A-label is in a comment) is lost in the mists of history, but perhaps it was because we thought that was the "proper" version of the name, and the A-label is an implementation detail. Anyway, as there are lossless conversions between the two, it doesn't seem like a big deal.

The two namespaces also don't overlap - it's not permitted to have a U-label of any sort, even an ASCII-only one, with hyphens in the third and fourth position. So it would be entirely reasonable (I'd say) for a smart API to take either, work out which, do the lookup, and return the results in the form submitted.

But if you don't want to do that, any of these other fixes are fine. But I don't think we'll be changing the PSL.

Gerv

sleevi · 2016-10-26T15:46:59Z

On Wednesday, October 26, 2016, Gervase Markham notifications@github.com
wrote:

Why we chose U-label as the default (the A-label is in a comment) is lost
in the mists of history, but perhaps it was because we thought that was the
"proper" version of the name, and the A-label is an implementation detail.
Anyway, as there are lossless conversions between the two, it doesn't seem
like a big deal.

IDNA2003 vs IDNA2008 was the concern at the time, and the fact that a given
name can, in theory, have two separate representations. At the time,
Mozilla was trying to keep its flexibility - at least based on past code I
can tell from around that time.

I haven't heard any substantial movement to IDNA2008 for quite some time.
Perhaps it's not happening? Or did it already happen and I missed it?

The two namespaces also don't overlap - it's not permitted to have a
U-label of any sort, even an ASCII-only one, with hyphens in the third and
fourth position. So it would be entirely reasonable (I'd say) for a smart
API to take either, work out which, do the lookup, and return the results
in the form submitted.

But if you don't want to do that, any of these other fixes are fine. But I
don't think we'll be changing the PSL.

Gerv

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#31 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABayJ6W9p1v4YdBIJYmYUJ2pd-u4DIzBks5q32cfgaJpZM4KfIXf
.

gerv · 2016-10-26T15:51:59Z

We switched 8 months ago, in Firefox 45:
https://bugzilla.mozilla.org/show_bug.cgi?id=1218179

Feel free to join us :-)
https://bugs.chromium.org/p/chromium/issues/detail?id=505262

jsha · 2016-10-26T18:36:02Z

FWIW, the current ACME spec requires IDNA2008: https://ietf-wg-acme.github.io/acme/#rfc.section.6.1.4.

sleevi · 2016-10-26T18:55:06Z

@gerv I'll poke jshin on that, but I think it would at least highlight a 'possibility' where the U-label form is valuable - to allow consistent decomposition to the A-label in both IDNA2003 and IDNA2008 apps. Of course, I wish we'd just go with IDNA2008.

Perhaps an interim solution is convert to A-label across the board (which will certainly break Chrome's PSL implementation, and presumably Firefox, but we can fix both of those easily) in IDNA2008, and then perhaps add a consistency check as part of the PSL's tests to make sure the IDNA2003 and 2008 representations are consistent, until the PSL full throws IDNA2003 under the bus?

gerv · 2016-10-27T08:17:29Z

Converting to A-label would presumably break everyone's implementation? What would be the advantages of doing that?

sleevi · 2016-10-27T16:40:37Z

@gerv - As jsha@ mentioned, it would allow applications to deal only with A-Labels when dealing with the PSL, without having to worry about the IDNA transformation. I certainly know that some of our mobile applications that don't support U-Labels don't want to carry the weight of an IDNA translation, to keep download size small.

weppos · 2016-10-28T10:05:06Z

@sleevi @gerv I agree that changing the list now may potentially break implementation. In fact, we can even consider providing 2 alternate sources if we decide to go with the transition, and automatically convert one into another.

It's also worth to mention that consumers can perform the translation on their own, pre-processing the file.

In fact, my initial research goal is to understand which representation is better for the specific case of this library. If it happens to be A-labels, I'll be happy to add an extra step in the pre-processing that converts it into A-label strings.
https://github.com/weppos/publicsuffix-go/blob/master/cmd/gen/gen.go#L60

I can check directly, but you may save me some time. Given that Chrome/Firefox are pre-processing the list, do you know if the data is stored in A-labels or U-labels?

gerv · 2016-10-28T15:14:12Z

weppos: if we are going to change the canonical representation, we should roll this change in with all the other format changes we want to make. Perhaps we need a wiki page to track them all? :-)

Firefox stores them as A-labels:
http://hg.mozilla.org/mozilla-central/file/default/netwerk/dns/prepare_tlds.py

sleevi · 2016-10-28T15:15:38Z

Chrome preprocesses the list and converts all U-Labels to A-Labels.

weppos · 2016-10-28T19:12:37Z

Perhaps we need a wiki page to track them all? :-)

That sounds like a plan @gerv. I'll start it, and post on the new PSL ml for discussion.

So it seems to me that A-labels seems really to be the best representation here. So I'm actually more convinced about pre-process the list into A-names, and switch it to use A-names by default as originally suggested by @jsha

Once done, I may provide U-label helpers.

Thanks for joining the discussion @sleevi and @gerv

kachkaev · 2016-11-04T22:03:43Z

Any updates or a roadmap on this by a chance? Just wondering as it seems that this issue makes it impossible to obtain let's encrypt certificates for certain IDN domains. Seems like thousands of users may be affected: letsencrypt/boulder#2277

If anyone knows a workaround, could you please share?

weppos · 2016-11-07T00:31:39Z

@kachkaev I've spent this week finally releasing a major version of my whois library that has been under development for over 1 year. I honestly didn't have a lot of time to work on a patch.

I'll try to work on it this week. Unless any big objection, it's likely I'll switch to A-labels by default. However, it's not a 10-minutes change as I have to put in place a procedure to pre-process the definition, as I definitely don't want to do it at runtime.

I have a script that pulls the PSL every day and auto-submit PR for changes, and that needs to be changed as well.

I'll post updates here, and I'll make sure to share the code in a branch as soon as ready, so that it's possible to start testing it against letsencrypt/boulder#2277 and letsencrypt/boulder#2278

pzb · 2016-11-07T00:40:53Z

I agree that A-labels are the way to go. Everytime I use one of @weppos' great libraries, I end up adding a check for the xn-- substring in the calling function and doing conversion there and then converting back. This is because the PSL is inherently tied to DNS, which uses A-labels. There is no need to guess how to encode 手表; if it s a TLD is it always xn--kpu716f.

As for IDNA2003 vs IDNA2008, isn't this solved by using A-labels? From what I can tell, the A-label to U-label conversion is the same for both, but the reverse may vary. If correct, then the only question is whether the IDNA2003 and IDNA2008 algorithms have different output for entries on the list, and the only answer the matters is what is in DNS.

slavonnet · 2016-11-14T09:24:04Z

example "рф". National russian domain!
https://xn----itbzecmx.xn--p1ai/

* Add test cases for GH-31 * Switch the list internal representation to A-labels by default * Pre-process the list to A-labels * Enforce official PSL test cases to ASCII * Add note about A-labels in the README

weppos · 2016-11-21T18:22:02Z

I want to personally thank everyone who provided feedback in this discussion, especially the folks at Let's Encrypt for reporting the issue, and @gerv @sleevi @pzb for providing their personal experience.

The issue is fixed by #40. The library now defaults to A-labels, and pre-process the PSL to make sure it is consistent with the expected internal representation.

New autopull requests will automatically provide the A-label form of any upstream PSL update.

weppos · 2016-11-21T18:22:43Z

FYI @pzb I'll see if I can replicate the same work on the publicsuffix-ruby lib.

cpu mentioned this issue Oct 26, 2016

PA rejects domains ending in IDN TLD letsencrypt/boulder#2277

Closed

weppos self-assigned this Oct 28, 2016

weppos added a commit that referenced this issue Nov 20, 2016

Add test cases for GH-31

4db7531

weppos added the enhancement label Nov 20, 2016

weppos mentioned this issue Nov 20, 2016

Switch the lib to A-labels by default #40

Merged

weppos closed this as completed Nov 21, 2016

weppos mentioned this issue Feb 17, 2018

Needs a processing model publicsuffix/list#27

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accept A-labels #31

Accept A-labels #31

jsha commented Oct 24, 2016

pfigel commented Oct 24, 2016

weppos commented Oct 24, 2016

weppos commented Oct 24, 2016

jsha commented Oct 25, 2016

weppos commented Oct 25, 2016

jsha commented Oct 25, 2016

gerv commented Oct 26, 2016

sleevi commented Oct 26, 2016

gerv commented Oct 26, 2016

jsha commented Oct 26, 2016

sleevi commented Oct 26, 2016

gerv commented Oct 27, 2016

sleevi commented Oct 27, 2016

weppos commented Oct 28, 2016

gerv commented Oct 28, 2016

sleevi commented Oct 28, 2016 via email

weppos commented Oct 28, 2016

kachkaev commented Nov 4, 2016

weppos commented Nov 7, 2016

pzb commented Nov 7, 2016 •

edited

Loading

slavonnet commented Nov 14, 2016

weppos commented Nov 21, 2016

weppos commented Nov 21, 2016 •

edited

Loading

Accept A-labels #31

Accept A-labels #31

Comments

jsha commented Oct 24, 2016

pfigel commented Oct 24, 2016

weppos commented Oct 24, 2016

weppos commented Oct 24, 2016

jsha commented Oct 25, 2016

weppos commented Oct 25, 2016

jsha commented Oct 25, 2016

gerv commented Oct 26, 2016

sleevi commented Oct 26, 2016

gerv commented Oct 26, 2016

jsha commented Oct 26, 2016

sleevi commented Oct 26, 2016

gerv commented Oct 27, 2016

sleevi commented Oct 27, 2016

weppos commented Oct 28, 2016

gerv commented Oct 28, 2016

sleevi commented Oct 28, 2016 via email

weppos commented Oct 28, 2016

kachkaev commented Nov 4, 2016

weppos commented Nov 7, 2016

pzb commented Nov 7, 2016 • edited Loading

slavonnet commented Nov 14, 2016

weppos commented Nov 21, 2016

weppos commented Nov 21, 2016 • edited Loading

pzb commented Nov 7, 2016 •

edited

Loading

weppos commented Nov 21, 2016 •

edited

Loading