Side effect of "default rule" on disallowed TLDs #570

Closed
weppos opened this issue Dec 4, 2017 · 5 comments

@weppos
Member

weppos commented Dec 4, 2017

While reviewing golang/go#22959 I noticed a potentially unwanted side effect that I was able to reproduce at least in my Go and Ruby libraries.

I would be interested to know from @gerv @sleevi @rockdaboot if you have any feedback and if you believe this is an implementation issue, or a list issue.

Here's the issue: when a TLD is disallowed and we list only its second-level zones instead, the 'If no rules match, the prevailing rule is "*"' instruction (see Algorithm) causes incorrect results.

Here's an example:

// za : http://www.zadna.org.za/content/page/domain-information
ac.za
agric.za
alt.za
co.za
edu.za
gov.za

Note that .za itself is not listed, because registrations directly under it are disallowed. When looking up, e.g., gli.za, the algorithm finds no matching rule and therefore falls back to *. As a result, gli.za is treated as the registrable domain. Likewise, if you supply foo.gli.za, the domain would be gli.za and the subdomain foo.
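
The fallback described above can be reproduced without any library. The following is a minimal sketch, assuming a list that contains only plain rules (no wildcards or exceptions); the helper names `za_rules`, `public_suffix` and `registrable_domain` are made up for illustration and are not the API of any real implementation.

```ruby
# Hypothetical excerpt of the .za section of the list.
def za_rules
  %w[ac.za agric.za alt.za co.za edu.za gov.za]
end

# Returns the public suffix for `name`: the longest listed rule matching the
# right-hand labels, or the last label when nothing matches (the "*" rule).
def public_suffix(name, rules)
  labels = name.downcase.split(".")
  # Candidate suffixes, longest first, e.g. "foo.gli.za", "gli.za", "za".
  candidates = (0...labels.size).map { |i| labels[i..-1].join(".") }
  candidates.find { |c| rules.include?(c) } || labels.last
end

# Returns the registrable domain (public suffix plus one more label), or nil
# when the name is itself a public suffix.
def registrable_domain(name, rules)
  labels = name.downcase.split(".")
  extra = labels.size - public_suffix(name, rules).split(".").size
  return nil if extra.zero?
  labels[(extra - 1)..-1].join(".")
end

public_suffix("gli.za", za_rules)          # => "za" (default rule)
registrable_domain("foo.gli.za", za_rules) # => "gli.za"
registrable_domain("ac.za", za_rules)      # => nil (listed suffix)
```

Because nothing in the list says that .za is closed to direct registrations, the sketch cannot tell gli.za apart from a name under an open TLD.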

Of course, this doesn't apply if you try to parse a name containing one of the explicitly listed suffixes:

irb(main):007:0> PublicSuffix.parse("gli.za")
=> #<PublicSuffix::Domain:0x00007fa6bbb63758 @tld="za", @sld="gli", @trd=nil>
irb(main):008:0> PublicSuffix.parse("foo.gli.za")
=> #<PublicSuffix::Domain:0x00007fa6bbb58510 @tld="za", @sld="gli", @trd="foo">
irb(main):009:0> PublicSuffix.parse("ac.za")
PublicSuffix::DomainNotAllowed: `ac.za` is not allowed according to Registry policy
	from /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix.rb:78:in `parse'
	from (irb):9
	from bin/console:14:in `<main>'
irb(main):010:0> PublicSuffix.parse("foo.ac.za")
=> #<PublicSuffix::Domain:0x00007fa6bbb52390 @tld="ac.za", @sld="foo", @trd=nil>

In other words, we are missing a way to tell the parser that .za is not a valid suffix. Actually, we do have a way: the wildcard. In fact, the following change would tell the parser that .za is invalid (it will find a match according to the suggested algorithm, but it will disallow the name because of the wildcard).

*.za
ac.za
agric.za

➜  publicsuffix-ruby git:(master) ✗ bin/console
irb(main):001:0> PublicSuffix.parse("gli.za")
PublicSuffix::DomainNotAllowed: `gli.za` is not allowed according to Registry policy
	from /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix.rb:78:in `parse'
	from (irb):1
	from bin/console:14:in `<main>'
irb(main):002:0> PublicSuffix.parse("foo.gli.za")
=> #<PublicSuffix::Domain:0x00007fb599a5dee8 @tld="gli.za", @sld="foo", @trd=nil>

However, this is a sort of hack, with the side effect of causing gli.za to be considered a valid suffix despite not being explicitly listed. In other words, we solve the problem with .za, but we create another potential parsing issue.
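
To make the side effect concrete, here is a minimal sketch of wildcard matching, assuming the simplified reading that the prevailing rule is the matching rule with the most labels and that exception rules are left out entirely. The helper names are hypothetical and do not belong to any real implementation.

```ruby
# Returns true when `rule` (e.g. "*.za") matches the right-hand labels of the
# name; a "*" label in the rule matches any single label.
def rule_matches?(rule, labels)
  rule_labels = rule.split(".")
  return false if rule_labels.size > labels.size
  rule_labels.reverse.zip(labels.reverse).all? { |r, l| r == "*" || r == l }
end

# Returns the public suffix: as many right-hand labels as the prevailing rule
# has, falling back to the implicit "*" rule when nothing matches.
def public_suffix(name, rules)
  labels = name.downcase.split(".")
  matching = rules.select { |r| rule_matches?(r, labels) }
  prevailing = matching.max_by { |r| r.count(".") } || "*"
  labels.last(prevailing.split(".").size).join(".")
end

with_wildcard = %w[*.za ac.za agric.za]
without_wildcard = %w[ac.za agric.za]

public_suffix("gli.za", without_wildcard)  # => "za"
public_suffix("gli.za", with_wildcard)     # => "gli.za"
public_suffix("foo.gli.za", with_wildcard) # => "gli.za"
```

With `*.za` in place, every second-level label under .za becomes a suffix, whether or not it is listed.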

Funnily enough, adding !za seems to change the behavior as expected, at least in my implementations. This is totally unexpected (we document ! as an exception to a wildcard), hence I'm not sure whether this should be the solution unless we document it as such:

// za : http://www.zadna.org.za/content/page/domain-information
!za
ac.za
agric.za
alt.za

➜  publicsuffix-ruby git:(master) ✗ bin/console
irb(main):001:0> PublicSuffix.parse("foo.gli.za")
PublicSuffix::DomainNotAllowed: `foo.gli.za` is not allowed according to Registry policy
	from /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix.rb:78:in `parse'
	from (irb):1
	from bin/console:14:in `<main>'
irb(main):002:0> PublicSuffix.parse("gli.za")
PublicSuffix::DomainNotAllowed: `gli.za` is not allowed according to Registry policy
	from /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix.rb:78:in `parse'
	from (irb):2
	from bin/console:14:in `<main>'
irb(main):003:0> PublicSuffix.parse("za")
PublicSuffix::DomainNotAllowed: `za` is not allowed according to Registry policy
	from /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix.rb:78:in `parse'
	from (irb):3
	from bin/console:14:in `<main>'
irb(main):004:0> PublicSuffix.parse("foo.ac.za")
=> #<PublicSuffix::Domain:0x00007fba4d8619b0 @tld="ac.za", @sld="foo", @trd=nil>

Note that if you ignore the * rule, you are safe. That is also a possible solution, as long as we document it properly. In fact, my implementations allow skipping the default rule so that you can spot exactly these cases. However, I recall this is not an option in some implementations (e.g. I recall a discussion between @rockdaboot and Daniel from curl, as libpsl doesn't seem to allow it).
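
As a rough illustration of what skipping the default rule means, the sketch below makes the fallback optional. It is a self-contained toy, not the code of any real library, and the `default_rule:` keyword and helper names are assumptions chosen for this example.

```ruby
# Hypothetical excerpt of the .za section of the list.
def za_rules
  %w[ac.za agric.za alt.za co.za edu.za gov.za]
end

# Returns the public suffix of `name`, or nil when no listed rule matches and
# the implicit "*" rule has been switched off with default_rule: nil.
def public_suffix(name, rules, default_rule: "*")
  labels = name.downcase.split(".")
  candidates = (0...labels.size).map { |i| labels[i..-1].join(".") }
  listed = candidates.find { |c| rules.include?(c) }
  return listed if listed
  default_rule.nil? ? nil : labels.last
end

public_suffix("gli.za", za_rules)                       # => "za"
public_suffix("gli.za", za_rules, default_rule: nil)    # => nil
public_suffix("foo.ac.za", za_rules, default_rule: nil) # => "ac.za"
```

A caller can then treat a nil result as "not covered by the list" instead of silently accepting gli.za.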


Long story short, I am looking for feedback about what we should change or document to make it clear that, when we explicitly list all the second-level suffixes of a TLD, the TLD itself should not be treated as a valid suffix by mistake.

@weppos weppos added the ❔❔ question Open question, please look / answer / respond label Dec 4, 2017
@weppos weppos self-assigned this Dec 4, 2017
@rockdaboot
Contributor

rockdaboot commented Dec 4, 2017

So you think .za is not a public suffix? IMO, it is a public suffix and everything works as expected. Not having .za explicitly in the list wouldn't hurt (as you say: star rule). Maybe the .za rule should just be added, or what am I missing?
But if you really need exceptions for a single TLD (to not be a public suffix), then we have to talk about it. AFAIR, the curl stuff was (between the lines) about detecting whether a TLD officially exists or not. I am pretty sure that the PSL is not meant for that purpose.

@weppos
Member Author

weppos commented Dec 4, 2017

Hi Tim!

Maybe the .za rule should just be added, or what am I missing?

The .ZA suffix is not listed because it's not a registerable suffix. It would be there if it were possible to register a name directly under the .ZA TLD, but the registry doesn't allow it.

For the time being, this behavior is not properly reflected in the list, assuming this is something we care about.

@gerv
Contributor

gerv commented Dec 5, 2017

On 04/12/17 18:08, Simone Carletti wrote:

Note .za is not present, as it is disallowed. When searching e.g. for
gli.za, according to the algorithm the lookup would return no rule,
hence it falls back to *. As a result, gli.za would be treated as
the domain.

But does gli.za actually occur in the DNS? Presumably not, as you say the .za registry does not allow it.

So really, the question here is: do we support the use case of using the public suffix list on arbitrary inputs, or do we only support the use case of determining the public suffix for a domain which is in the DNS?

I know that some applications like to use the PSL to determine what is a domain and what is not. Chrome does this, I believe. I'm not sure if Firefox does (yet). Those uses would break in this scenario.

@rockdaboot
Contributor

The .ZA suffix is not listed because it's not a registerable suffix.

.com is also not registrable but listed. That's the star rule in effect - it doesn't matter if you list a TLD or not.

You can register gli.za or x.gli.za but not za itself.
You can register gli.com or x.gli.com but not com itself.

And with the ac.za rule:
You cannot register ac.za (it's a PS) but you can register x.ac.za.

libpsl has a tool called 'psl'. You can check / play with

  --print-unreg-domain         print the longest public suffix part
  --print-reg-domain           print the shortest private suffix part

@sleevi
Contributor

sleevi commented Dec 6, 2017

@rockdaboot I think that's conflating two things

.com is a registerable suffix - e.g. I can get sleevi.com under .com - and so .com is listed
.za is not a registerable suffix - the only names that can be obtained are under the set of enumerated SLDs under .za

I see @weppos mentioned it, but explicitly from the .za registry page (link in original)

domain names can only be registered under an SLD

Part of this problem stems from the PSL being unioned with the ICANN Root Zone Database. If we wish to express the TLD-ness of something, having .za in there explicitly is valuable - as it is a ccTLD.

I think @weppos's root issue comes down to uncertainty as to whether the PSL is meant to be a necessarily full reflection of the domain registration policies (that is, all that are valid 2LDs) or whether it's meant to be a sufficient expression for clients to ensure cookies are appropriately scoped.

I think I would prefer the latter, and thus an expression such as adding just .za (and allowing for more specific policies for the SLDs) is far more appropriate, and if not that, then *.za (for future compatibility of introducing new SLDs, at the cost of not being "IANA/ICANN-complete") is also reasonable. I don't think !za would be consistent with how we've handled other domains, and I would particularly worry about the implications for future maintenance.
