Clarification of `uk.com` test #864

Shardj · 2019-08-14T14:27:06Z

The test checkPublicSuffix('uk.com', null); seems like an interesting one. If I follow the algorithm provided at https://publicsuffix.org/list/ (pasted below for convenience), I come to the conclusion that the suffix for uk.com is actually com and that it shouldn't be null.

Match domain against all rules and take note of the matching ones.

If no rules match, the prevailing rule is "*".

If more than one rule matches, the prevailing rule is the one which is an exception rule.

If there is no matching exception rule, the prevailing rule is the one with the most labels.

If the prevailing rule is a exception rule, modify it by removing the leftmost label.

The public suffix is the set of labels from the domain which match the labels of the prevailing rule, using the matching algorithm above.

The registered or registrable domain is the public suffix plus one additional label.
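For illustration, here is a minimal Python sketch of the steps quoted above. It is not the project's reference implementation: `SAMPLE_RULES` is a small hand-picked sample in the style of the list (the real list is far larger), and the function names are hypothetical. As far as I can tell, the second argument of checkPublicSuffix in the test data is the expected registrable domain rather than the suffix itself, which is why the sketch keeps the two apart.

```python
from typing import List, Optional

# Hypothetical sample of rules for illustration only; not the real list.
SAMPLE_RULES = ["com", "uk.com", "uk", "nhs.uk", "*.ck", "!www.ck"]


def rule_matches(rule_labels: List[str], domain_labels: List[str]) -> bool:
    """A rule matches when its labels equal the domain's rightmost labels.
    A '*' label in the rule matches any single domain label."""
    if len(rule_labels) > len(domain_labels):
        return False
    tail = domain_labels[-len(rule_labels):]
    return all(r == "*" or r == d for r, d in zip(rule_labels, tail))


def suffix_length(labels: List[str], rules: List[str]) -> int:
    """Number of labels in the public suffix, per the quoted steps."""
    matches = []
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if rule_matches(rule_labels, labels):
            matches.append((is_exception, rule_labels))
    if not matches:
        return 1  # no rule matched: the prevailing rule is "*"
    exceptions = [m for m in matches if m[0]]
    if exceptions:
        # Exception rule prevails; drop its leftmost label.
        return len(exceptions[0][1]) - 1
    # Otherwise the rule with the most labels prevails.
    return max(len(m[1]) for m in matches)


def public_suffix(domain: str, rules: List[str] = SAMPLE_RULES) -> str:
    labels = domain.lower().split(".")
    n = suffix_length(labels, rules)
    return ".".join(labels[len(labels) - n:])


def registrable_domain(domain: str, rules: List[str] = SAMPLE_RULES) -> Optional[str]:
    """Public suffix plus one label, or None when no extra label exists."""
    labels = domain.lower().split(".")
    n = suffix_length(labels, rules)
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])
```

With this sample, `public_suffix("uk.com")` is the whole name `uk.com`, because the rule with the most labels wins, and `registrable_domain("uk.com")` is `None`, because there is no additional label left to add to the suffix.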

It seems odd that the test says it should be null; after all, https://uk.com is a perfectly real and valid website where uk.com is a valid domain. There seems to be an unspoken rule that the TLD list considers any domain that exactly matches one of its entries to be invalid, but many real-world examples don't reflect this rule:

nhs.uk
platform.sh
s3.amazonaws.com

Just like uk.com, each of these is a valid URL in its own right, but each is also a valid suffix. I've been told by someone before that cases like this aren't valid because "A public suffix can not be resolved", however I've found no evidence for this on the publicsuffix.org website.

Thanks for any help

Edit: please ignore my past self mistakenly thinking the algorithm in its current state should match uk.com to com instead of null. The issue raised here was simply trying to understand why the given domain examples couldn't have a suffix determined and instead got null back.


sleevi · 2019-08-14T17:07:47Z

I’m not entirely sure how to parse your distinction between “valid rule” and “valid suffix”, but one of the key distinctions for the ICANN vs PRIVATE section is a prohibition (in implementing clients, mind you) against resolving / connecting to the former, while allowing the latter.

Shardj · 2019-08-19T13:16:51Z

Sorry, but I'm not sure what rules you're referring to; the only thing I asked about was why uk.com doesn't have the suffix com in the provided test. By valid suffix I meant that they're in the public suffix list, but in this case they're also real URLs.

Could you link me to somewhere I can read about this distinction or clarify what you meant? I don't quite understand what you mean. If we're making distinctions between icann and private suffixes though, is it helpful to add that nhs.uk is an ICANN suffix while uk.com is in the PRIVATE section?

sleevi · 2019-08-19T18:21:28Z

See "Divisions" at https://publicsuffix.org/list/

Shardj · 2019-08-20T09:08:55Z

That makes sense, I'm still not sure what you meant earlier though, could you reword it?

Shardj · 2020-08-06T15:32:58Z

@sleevi it's been a year, so I figured I'd come back and see if I can get any further help with this. You said something about prohibiting the ICANN list while allowing the private list. In that case, are you saying that the unit test is correct to come back with null when using both the ICANN and private sections at the same time, but that in an actual implementation where you want to find out the TLD of uk.com, you should fall back to the private list only if you fail to get a match against both sections at the same time?

Don't you then have the opposite problem with the URL s3.amazonaws.com, which will resolve as null against both lists at the same time and also against the private section by itself? It'll resolve correctly against the ICANN section, though.

So to get the following unit tests passing would this code be correct?

Unit tests:

checkPublicSuffix('uk.com', 'com');
checkPublicSuffix('s3.amazonaws.com', 'com');
checkPublicSuffix('platform.sh', 'sh');

Pseudo Code:

result = attemptResolveAgainstBothSections
if (result == null) {
  result = attemptResolveAgainstPrivate
}
if (result == null) {
  result = attemptResolveAgainstICANN
}
return result
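A runnable Python sketch of the fallback chain in the pseudo code above, under several assumptions: `SAMPLE_RULES` is a hypothetical sample whose section tags follow what is said in this thread (nhs.uk under ICANN, uk.com under PRIVATE), wildcard and exception rules are left out for brevity, and all function names are made up for illustration. Note that it returns the registrable domain (suffix plus one label) rather than the bare suffix values used in the unit tests above, since a null result only arises when there is no label left over beyond the suffix.

```python
from typing import Optional, Set

# Hypothetical sample; the real list is far larger and has wildcard/exception rules.
SAMPLE_RULES = {
    "com": "ICANN",
    "uk": "ICANN",
    "nhs.uk": "ICANN",
    "uk.com": "PRIVATE",
    "s3.amazonaws.com": "PRIVATE",
}


def longest_suffix(domain: str, sections: Set[str]) -> str:
    """Longest rule from the given sections that matches the end of the domain."""
    labels = domain.lower().split(".")
    for start in range(len(labels)):  # longest candidate first
        candidate = ".".join(labels[start:])
        if SAMPLE_RULES.get(candidate) in sections:
            return candidate
    return labels[-1]  # no rule matched: implicit "*" rule


def registrable(domain: str, sections: Set[str]) -> Optional[str]:
    """Suffix plus one label, or None when the domain is itself the suffix."""
    labels = domain.lower().split(".")
    n = len(longest_suffix(domain, sections).split("."))
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


def registrable_with_fallback(domain: str) -> Optional[str]:
    """Try both sections, then PRIVATE only, then ICANN only."""
    for sections in ({"ICANN", "PRIVATE"}, {"PRIVATE"}, {"ICANN"}):
        result = registrable(domain, sections)
        if result is not None:
            return result
    return None
```

With this sample, `registrable_with_fallback("uk.com")` falls through to the ICANN-only step and returns `uk.com`, and `registrable_with_fallback("s3.amazonaws.com")` returns `amazonaws.com`. Whether such a fallback is advisable for a given application is a separate question.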

dnsguru · 2020-08-06T16:35:24Z

@Shardj

> I've been told by someone before that cases like this aren't valid because "A public suffix can not be resolved", however I've found no evidence for this on the publicsuffix.org website.

Are you trying to check if 'someone' is right or wrong about this? PSL has zero to do with resolution, DNS does that.

PSL would impact cookie horizons, or other domain name logic within the application layer post-resolution.

In reviewing the three submissions above, each of them appears to be working as submitted by its respective operator.

It is unclear what the objective of this request is, as it does not appear to be attempting to fix what is not broken.

We're volunteers who are under extra burdens with the global pandemic situation. Can you provide a suggested remedy for the wording that bridges this understanding gap and makes it clearer, or can we close this please?

Shardj · 2020-08-06T16:59:23Z

No, that's just a quote from someone who was trying to explain to me that a public suffix cannot also be a valid domain. They were very insistent, which confused me a fair bit, as the example URLs I provided all resolve just fine. That's the only reason I mentioned it.

My use case is that a bunch of URLs go through our system and we want to know the root domain of each. To find it, we determine the domain's suffix; the root domain is then the suffix plus one more label on the left. Simply put, when URLs such as nhs.uk, uk.com, platform.sh and s3.amazonaws.com get into our system, our code can't figure out the suffix and the result comes back as null. Our system then assumes they're dodgy domains and incorrectly throws them out. The code for determining the domain suffix follows the algorithm laid out at https://publicsuffix.org/list/

The objective I'm trying to reach is to fix the above issue; to stop the aforementioned domains from being considered invalid by our system due to the inability to determine their suffix. To do that I'm trying to understand why uk.com is supposed to have the suffix determined as null instead of com. After all com is the real tld of the uk.com domain if you resolve the url. It's either a misunderstanding on my part as to how this algorithm should be used, or it's a flaw in the algorithm. Most likely the former but that's why I'm here.

If you're too busy to address this at the moment feel free to leave it alone for a while and I'll check back another time.

sleevi · 2020-08-08T04:52:23Z

@Shardj Ultimately, how you use the list is up to you. If you're trying to determine the "root URL", you're probably doing something wrong, honestly. Some of that is covered at https://github.com/sleevi/psl-problems

The algorithm is correct for what it's returning, relative to cookies, and even then, there are edge cases. This sounds like you're running into the same situation discussed at #91

I'm not sure who said a public suffix cannot be a valid domain, because that's not the case either, obviously.

Shardj · 2020-08-11T10:22:16Z

@sleevi Well the algorithm is described as "an algorithm for determining the Public Suffix of a domain" which is exactly what I want to use it for, since knowing the suffix means I also know the root domain. So if I'm using it wrong then I'm not sure what the right way is. Thanks for the link by the way, that's a very good read.

Yeah, it does seem like a similar issue to #91. Doesn't this just indicate that the algorithm is flawed, though? After all, it's coming out with null values when the suffix should be easy enough to determine. Then again, I suppose that for a 'cookie' use case a null value isn't an issue, since it simply indicates that no other domains can ever access cookies for the given domain. So I suppose my issue comes down to using the algorithm for something other than its originally intended purpose.

Although if the algorithm were modified by simply adding this step, "ignore the leftmost label/part of the domain for the following steps when matching rules against the domain", then we'd never run into this problem, as we could never end up matching the whole domain as the suffix. uk.com would have its suffix determined as com instead of null without issue.
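A small Python sketch of the modification proposed above, purely to illustrate the idea. This is not what the published algorithm does; `SAMPLE_RULES` is a hypothetical sample with no wildcard or exception rules, and the function name is made up.

```python
from typing import Optional

# Hypothetical sample of plain rules; the real list is far larger.
SAMPLE_RULES = {"com", "uk.com", "uk", "nhs.uk"}


def suffix_ignoring_leftmost(domain: str, rules=SAMPLE_RULES) -> Optional[str]:
    """Match rules against everything except the leftmost label,
    so the suffix can never be the whole domain."""
    labels = domain.lower().split(".")
    rest = labels[1:]  # the leftmost label never takes part in matching
    # Try the longest candidate first so the rule with the most labels wins.
    for start in range(len(rest)):
        candidate = ".".join(rest[start:])
        if candidate in rules:
            return candidate
    # No rule matched: fall back to the implicit "*" rule (the last label),
    # or None when the input was a single label.
    return rest[-1] if rest else None
```

Under this variant, `suffix_ignoring_leftmost("uk.com")` gives `com`, while `suffix_ignoring_leftmost("example.uk.com")` still gives `uk.com`. The trade-off is that a name that is itself a listed suffix is then treated like any ordinary domain under its parent suffix.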

I'm happy to close this as I can understand now that the algorithm described on the public suffix site never needed to find the suffix of uk.com as there are no possible subdomains that a cookie could be shared with anyway, so it doesn't matter if it comes back with null or com either way.

sleevi · 2020-08-11T23:02:51Z

> since knowing the suffix means I also know the root domain.

I don't think this means what you want. In fact, I'm not sure what you want, but that sounds wrong for most usages.

> After all it's coming out with null values when the suffix should be easy enough to determine.

A null value is expected.

> So I suppose my issue comes down to using the algorithm for something other than its originally intended purpose.

Yes, that's roughly where I was going with this.

> uk.com would have its suffix determined as uk instead of null without issue.

Did you typo that? I don't believe any possible interpretation would be correct there.

Shardj · 2020-08-17T11:36:18Z

> I don't think this means what you want

Finding the root domain from a domain by determining its suffix and adding on the next label from the left seems to be the only way unless you go and resolve the URL, but that would be too slow. It's even described in the algorithm's final step, "The registered or registrable domain is the public suffix plus one additional label."

> A null value is expected

Yeah, but my point is that null isn't the suffix, and determining the suffix is what the algorithm is understood to do. We just come back to the fact that the algorithm doesn't need to do that in this case: cookies from the given domain can't ever be shared with other domains, so it just spits out null. I think we're on the same page here.

> Did you typo that?

Yeah, typo'd that; it should've said suffix determined as com. I'll edit the original.

Shardj · 2020-08-17T11:52:42Z

Closing it because my original question was solved

dnsguru mentioned this issue Apr 9, 2020

Could we use project board method to separate list update PRs from administrata #1008

Closed

Shardj closed this as completed Aug 17, 2020
