Clarification of uk.com test #864
Comments
I’m not entirely sure how to parse your distinction between “valid rule”
and “valid suffix”, but one of the key distinctions for the ICANN vs
PRIVATE section is a prohibition (in implementing clients, mind you)
against resolving / connecting to the former, while allowing the latter.
|
Sorry, but I'm not sure what rules you're referring to. The only rule I mentioned came up while I was trying to understand why uk.com doesn't have the suffix com in the provided test. By "valid suffix" I meant that they're in the public suffix list, but in this case they're also real URLs. Could you link me to somewhere I can read about this distinction, or clarify what you meant? If we're making distinctions between ICANN and private suffixes, though, is it helpful to add that nhs.uk is an ICANN suffix while uk.com is in the PRIVATE section? |
See "Divisions" at https://publicsuffix.org/list/ |
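As a rough illustration of those divisions: the list file marks each section with comment lines such as `// ===BEGIN ICANN DOMAINS===`. The sketch below splits a small inline sample (not the real list) into the two sections. `SAMPLE` and `parse_divisions` are hypothetical names chosen for this example.

```python
# Tiny inline sample that imitates the layout of the list file.
# The rules shown are a hand-picked subset for illustration only.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
nhs.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
uk.com
// ===END PRIVATE DOMAINS===
"""


def parse_divisions(text):
    """Split list text into {"ICANN": set(...), "PRIVATE": set(...)}."""
    sections = {"ICANN": set(), "PRIVATE": set()}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN "):
            # "// ===BEGIN ICANN DOMAINS===" -> "ICANN"
            current = line[len("// ===BEGIN "):].split(" ")[0]
        elif line.startswith("// ===END "):
            current = None
        elif line and not line.startswith("//") and current in sections:
            # A rule is the text up to the first whitespace on the line.
            sections[current].add(line.split()[0])
    return sections


sections = parse_divisions(SAMPLE)
```

A consumer can then choose to match against `sections["ICANN"]` only, or against both sets together.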
That makes sense. I'm still not sure what you meant earlier, though; could you reword it? |
@sleevi it's been a year, so I figured I'd come back and see if I can get any further help with this. You said something about prohibiting the ICANN list when allowing the private list. In which case, are you saying that the unit test is correct to come back with null if using both the ICANN and PRIVATE sections at the same time? But you're saying that in an actual implementation where you want to find out the TLD of Don't you have the opposite problem from the URL So to get the following unit tests passing, would this code be correct? Unit tests:
Pseudo Code:
|
Are you trying to check if 'someone' is right or wrong about this? PSL has zero to do with resolution; DNS does that. PSL would impact cookie horizons, or other domain name logic within the application layer post-resolution. In reviewing the three submissions above, each of them appears to be working as submitted by its respective operator. It is unclear what the objective of this request is, as it does not appear to be attempting to fix what is not broken. We're volunteers who are under extra burdens with the global pandemic situation. Can you provide a suggested remedy for the wording that bridges this understanding gap in a manner that makes it more clear, or can we close this please? |
No, that's just a quote from someone who was trying to explain to me that a public suffix cannot also be a valid domain. They were very insistent, which confused me a fair bit, as the example URLs I provided all resolve just fine. That's the only reason I mentioned it. My use case is that a bunch of URLs are going through our system and we want to know the root domain. To find it, we determine the domain's suffix; the root domain is then the suffix plus one more label on the left. Simply put, when URLs such as nhs.uk, uk.com, platform.sh and s3.amazonaws.com get into our system, it messes things up a bit because our code can't figure out the suffix: it comes back as null. This results in our system assuming they're dodgy domains and incorrectly throwing them out. The code for determining the domain suffix follows the algorithm laid out at https://publicsuffix.org/list/ The objective I'm trying to reach is to fix the above issue: to stop the aforementioned domains from being considered invalid by our system due to the inability to determine their suffix. To do that, I'm trying to understand why uk.com is supposed to have the suffix determined as null instead of com. After all, com is the real TLD of the uk.com domain if you resolve the URL. It's either a misunderstanding on my part as to how this algorithm should be used, or it's a flaw in the algorithm. Most likely the former, but that's why I'm here. If you're too busy to address this at the moment, feel free to leave it alone for a while and I'll check back another time. |
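For readers following along, here is a minimal Python sketch of the lookup described above: find the longest matching rule, then take one more label to the left. The `RULES` set is a tiny hand-picked subset for illustration, the function names are hypothetical, and wildcard (`*.`) and exception (`!`) rules are deliberately left out, so this is not the full algorithm from publicsuffix.org.

```python
from typing import Optional

# Illustrative subset of rules only; the real list is far larger.
RULES = {"com", "uk", "uk.com", "nhs.uk", "s3.amazonaws.com"}


def public_suffix(domain: str, rules=RULES) -> str:
    """Return the matching rule with the most labels.

    Simplified: wildcard and exception rules are not handled.
    If no rule matches, fall back to the implicit "*" rule,
    which makes the last label the suffix.
    """
    labels = domain.lower().strip(".").split(".")
    # Candidates go from longest to shortest, so the first hit
    # is the rule with the most labels.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    return labels[-1]


def registrable_domain(domain: str, rules=RULES) -> Optional[str]:
    """Public suffix plus one label, or None if there is no label left."""
    labels = domain.lower().strip(".").split(".")
    suffix_len = len(public_suffix(domain, rules).split("."))
    if len(labels) <= suffix_len:
        return None  # the domain is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])
```

With these rules, `registrable_domain("uk.com")` returns `None`: uk.com is itself a listed suffix, so there is no label to its left to add. As far as I understand the test file, that registrable domain is what the second argument of `checkPublicSuffix('uk.com', null)` refers to.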
@Shardj Ultimately, how you use the list is up to you. If you're trying to determine the "root URL", you're probably doing something wrong, honestly. Some of that is covered at https://github.com/sleevi/psl-problems The algorithm is correct for what it's returning, relative to cookies, and even then, there are edge cases. This sounds like you're running into the same situation discussed at #91 I'm not sure who said a public suffix cannot be a valid domain, because that's not the case either, obviously. |
@sleevi Well, the algorithm is described as "an algorithm for determining the Public Suffix of a domain", which is exactly what I want to use it for, since knowing the suffix means I also know the root domain. So if I'm using it wrong, then I'm not sure what the right way is. Thanks for the link, by the way; that's a very good read. Yeah, it does seem like a similar issue to #91. Doesn't this just indicate that the algorithm is flawed, though? After all, it's coming out with null values when the suffix should be easy enough to determine. Then again, I suppose from a 'cookie' use case a null value isn't an issue, since it simply indicates that no other domains can ever access cookies for the given domain. So I suppose my issue comes down to using the algorithm for something other than its originally intended purpose. Although if the algorithm were modified by simply adding this step, "ignore the left most label/part of the domain for the following steps when matching rules against the domain", then we'd never run into this problem, as we could never end up matching the whole domain as the suffix. uk.com would have its suffix determined as com instead of null without issue. I'm happy to close this, as I can understand now that the algorithm described on the public suffix site never needed to find the suffix of uk.com: there are no possible subdomains that a cookie could be shared with anyway, so it doesn't matter whether it comes back with null or com. |
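The extra step proposed in the comment above can be sketched in Python as follows. This illustrates the suggested modification only, not the published algorithm. `RULES` is a small illustrative subset, `suffix_ignoring_leftmost` is a hypothetical name, and wildcard and exception rules are again left out.

```python
# Illustrative subset of rules only.
RULES = {"com", "uk", "uk.com", "nhs.uk"}


def suffix_ignoring_leftmost(domain, rules=RULES):
    """Variant lookup: the leftmost label never takes part in rule matching."""
    labels = domain.lower().strip(".").split(".")
    rest = labels[1:]  # drop the leftmost label before matching
    # Longest candidate first, so the rule with the most labels wins.
    for i in range(len(rest)):
        candidate = ".".join(rest[i:])
        if candidate in rules:
            return candidate
    # No rule matched: use the last remaining label, if there is one.
    return rest[-1] if rest else None
```

Under this variant, `suffix_ignoring_leftmost("uk.com")` gives `"com"` while `suffix_ignoring_leftmost("example.uk.com")` still gives `"uk.com"`. It changes the result for every host that is itself a listed suffix (nhs.uk becomes a name under uk, for instance), so whether that is acceptable depends on what the result is used for.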
I don't think this means what you want. In fact, I'm not sure what you want, but that sounds wrong for most usages.
A null value is expected.
Yes, that's roughly where I was going with this.
Did you typo that? I don't believe any possible interpretation would be correct there. |
Finding the root domain from a domain by determining its suffix and adding on the next label from the left seems to be the only way, unless you go and resolve the URL, but that would be too slow. It's even described in the algorithm's final step: "The registered or registrable domain is the public suffix plus one additional label."
Yeah, but null isn't the suffix is my point, and determining the suffix is what the algorithm is understood to do. But yeah we just come back to the fact that the algorithm doesn't need to do that in this case since cookies from the given domain can't ever be shared to other domains so it just spits out null. I think we're on the same page here.
Yeah typo'd that, should've said suffix determined as com, I'll edit the original |
Closing it because my original question was solved |
The test
checkPublicSuffix('uk.com', null);
seems like an interesting one. If I follow the algorithm provided at https://publicsuffix.org/list/ (which I've pasted below for convenience), I come to the conclusion that the suffix for uk.com is actually com and that it shouldn't be null. It seems odd that the test says it should be null; after all, https://uk.com is a perfectly real website and uk.com is a valid domain. There seems to be an unspoken rule that the TLD list considers any domain that 100% matches a TLD to be invalid, however many real-world examples don't reflect this rule:
Just like uk.com, these are each valid URLs individually, but they are also valid suffixes. I've been told before that cases like this aren't valid because "A public suffix can not be resolved"; however, I've found no evidence for this on the publicsuffix.org website.
Thanks for any help
Edit: please ignore my past self mistakenly thinking the algorithm in its current state should match uk.com to com instead of null. The issue raised here was simply trying to understand why the given domain examples couldn't have a suffix determined and instead got null back.