pages.example.com suffix declaration causes get_tld('example.com') == 'example.com' #18

Open
jmehnle opened this issue Jul 15, 2020 · 6 comments

Comments

@jmehnle

jmehnle commented Jul 15, 2020

publicsuffix2 mishandles the following case: when a multi-label public suffix is declared, every shorter suffix of it is also treated as its own TLD. E.g., given the declaration of git-pages.rit.edu as a public suffix, get_tld('rit.edu') returns 'rit.edu', whereas it really should return 'edu':

Python 3.7.7 (default, Mar 14 2020, 02:39:38)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from publicsuffix2 import PublicSuffixList
>>> psl = PublicSuffixList()
>>> psl.get_tld("foo.git-pages.rit.edu")
'git-pages.rit.edu'  # CORRECT
>>> psl.get_tld("git-pages.rit.edu")
'git-pages.rit.edu'  # WRONG, should be 'edu'
>>> psl.get_tld("rit.edu")
'rit.edu'            # WRONG, should be 'edu'
>>> psl.get_tld("edu")
'edu'                # CORRECT, but probably by accident
@jmehnle
Author

jmehnle commented Jul 15, 2020

I tried to understand the _lookup_node method so I could fix the issue and open a PR, but I haven't succeeded in the limited time I have right now.

@pombredanne
Collaborator

pombredanne commented Jul 16, 2020

@jmehnle Thanks for the report! @hiratara @KnitCode what's your take on this case?
Here the PSL has this entry:

// Rochester Institute of Technology : http://www.rit.edu/
// Submitted by Jennifer Herting <jchits@rit.edu>
git-pages.rit.edu

@jmehnle
Author

jmehnle commented Jul 16, 2020

Just to be clear, this problem is more general than just the git-pages.rit.edu suffix. It will happen with any suffix (here: git-pages.rit.edu) that is an indirect subdomain of another suffix (here: edu): any intermediate domains (here: rit.edu) will erroneously be recognized as their own TLD when really that other suffix (here: edu) should be returned as the TLD instead.

@hiratara
Contributor

I think this is my fault. I made the same mistake in the Rust library.

rushmorem/publicsuffix#24

$ python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from publicsuffix2 import PublicSuffixList
>>> psl = PublicSuffixList()
>>> psl.get_tld("cdn.fbsbx.com")
'fbsbx.com'    # WRONG

As for this case from the report:

>>> psl.get_tld("git-pages.rit.edu")
'git-pages.rit.edu' # WRONG, should be 'edu'

I believe this behavior is correct, since git-pages.rit.edu is itself a declared public suffix. psl produces the same result:

$ python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psl
>>> psl.domain_suffixes("git-pages.rit.edu").public
'git-pages.rit.edu'
>>> psl.domain_suffixes("rit.edu").public
'edu'
>>> psl.domain_suffixes("edu").public
'edu'
>>> psl.domain_suffixes("cdn.fbsbx.com").public
'com'
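
The psl results above are consistent with a plain longest-matching-rule lookup. Here is a minimal sketch of that idea, assuming a hypothetical three-rule subset of the list and ignoring wildcard and exception rules. The function name `public_suffix` and the `RULES` set are made up for illustration; this is not the code of publicsuffix2 or psl.

```python
def public_suffix(domain, rules):
    """Return the longest rule that matches `domain` on label boundaries.

    Hypothetical helper for illustration only: handles plain rules,
    not wildcard ("*.") or exception ("!") rules.
    """
    labels = domain.lower().strip(".").split(".")
    # Try candidates from longest to shortest: a.b.c, then b.c, then c.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    # No explicit rule matched: fall back to the implicit "*" rule,
    # which makes the rightmost label the public suffix.
    return labels[-1]


# Assumed subset of the list, just enough for the names in this thread.
RULES = {"edu", "com", "git-pages.rit.edu"}

for name in ("foo.git-pages.rit.edu", "git-pages.rit.edu",
             "rit.edu", "edu", "cdn.fbsbx.com"):
    print(name, "->", public_suffix(name, RULES))
```

The point of the sketch is that only complete rules are candidates: rit.edu is a prefix of nothing and an entry of nothing in `RULES`, so it falls through to edu, while git-pages.rit.edu matches its own rule.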

@hiratara
Contributor

We also have to consider the platform.sh problem.

This ticket insists that the public suffix of rit.edu should be edu, and I agree. So what should the public suffix of kobe.jp be? Our test insists that it should be "kobe.jp".

Here is the result of psl:

>>> psl.domain_suffixes("kobe.jp").public
'jp'
>>> psl.domain_suffixes("x.kobe.jp").public
'x.kobe.jp'
>>> psl.domain_suffixes("city.kobe.jp").public
'kobe.jp'

I think it's a good idea to produce the same results as psl.
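
For reference, the kobe.jp results from psl can be reproduced with a small sketch of the matching steps described on publicsuffix.org (the exception rule wins, otherwise the rule with the most labels wins, and "*" applies when nothing matches). The helper names and the three-rule `RULES` list are assumptions for illustration, not the implementation of psl or publicsuffix2.

```python
def rule_matches(rule_labels, domain_labels):
    """True if every rule label matches the domain, compared right to left."""
    if len(rule_labels) > len(domain_labels):
        return False
    for r, d in zip(reversed(rule_labels), reversed(domain_labels)):
        if r != "*" and r != d:
            return False
    return True


def public_suffix(domain, rules):
    """Hypothetical helper: apply plain, wildcard and exception rules."""
    labels = domain.lower().strip(".").split(".")
    prevailing = ["*"]          # implicit rule when nothing else matches
    is_exception = False
    for rule in rules:
        exception = rule.startswith("!")
        rule_labels = (rule[1:] if exception else rule).split(".")
        if not rule_matches(rule_labels, labels):
            continue
        if exception:
            # An exception rule always prevails over the other matches.
            prevailing, is_exception = rule_labels, True
            break
        if len(rule_labels) > len(prevailing):
            prevailing = rule_labels
    if is_exception:
        # Drop the leftmost label of the exception rule.
        prevailing = prevailing[1:]
    return ".".join(labels[-len(prevailing):])


# Assumed subset of the list covering the kobe.jp case.
RULES = ["jp", "*.kobe.jp", "!city.kobe.jp"]

for name in ("kobe.jp", "x.kobe.jp", "city.kobe.jp"):
    print(name, "->", public_suffix(name, RULES))
```

Under these assumptions kobe.jp is not itself a rule: the wildcard needs three labels, so only jp matches, which is why the existing test expectation of "kobe.jp" disagrees with psl.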

@hiratara
Contributor

I'm trying to fix the issue with #19.
