Inconsistent leading dot tests for "example.com" #208

bkirz · 2016-04-18T20:26:41Z

Hi! My team is building a publicsuffix library for Elixir and has run into an issue with the leading dot tests in the test file being inconsistent with the spec. The spec says "A domain or rule can be split into a list of labels using the separator '.' (dot). The separator is not part of any of the labels. Empty labels are not permitted, meaning that leading and trailing dots are ignored." This seems inconsistent with the following tests, which suggest that a domain with a leading dot should not match any rules:

checkPublicSuffix('.example.com', null);
checkPublicSuffix('.example.example', null);

and

checkPublicSuffix('example.example', 'example.example');

Is there an inconsistency here, or are we understanding the spec and/or tests incorrectly?
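
To make the two readings concrete, here is a minimal Python sketch. Both function names are hypothetical and neither is taken from the spec or from any existing library. The first follows the literal wording "leading and trailing dots are ignored"; the second treats any empty label as invalid input, which is what the `null` expectations in the test file suggest.

```python
from typing import List, Optional


def split_lenient(domain: str) -> List[str]:
    """Reading 1: leading and trailing dots are ignored before splitting."""
    return domain.strip(".").split(".")


def split_strict(domain: str) -> Optional[List[str]]:
    """Reading 2: any empty label makes the whole input invalid (None)."""
    labels = domain.split(".")
    if any(label == "" for label in labels):
        return None
    return labels


print(split_lenient(".example.com"))  # ['example', 'com']
print(split_strict(".example.com"))   # None
print(split_strict("example.com"))    # ['example', 'com']
```

Under the first reading `.example.com` would behave exactly like `example.com`; under the second it yields no result at all, which matches `checkPublicSuffix('.example.com', null)`.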

rockdaboot · 2016-04-19T07:42:09Z

The tests seem to be ok, but the spec is not really clear:

"Empty labels are not permitted" -> A leading or trailing dot implies an empty label, thus the check should return NULL.

"meaning that leading and trailing dots are ignored", well... ignored if you check for the string being a public suffix. But 'checkPublicSuffix()' does not do that. The right argument is the 'shortest private suffix' that can be constructed from the left argument. And in that case the test should fail if there is a leading or trailing dot.

What we need is a test file with domains and a boolean value that says if the domain is a public suffix or not. IMO, there would be much less confusion.

BTW: AFAIR, there is tests.txt obsoleting test_psl.txt (just easier to parse).

weppos · 2016-04-19T09:32:05Z

@benkirzhner I think @rockdaboot essentially answered the question. Notice that in the first two lines, the second argument is null which essentially means "the input is invalid or not processable".

How exactly you handle this is an implementation detail that the PSL guidelines do not enforce. In my Ruby lib I perform some pre-validation and return an error.

PS. As @rockdaboot mentioned, I encourage you to use the new tests.txt file. I'll place a note on the old file; it will eventually be removed.

The tests are currently the same, but the tests.txt file is more language- and implementation-independent.

PPS. By coincidence (and I'm kind of shocked) I started working on an Elixir library a few days ago (I already developed a Ruby lib and a Go lib). I'd be happy to contribute if you want; I definitely don't want to duplicate the effort if you are at a more advanced stage.

bkirz · 2016-04-19T19:58:51Z

@rockdaboot:

> The tests seem to be ok, but the spec is not really clear:
>
> "meaning that leading and trailing dots are ignored", well... ignored if you check for the string being a public suffix. But 'checkPublicSuffix()' does not do that. The right argument is the 'shortest private suffix' that can be constructed from the left argument. And in that case the test should fail if there is a leading or trailing dot.

Since the tested behavior appears to be what's supported by existing libraries, we'll change our implementation to conform. Still, the spec's formal algorithm doesn't seem unclear or ambiguous:

"Here is an algorithm for determining the Public Suffix of a domain."
"A domain or rule can be split into a list of labels using the separator "." (dot). The separator is not part of any of the labels. Empty labels are not permitted, meaning that leading and trailing dots are ignored."

"...leading and trailing dots are ignored" sounds like it means that .foo.bar.com should be treated the same as foo.bar.com. I don't know how to read it any other way. Can the language in the spec be changed to improve the clarity?

> What we need is a test file with domains and a boolean value that says if the domain is a public suffix or not. IMO, there would be much less confusion.

This sounds like a great idea in addition to the existing tests file. The business reason we have for building the Elixir library is so we can group URLs by shortest private suffix, and having explicit test cases for that behavior is useful.

@weppos:

> PS. As @rockdaboot mentioned, I encourage you to use the new tests.txt file. I'll place a note on the old file; it will eventually be removed.

Thanks for the link; we've already switched to the new one. FYI, the website currently links to the old test data file. Are there plans to update the website to link to the new file?

> PPS. By coincidence (and I'm kind of shocked) I started working on an Elixir library a few days ago (I already developed a Ruby lib and a Go lib). I'd be happy to contribute if you want; I definitely don't want to duplicate the effort if you are at a more advanced stage.

Our project is currently a private repo while we build out the initial implementation and go through the process of making it public. Once we get the thumbs up to open source, we'd be happy to coordinate and add you as a contributor.

weppos · 2016-04-22T11:34:28Z

> "...leading and trailing dots are ignored" sounds like it means that .foo.bar.com should be treated the same as foo.bar.com. I don't know how to read it any other way. Can the language in the spec be changed to improve the clarity?

We have to distinguish between a rule defined as .foo.bar.com and an input passed as .foo.bar.com.

A rule cannot contain trailing dots. We have a test suite that ensures we don't incorporate that kind of rule by mistake. An input, of course, depends on the application itself. The constraint/suggestion is that you should pass an input without leading or trailing dots.
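
As an illustration of the rule-side check, here is a minimal Python sketch of a lint that flags rules containing an empty label. It is not the project's actual test suite; `find_malformed_rules` is a hypothetical name. It assumes the documented list format: `//` starts a comment, a rule is read up to the first whitespace, and `!` marks an exception rule.

```python
from typing import Iterable, List, Tuple


def find_malformed_rules(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, rule) pairs for rules that contain an empty label."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rule = line.split()[0]  # a rule ends at the first whitespace
        # Drop the exception marker before splitting into labels.
        body = rule[1:] if rule.startswith("!") else rule
        if "" in body.split("."):
            bad.append((number, rule))
    return bad


sample = ["// comment", "com", "*.ck", "!www.ck", ".bad.example", "bad.example."]
print(find_malformed_rules(sample))  # [(5, '.bad.example'), (6, 'bad.example.')]
```

Validating application input is a separate concern and stays with each implementation.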

I will be happy to try to reformulate the docs.

> What we need is a test file with domains and a boolean value that says if the domain is a public suffix or not. IMO, there would be much less confusion.

> This sounds like a great idea in addition to the existing tests file. The business reason we have for building the Elixir library is so we can group URLs by shortest private suffix, and having explicit test cases for that behavior is useful.

@rockdaboot @benkirzhner can you elaborate? For the purpose of the algorithm, there is no difference between a private domain and a registry suffix. The process is the same.

The semantics are currently assigned by the rule's position in the list.
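
A rough Python sketch of what "assigned by position" means in practice: a rule is ICANN or PRIVATE depending only on which pair of marker comments it sits between. `rules_by_section` is a hypothetical helper, and the marker strings below are an assumption about the list's section comments that should be checked against the current file.

```python
from typing import Dict, Iterable, Optional

# Assumed section marker comments; verify against public_suffix_list.dat.
SECTION_MARKERS = {
    "// ===BEGIN ICANN DOMAINS===": "ICANN",
    "// ===BEGIN PRIVATE DOMAINS===": "PRIVATE",
}


def rules_by_section(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each rule to the section it appears in (None if outside any section)."""
    section: Optional[str] = None
    result: Dict[str, Optional[str]] = {}
    for raw in lines:
        line = raw.strip()
        if line in SECTION_MARKERS:
            section = SECTION_MARKERS[line]
            continue
        if line.startswith("// ===END"):
            section = None
            continue
        if not line or line.startswith("//"):
            continue  # blank line or ordinary comment
        result[line.split()[0]] = section
    return result


sample = [
    "// ===BEGIN ICANN DOMAINS===",
    "com",
    "// ===END ICANN DOMAINS===",
    "// ===BEGIN PRIVATE DOMAINS===",
    "blogspot.com",
    "// ===END PRIVATE DOMAINS===",
]
print(rules_by_section(sample))  # {'com': 'ICANN', 'blogspot.com': 'PRIVATE'}
```

The matching algorithm itself never looks at the section; only callers that care about the distinction need this mapping.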

> FYI, the website currently links to the old test data file. Are there plans to update the website to link to the new file?

Thanks, I'll update it.

> FYI, the website currently links to the old test data file. Are there plans to update the website to link to the new file?

👍

rockdaboot · 2016-04-22T15:02:55Z

@benkirzhner @weppos

> What we need is a test file with domains and a boolean value that says if the domain is a public suffix or not. IMO, there would be much less confusion.

> This sounds like a great idea in addition to the existing tests file. The business reason we have for building the Elixir library is so we can group URLs by shortest private suffix, and having explicit test cases for that behavior is useful.

libpsl uses [1] a hand-selected list of tests and [2] auto-generated tests from the PSL itself.

While we could put the test data from [1] into a new file plus your suggestions, [2] should maybe stay more of an algorithm!? I could also provide a script to auto-generate a file with all those tests from [2]. WDYT?

[1] test-is-public.c
[2] test-is-public-all.c

P.S.: the algorithm of [2] in short (not everything is implemented yet):

take each rule from public_suffix_list.dat
if rule is an exception:
- domain is NOT a PS
- randomly replacing the left label IS a PS
- domain without left label is a PS (there should be a matching wildcard rule) !?
if rule is a wildcard:
- domain is a PS
- random left label (instead of star) IS a PS
if rule is regular:
- domain is a PS

And all that with respect to the section (ICANN/PRIVATE).

WDYT ?
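
The generation step above could be sketched in Python roughly as follows. This is not libpsl's code; `cases_for_rule` and `random_label` are hypothetical names. The sketch covers only the unambiguous items: the exception bullet marked "!?" and the wildcard bullet "domain is a PS" are left out because the thread does not settle what they should produce. Section handling (ICANN/PRIVATE) is omitted as well.

```python
import random
import string
from typing import List, Tuple


def random_label(rng: random.Random, length: int = 8) -> str:
    """Build a random lowercase label to stand in for a replaced or wildcard label."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def cases_for_rule(rule: str, rng: random.Random) -> List[Tuple[str, bool]]:
    """Return (domain, is_public_suffix) test cases derived from one rule."""
    if rule.startswith("!"):
        domain = rule[1:]
        _, _, parent = domain.partition(".")
        return [
            (domain, False),                           # the exception itself is not a PS
            (f"{random_label(rng)}.{parent}", True),   # a sibling label falls under the wildcard
        ]
    if rule.startswith("*."):
        parent = rule[2:]
        return [(f"{random_label(rng)}.{parent}", True)]  # random label in place of the star
    return [(rule, True)]  # a regular rule is itself a PS


rng = random.Random(0)  # seeded so repeated runs generate the same labels
for rule in ["com", "*.ck", "!www.ck"]:
    print(rule, cases_for_rule(rule, rng))
```

A real generator would also need to make sure a random label does not collide with another exception rule under the same parent.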

weppos · 2016-04-27T09:34:06Z

@rockdaboot can you provide a small example (a few lines) of what the test file would look like?

rockdaboot · 2016-04-27T10:17:45Z

I thought of a very simple format.

Comments have '#' as first character.
Empty lines allowed.
Everything else should be

domain is_psl, with is_psl = 0|1

We could think about leading/trailing dots for domain, I included one example with a dot.
Empty domain, NULL domain, etc. should IMO be in the responsibility of the implementor's test suite.

# this is a comment
www.example.com 0
com.ar 1
.com.ar 1

# exception from *.ck
www.ck 0 

# unknown TLD
adfhoweirh 1
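
For implementors, a reader for this proposed format might look like the following Python sketch. `parse_cases` is a hypothetical name, and the format is only the proposal from this comment, not a file that ships with the list.

```python
from typing import Iterable, List, Tuple


def parse_cases(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Parse 'domain is_psl' lines, skipping '#' comments and empty lines."""
    cases = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2 or fields[1] not in ("0", "1"):
            raise ValueError(f"line {number}: expected 'domain is_psl', got {raw!r}")
        cases.append((fields[0], fields[1] == "1"))
    return cases


sample = """\
# this is a comment
www.example.com 0
com.ar 1
.com.ar 1

# exception from *.ck
www.ck 0

# unknown TLD
adfhoweirh 1
"""
print(parse_cases(sample.splitlines()))
```

Each resulting pair can then be fed to whatever "is this a public suffix" function the library under test exposes.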

wdhdev · 2024-11-29T09:14:56Z

@simon-friedberger I think this can be closed, it is a very old issue. Unless this is still a prominent issue.

dnsguru · 2024-11-29T11:58:58Z

Agree +1 to closing

myronmarston added a commit to seomoz/publicsuffix-elixir that referenced this issue Apr 19, 2016

Update to new tests file.

b224686

See publicsuffix/list#208 (comment).

myronmarston mentioned this issue Apr 19, 2016

Update tests seomoz/publicsuffix-elixir#7

Merged

weppos mentioned this issue Apr 22, 2016

Host the site on GitHub #28

Closed

peterthomassen mentioned this issue Jun 11, 2019

Clarify number and position of wildcard labels #145

Closed

dnsguru mentioned this issue Apr 9, 2020

Could we use project board method to separate list update PRs from administrata #1008

Closed

alevesely mentioned this issue May 14, 2021

What list are test_psl.txt and tests.txt based upon? #1321

Closed

dnsguru closed this as not planned Nov 29, 2024
