-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New test suite & linter #123
Conversation
- non-lowercase - leading dots - spaces It should prevent issues like #39 and https://bugzilla.mozilla.org/show_bug.cgi?id=985003
The PSL checks of libpsl are kind of a by-product when building it. Building takes a full autoreconf, ./configure, make, make check run which never can beat a script language test. libpsl (make check) performs the tests in test_psl.txt, but also tests each entry in the PSL to be found via a checkPublicSuffix() function (or not to be found if is an exception). Wildcards are taken care of. But I think that is beyond a syntax checker / linter. I am not Ruby affine, so can't easily see what is already implemented and what not. So please if I list some checks that already exist (just spontaneous ideas):
|
Thanks for your feedback @rockdaboot!
This is interesting. I've seen the same in the
It currently checks for:
I'm not currently checking for UTF-8 but I can try to add it.
Good point, I'll add a check.
I currently check only for a trailing dot, but you made a very good point here.
That's reasonable.
This is a little bit more tricky. I can definitely add it later on. Thanks for your feedback @rockdaboot, that's really valuable. One more thing. The current design is fail-fast. I run a bunch of tests, and the first entry that fails is reported. Does it make sense, or should the tool return all the issues with all the occurrences? The current approach is a little bit faster, and it's also very test-oriented. The second approach would be closer to a linter, and potentially reusable to validate any PSL (whereas the current one is very specific to this repo). |
I am sure it is the entry '..githubcloudusercontent.com'. This is the first wildcard with two 'star' labels - libpsl code is not ready to deal with it - but it should be a quick fix (hopefully within the next days).
For a test tool fail-fast makes sense, especially when the test data (PSL) evolves slowly - as it does. |
I supposed that was the cause. Apparently, we never had a double wildcard so far in the list. However, according to the PSL definition, it seems to be a valid rule. From this page:
I'll open a separate ticket for the double wildcard rule, so that we can get feedback from @gerv and @sleevi if it doesn't break any existing implementation, and we can keep track of the related changes in the various libs.
Thanks for your feedback. I hope to collect more feedback in the next days. I'm already working to think about a possible design for a linter, as I agree that it can be a desirable feature. |
@sleevi asked me to use a DAFSA implementation as in Chromium (libpsl issue #35). This is a dictionary lookup that takes a domain as input and returns a 'flags' value. One of these flags is 'WILDCARD'. My conclusion is, that the Chromium implementation just works with one single kind of wildcard, namely '.foo', not '.*.foo'. It seems to me that a change is non-trivial. So IMO, using a double (or multiple) wildcard breaks existing implementations and should be avoided. |
@rockdaboot I collected your proposals here https://github.com/weppos/pslint/wiki/Rules That repo is a Linter prototype I'm working on. I realized that it will probably take a little extra work upfront to create a proper linter, but in the long run it's better rather than doing it as a test and then converting it into a linter in the next months. The linter will provide If you have more ideas for the rules, feel free to edit the wiki. Thanks again for the feedback! |
@weppos I can't edit the wiki :-) Not sure, are going to use Go or Ruby ? A few things needs to be made clear regarding PSL rules:
I guess, we are all waiting for DBOUND. |
@weppos It is pretty complete regarding your list of tests (on your wiki page). And please don't hesitate to comment the code - I am C and Java coder and this is my first self-written Python code (had to ask the web a lot this afternoon). |
The Ruby linter has been replaced with a linter binary written in Go.
@rockdaboot I apologize for the silence, I had a couple of intense weeks. I opened #145 to discuss the wildcard issues. Thanks for pointing it out. As for DBOUND, it's still enough far in the future to make this list reasonably relevant today. Moreover, the time currently required to manage the list justifies (to me) a little effort to get some tool to help with it. I appreciate your help with the Python script. My main concern with it is that it's not tested (indeed it's not a lack of trust in your coding skills 😉 ), which reduces my confidence in any tool especially when it has to deal with a variety of possible bad combinations. This is essentially the same issue I had while working on an initial linter embedded in the test suite itself. See Test the tests. That's why I decided to work on a separate tool, with the goal to have a general purpose, well tested linter we can reuse if needed. I must admit I'm not familiar with Python either, which makes me a little bit worried for the maintenance and for the result of any commit I could make on the script. Moreover, and this is a practical issue, if we should switch Travis to run the build with the Python container, we'll have to write the tests with a Python implementation of the list (as we can't have two different languages on the same container). I was able to make progress on the The current state is:
Dependencies:
Linter status: The linter currently covers these rules. Ideally, all "Proposed" rules will be converted into a check. I made the code as simple as possible to encourage contributions, the check definition is in a separate file. Each test method has a resonable test suite that covers all the various scenario. func TestLinter_CheckRuleLowercase_Cases(t *testing.T) {
linter := &Linter{}
var inputs []string
inputs = []string{
"",
"foo",
"// a comment",
".bad.suffix",
"// A comment", // ignore comments even if mixed case
}
for _, input := range inputs {
line := &line{source: input, number: 2}
problem, err := linter.checkRuleLowercase(line)
if err != nil {
t.Errorf("checkRuleLowercase('%v') returned error: %v", input, err)
}
if problem != nil {
t.Errorf("checkRuleLowercase('%v') should pass", input)
}
}
inputs = []string{
"mixedCase",
"mixed.Case",
"mixed.caSe",
}
for _, input := range inputs {
line := &line{source: input, number: 2}
problem, err := linter.checkRuleLowercase(line)
if err != nil {
t.Errorf("checkRuleLowercase('%v') returned error: %v", input, err)
}
if problem == nil {
t.Errorf("checkRuleLowercase('%v') should NOT pass", input)
}
}
} Long story short, the PR is ready to be merged. @gerv @sleevi what's your position here? |
@rockdaboot I forgot to mention I'd be more than happy to add you as a collab to the linter, if you feel you want to play with some Go. I really appreciate all your help and feedback on this PR and I'm sad we end up working in parallel to two different solutions. I'm also open if we want to move it elsewhere (but I'll leave the decision to @gerv, for now I just created it under my account as it is where I was most comfortable with given it was an independent idea to start working on it). |
@weppos The python linter was just an exercise (me learning python), it took me less than a day - and I basically spent this day on learning python. So nothing wasted. Don't be 'sad' - see it positively: a second solution always gives a benchmark. Also, some of the ideas I put on your wiki came when I was working on the linter implementation. Just some quick notes:
And last but not least... give yourself a kick and have a closer look into python. I completely disliked it after the first few glimpses. I already missed it... but on the second look I discovered a really helpful tool. And I looked at dozens of languages in the last 30 years - and until now only three convinced my: Assembler, C, and Java. Now there is Python (the first script language in my list)... I now time is always a limited resource, but IMHO it is worth spending some on Python. |
@weppos - Are there specific questions/thoughts for @gerv and I? I think I'd agree with @rockdaboot that downloading a pre-compiled binary is non-ideal; I thought we would just be able to use Travis w/ the Go compiler/runtime and run it as part of the lint step. With regards to the current Ruby setup, I'm a little nervous with the test file and testing against the real PSL (or perhaps I've misread). At least the Chrome approach to testing our PSL implementation has been to supply a 'dummy PSL' and then test known answers against the algorithm (to make sure we're implementing the algorithm correctly). Perhaps I've misread tests/test.txt My thought on the approach would be something like:
For example, tests/trailing_whitespace, tests/trailing_whitespace_golden, tests/double_wildcard, tests/double_wildcard_golden, tests/utf8, etc, where the _golden files contain just the expected stdout. I have no opinions on Ruby vs Python; personally, I find Python easier, but that's a taste thing, and I think if you're willing to do the work in Ruby, there's no reason I can't learn and adapt :) |
The problem is that in order to do that I have to switch to use the Travis Go container, which doesn't have the Ruby interpreter. Vice-versa, the current Ruby container doesn't have the Go binary. I'll be working on a Go public suffix library in the future in order to deal with some issues, but it won't be ready in the next days and I'd we can at least have a working implementation of the tests sooner rather than later.
The test file has been there for ages, so I just reused the conventions we had so far. I can test using a dummy file, but at that point I guess the test will essentially be testing the library that deals with the PSL, rather than the PSL itself. If I recall correctly, one of the goal to test against the real PSL was to make sure that the PSL was not corrupted. I guess this goal is accomplished by the linter now. At that point, we can probably totally remove the test files, but I'd still like to have a sort of real test cases. As for the Ruby thing, as I mention, it will eventually be replaced by a pure Go lib (I think), the thing is that I can't commit on a timing for that and I really wish we have something today, rather than keep trying to get the best solution that actually resulted (as of today) to not have anything.
Can you clarify which kind of tests should be expected at this stage?
If I understood correctly, this should be tested by the linter as they are potential formatting issues, not really rules acceptance issues. Am I correct? Also, as for the "Ruby" issue, I wonder if you guys actually checked what's inside this PR. It doesn't really require any further maintenance, as it's just loading the expectations from the existing test file (I converted the current tests/test_psl.txt into a file without named functions) and it's running it. As I said, I expect this to be replaced once I have a Go lib that will provide the same flexibility in terms of being able to load ab arbitrary list (at this time I use the PSL itself) to test against. |
@weppos I am sorry, but I don't understand what your Go implementation will have that libpsl doesn't provide. The library comes with a tool 'psl' that allows you to load arbitrary PS lists and do any checks controlled by command line options. Library and tool are packaged by Debian, Redhat, Arch (maybe more, libpsl belongs to the basic install of Debian). The linter needs test files (PSL files), good ones and bad ones to perform self-tests. They should reflect all those test cases that are listed in your wiki. I guess that's what @sleevi is talking about. Example bad test case 'test_group_order.psl':
Possible linter output:
|
Yes, I need a Go implementation for other uses. This is not specific to this case, I was just considering reusing it as a way to ditch the Ruby dependency (given the linter is in Go). As you probably understood, I'm not familiar with C. I know we can use all the languages of this world to solve this problem, but frankly speaking nobody took care of #11 in these years (plus all the various connected issues still open in Bugzilla), hence I went ahead and used tools and languages I'm familiar with.
I already have specs for each test that is included today. I will keep working on the linter according to my availability. Regarding this specific PR, my desire is to have a test suite that covers all the tests cases defined as of today (and possible further updates for e.g. based on the wildcard discussion), plus the linter. This PR does it, any other solutions we had so far don't. If any of you want to work on a different solution that covers all these cases and is willing to maintain it for the foreseeable future, I'll be happy to hand off the work. Otherwise, I suggest we should merge this as a starting point and evaluate further improvements later on. |
@weppos I would like to join forces. I am not against Go, I am just against reinventing the wheel. Go is able to call C library routines (cgo) - is this is an option for you ? (see http://stackoverflow.com/questions/1760468/interface-go-with-c-libraries). Regarding the python linter... we could move it into your repo, else I would like to keep it as part of libpsl source package and would add all those test files. Also I will maintain libpsl and the linter - see this as an opportunity to get more free time for other things. |
Have to amend my writing about the linter: As a script, I would like to see it being part of publicsuffix/list repo, including the test files. So easy use with Travis, the self-checks shouldn't take longer than a glimpse. Anybody forking the repo to generate a PR automatically has the linter available to test his/her changes before committing. |
FYI the liter I have been working on is both a bin and a lib. I am working I will continue to work on that in any case. I honestly don't feel the need As for the tests, I can look into loading the libpsl into go and use a My question is whether this is a blocking issue for this PR or an On Thursday, 18 February 2016, Tim Rühsen notifications@github.com wrote:
Simone Carletti https://simonecarletti.com/ |
@weppos Despite from any discussion about this PR I made some improvements regarding test PSL files to test (any) PSL linter with see here. There is pslint_selftest.sh, a shell script that calls the linter with each PSL file (.input) and compares it's output with the expected output (.expected). The input files should cover all the test cases from your wiki. Feel free to use anything from there for your own tests - the *.expected files have to be amended to match the output of your linter, of course. Most input files cover more than one test case. BTW, I took a look at Go syntax today... well it seems pretty close to Python (and even C). It should be pretty straight forward to convert code Go<->Python. I'll give it a try in the next days. |
@rockdaboot this discussion is sucking all my remaining energies, and I need something sooner rather than later. I don't want to continue a this-language or that-language discussion, and we are clearly doubling the work as of today. So let's move in this way. Open a PR with your proposal to replace this PR with:
If the PR is better than this one (and ideally I'd love to be able to make small maintenance tasks on it, but that's a plus), then I'll be happy to merge it and move forward. Otherwise, I will merge this one (remaining open to further improvements or replacements later on). I don't want to appear rude, but I've dozens of PRs waiting to be merged, and I need to adopt a solution that works. As I said, I also need a reusable linter and also a Go library for the PSL in any case, hence I'll continue working on it, but in that case I will be free to proceed without being constrained by the PSL. |
That was my intention. Maybe I wasn't clear enough.
We already have that right now on Travis CI. When building libpsl and performing 'make check', those test are made against the current PSL. Also a few more tests, like testing each entry in the PSL against the PSL with special care taken for exceptions and wildcards. Also the test.txt file will be linted (as a by-product).
That is PR #130. Right now, it replaces the libpsl test suite. But if you agree, I amend the PR to first run the linter and on success run the libpsl test suite. |
As I mention, I'd like the test to be pulled from a simpler text file (see the one in this PR) that reflects a list of possible input/output, rather the We should be able to add more entries to the file to simply append more tests.
Yes, the tests should not be replaced. Also, I need a confidence that the linter is properly tested. That was not in #130 as the linter was a script added with no corresponding tests. |
ACK. I don't like test_psl.txt format either. libpsl just strips the checkPublicSuffix away. I'll update libpsl test suite to accept both formats. Any naming preference ?
I'll amend this tomorrow.
The linter self test is included since today and executed, see .travis.yml (./pslint_selftest). |
@weppos It's all in PR #130 now. I added tools/convert_tests, a one-line shell script to convert test_psl.txt to tests.txt - just in case the sed command is needed again. In tests.txt the test cases are separated by a single space (' ') to make parsing as easy as possible. The Travis YML does linting + further checking via libpsl. Libpsl is prepared to take the new tests.txt format in it's test suite. I will prepare a new release as soon as you pull the PR into master. |
Closing in favor of #130 |
This PR adds a test suite as requested in #11 and also a linter that is supposed to check the most common errors, as proposed by @gerv in https://bugzilla.mozilla.org/show_bug.cgi?id=985003
The test suite is written in Ruby. I wanted to write it in another language (e.g. Go), but I then realized there is no way in Go to load an arbitrary list in the current
x/net/suffix
implementation.Instead, this is possible in my Ruby implementation which I plan to use to check the list when a PR is submitted.
I was a little afraid about the performance of a Ruby test suite, but I'm pleasantly surprised. Only the Linter runs in less than one second.
On Travis, this branch took around 12 seconds to be tested, which is still better than the existing C-based test suite that runs in around 40 seconds and doesn't test the list style.
My next step is to convert the tests in
test_psl.txt
into the new test suite. Actually, I'm planning to make these tests available in a sort of json format so that they can be used by other implementations to test an input and an expected output.I see for example that
Go
cloned the tests in the file into their test suite. Having them defined in a standard format (can even be just a test file with a tuple of <input, expected>) can probably be helpful./cc @gerv @sleevi
Here's some failure examples: