Better primer tests (False positives detection) #5364
* Add changelog and warning about unstable API in testutil
* Add primer tests (running pylint on external libs during tests) in order to anticipate crash/fatal messages; false positives are harder to anticipate. Follow-up will be #5359 and later on #5364
* Add `__tracebackhide__ = True` so the traceback is manageable

Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>
A data point. Since I'm back for round 2 on
Still, it was easy to tell at a glance whether they were true or false positives (easier than debugging the root cause in pylint), so even some artifact printed out during the CI job would be nifty.
Likely doing #5403 first would help.
@DanielNoord this might be the place to continue the discussion about false negative/positive changes in primer repos?
What if we committed the baseline JSON to the repo? Then we would only lint a project once and compare it to the baseline on disk. Or am I missing why it would need to be run twice?
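For illustration, a minimal sketch of what comparing a fresh primer run against a committed baseline could look like. The file paths and the assumption that the output is a JSON list of message dicts with `path`/`line`/`symbol` keys are mine, not the actual primer format:

```python
import json
from pathlib import Path


def new_messages(baseline_file: Path, run_file: Path) -> list[dict]:
    """Return messages present in the new run but absent from the committed baseline."""
    baseline = json.loads(baseline_file.read_text())
    current = json.loads(run_file.read_text())

    # Key messages by location and symbol so unrelated messages don't collide.
    def key(msg: dict) -> tuple:
        return (msg["path"], msg["line"], msg["symbol"])

    known = {key(msg) for msg in baseline}
    return [msg for msg in current if key(msg) not in known]


if __name__ == "__main__":
    for msg in new_messages(Path("tests/primer/baseline.json"), Path("primer_output.json")):
        print(f"NEW: {msg['symbol']} in {msg['path']}:{msg['line']}")
```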
One thing I considered is uploading the primer json output as an artefact on

However, I noticed that the
Of course, we could definitely explore this, but I feel like they would do it too if it was actually feasible.
Artifacts might be difficult to deal with since you would probably need to know the Actions run that created them. What might work, however, is using the cache for that. With a good cache key, we should be able to store almost all results we need. The fallback can still be to just run it twice.
Yeah, I mean it's definitely something we should explore, but perhaps in V3? V4? (😄). You know what, I'm going to ask the guys over at
If I had to guess, nobody has had the idea and/or time to implement it yet.
Don't know. Isn't a simple line diff also possible? I believe that's what mypy_primer does. With that, no additional formatting would be necessary.
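For reference, a rough sketch of the line-diff approach using only the standard library. The output file names and the idea of filtering out pylint's `************* Module ...` separator lines are illustrative assumptions:

```python
import difflib
from pathlib import Path


def primer_diff(old_output: Path, new_output: Path) -> str:
    """Diff two plain-text pylint runs, ignoring per-module header lines."""

    def keep(line: str) -> bool:
        # Drop the "************* Module foo" separators so they don't pollute the diff.
        return not line.startswith("*************")

    old_lines = [line for line in old_output.read_text().splitlines() if keep(line)]
    new_lines = [line for line in new_output.read_text().splitlines() if keep(line)]
    return "\n".join(difflib.unified_diff(old_lines, new_lines, lineterm=""))


if __name__ == "__main__":
    print(primer_diff(Path("main_run.txt"), Path("pr_run.txt")))
```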
Yeah, I was thinking about that, but then started to think about what would happen if: The diff would include the

I foresaw some headaches getting that to look pretty, while a parseable JSON makes lives just that much easier (and is requested by other projects and users as well). Felt like it would be better to catch two birds with one stone here rather than spend some time working with the less parseable diff output.
I would just strip out every line that starts with
To get back on the response from

I might have some time to look at this in the near future. Although I do think that fixing the
I've got a first proof of concept over at: DanielNoord#145

That PR creates an intentional false positive for

It does cache the projects that we run over, although very roughly. Also it gets the
It looks pretty dandy ❤️! Wondering if showing the code around the new warning (maybe ~10 lines, 5 before, 5 after) is reasonable. It would facilitate the review for sure by making it possible to understand what's happening in most cases without opening the analyzed code in an external IDE.
With #6723 this is now merged on
@jacobtylerwalls @Pierre-Sassoulas See #6636 (comment). Do we want to exclude messages where only the actual message content changes? Or do we want to keep warning about those?
I think we can show the diff of the message in that case. We know there's a change, but we can probably benefit from comparing the generated message. I.e. something like:

The following message content changed:

- *Consider merging these comparisons with "in" to 't in (token.LPAR, token.RPAR)'*
+ *Consider merging these comparisons with "in" by using "t in (token.LPAR, token.RPAR, )". Use a set instead if elements are hashable."*

https://github.com/psf/black.git/blob/main/src/black/nodes.py#L313
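A sketch of how such content-only changes could be detected, assuming the primer output is a list of message dicts with `path`, `line`, `symbol` and `message` keys (an assumed layout, not the actual format):

```python
def changed_message_contents(main_msgs: list[dict], pr_msgs: list[dict]) -> list[str]:
    """Report messages that exist in both runs but whose text changed."""

    def by_location(msgs: list[dict]) -> dict[tuple, str]:
        return {(m["path"], m["line"], m["symbol"]): m["message"] for m in msgs}

    main_index = by_location(main_msgs)
    changes = []
    for loc, new_text in by_location(pr_msgs).items():
        old_text = main_index.get(loc)
        if old_text is not None and old_text != new_text:
            path, line, symbol = loc
            changes.append(f"{path}:{line} ({symbol}):\n- {old_text}\n+ {new_text}")
    return changes
```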
See:

We don't actually fail on a crash. We should probably do so.
If we fail on a crash but it's bleeding edge astroid, I don't know... Do we also have a check for stable astroid? I don't want a situation where the pylint pipeline is unstable because it depends on astroid
I just added an

Although they do show up, I don't like how they can be easily missed. I'll try to think of something to make them more visible. Perhaps a new
💥
I'm assigning myself to the

@jacobtylerwalls I just thought of another issue which we should probably solve:
I think we should probably do something about that.
If we truly end up with a "race condition" like that I'm happy to explain it to a pylint contributor. The pipeline wouldn't be failing. And they would probably complain "but my changes couldn't have done THAT!" so I doubt they would spin too many wheels. If they do want to pop the hood, it's never too early to start learning about

Anyway, as soon as something else is pushed to main, there would be a new run and no longer any drift the next time someone pushes to the PR. You can construct a similar false negative problem. Until

I also think we shouldn't put outsized attention on fatal errors and crashes; astroid changes can involve a lot more than just crashes! :-)
I guess I'm saying I much prefer one commit of drift across astroid during a one-commit interval on a PR to a four-month drift between CI and what we already know we will have to depend on for four months of PRs!
Hmm, although I agree that this probably won't happen often and can be easily explained, I do think we can improve the situation somewhat. What about running a cron-timed primer job every morning at 12am and letting it comment/ping us whenever it finds an issue with bleeding edge

I'm just thinking of the previous issues we had with the CI. Although these should be resolved after a single commit to
Yeah, this is what I'm trying to avoid too. Maybe if bleeding edge fails we could launch on the normal astroid. Something like:
Right, this is sort of what I'm worried about. I don't want to focus too much on "fighting the last war" -- if we're talking about changes in astroid that cause failing pipelines in pylint, not merely vanilla behavior changes, we should solve it directly: run a subset of pylint tests in
This is potentially even scarier than the status quo for a first-timer, no?
Stable astroid might not be stable, of course. We might have multiple contributors waiting a week for a patch release instead of a "race condition" for a few hours.
I would push back on this -- we're talking about people who happen to push once and then stop pushing completely during a very small window of time. I won't veto anything, I just am not seeing the upside of working on this. If we really wanted to do it, my alternative proposal would be to have the compare job compare the astroid SHA and then relaunch the
I can't remember the last time pylint didn't have a commit every morning, so the cron job would probably need to be every four hours! :D
Actually, the cron job is a good idea for the old primer, i.e. the one that doesn't compare messages, but just looks for crashes. If we had a daily job that ran the old primer on bleeding edge astroid and exited with a failing code on fatal messages, it wouldn't block/interact with any PRs, and the failing jobs would be useful information.
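A rough sketch of what that crash-only daily check could boil down to, assuming pylint's JSON reporter output and taking the project directories from the command line; none of this is the actual primer or workflow code:

```python
import json
import subprocess
import sys

# Fatal message symbols the crash-only primer cares about (listed in the issue description below).
FATAL_SYMBOLS = {"fatal", "astroid-error", "parse-error", "method-check-failed"}


def fatal_messages(project_dir: str) -> list[dict]:
    """Run pylint over one project and return any fatal messages it emitted."""
    result = subprocess.run(
        ["pylint", "--output-format=json", project_dir],
        capture_output=True, text=True, check=False,
    )
    messages = json.loads(result.stdout or "[]")
    return [m for m in messages if m.get("symbol") in FATAL_SYMBOLS]


if __name__ == "__main__":
    found = [m for project in sys.argv[1:] for m in fatal_messages(project)]
    for m in found:
        print(f"FATAL: {m['symbol']} in {m['path']}:{m['line']}")
    sys.exit(1 if found else 0)
```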
OTOH, a more efficient way to get to the same result is to just run the "old pylint primer" on every commit to main in astroid.
Or just run the current primer with #6746. Let's tweak the current one a bit more and then decide what to do. We could (for example) extract it into its own repository and package it with associated shareable GitHub Actions.
Some issue analysis in #6769 (comment). Might be relevant for later discussion. It shows why we need to run

The run that I have just restarted in https://github.com/PyCQA/pylint/actions/runs/2414623971 shows that the system is still correctly taking new runs after they have completed. With the longer run time it just means that quick successive PRs can use different
Some weird things are going on with the runs in:

We have been seeing it on other PRs as well. For some reason the
Maybe just replace pandas and sentry with other packages in the meantime?
It seems the first push to a PR after a new push to main passes, and then the next push to the PR fails again.
Yeah, although I'm not sure which packages wouldn't create this issue. I can't really test that locally 😅
I'm considering this closed. We have the primer comment and some detection. Any remaining issues can be tracked in #5359.
Current problem
Right now the primer tests only check for fatal messages and crashes, but anticipating false positives would be a huge plus in terms of release quality.
Desired solution
A way to anticipate false positives by running pylint on external repositories.
Additional context
Originally posted by @DanielNoord (#5173 (comment)):
Some points I found while working on this:
The initial approach of asserting `ex.code == 0` doesn't work since many packages will return error messages, even the `stdlib` packages. We should therefore only look for crashes (`ex.code == 32`) or fatal messages (`ex.code == 1`), I think.

I have created a new CI job to do the primer tests on Linux on every push or pull request. We don't want to run them only when bumping a version (as we did with the previous acceptance tests), but I think it is good to separate them from the other tests, especially since they will probably take much longer to complete (more on that later) and sometimes early fails on the tests are helpful in finalising a PR (especially after GitHub review comments break a test).
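As a rough illustration of the exit-code check described above (treating pylint's exit status as a bit field where 1 means a fatal message was issued and 32 means a usage error/crash), here is a pytest-style sketch; the project path and parametrization are assumptions, not the actual primer test:

```python
import pytest

from pylint.lint import Run

# Exit-code bits we care about: 1 = fatal message issued, 32 = usage error / crash.
CRASH_OR_FATAL_BITS = 1 | 32


@pytest.mark.parametrize("project_dir", ["external_repos/example_project"])
def test_primer_no_crash(project_dir: str) -> None:
    __tracebackhide__ = True  # keep the pytest traceback readable on failure
    with pytest.raises(SystemExit) as ex:
        Run([project_dir])
    # Ordinary error/warning/convention messages are fine; crashes and fatals are not.
    assert ex.value.code & CRASH_OR_FATAL_BITS == 0
```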
The lazy loading of repos checks the SHA1 hashes of the local commit and the remote commit, and re-clones when it finds a difference.
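A minimal sketch of that lazy-clone check using plain `git` commands; the helper name and directory handling are assumptions rather than the actual primer implementation:

```python
import shutil
import subprocess
from pathlib import Path


def clone_or_update(url: str, branch: str, target: Path) -> None:
    """Clone the repo if missing, or re-clone when the remote branch has moved."""
    if target.exists():
        local_sha = subprocess.run(
            ["git", "-C", str(target), "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        remote_sha = subprocess.run(
            ["git", "ls-remote", url, branch],
            capture_output=True, text=True, check=True,
        ).stdout.split()[0]
        if local_sha == remote_sha:
            return  # already up to date, nothing to do
        shutil.rmtree(target)
    subprocess.run(
        ["git", "clone", "--depth=1", "--branch", branch, url, str(target)],
        check=True,
    )
```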
`tests/primer/primer_external_packages.py` is used to store a list of packages to download. We currently include `numpy`, but I would argue against including it. `numpy` has its tests included in the source files. Normally we don't want to run `pylint` over tests, as this can create problems when tests use frameworks that rely on non-normal Python code. Besides, running over the tests also really inflates the time it takes to run the primer tests. I would think we could come up with 2/3 other projects that might be better. Note that these projects do not need to use `pylint`; we just need to be able to assume that their code is "normal" and therefore shouldn't crash `pylint` or `astroid`.

We might also want to improve the message that gets raised when the test fails. Currently it is not immediately clear where `pylint` crashed. Improving the message might help expedite the process of fixing it.

Perhaps we should add `--only-fatal`? To make the output only print fatal errors?

Originally posted by @cdce8p (#5173 (comment)):
Maybe the `--capture=tee-sys` option works?
https://pytest.org/en/6.2.x/capture.html#setting-capturing-methods-or-disabling-capturing
Originally posted by @DanielNoord (#5173 (comment)):
I'm not sure this is the case. Take `undefined-variable` for example:

https://github.com/PyCQA/pylint/blob/e5082d3f9401dbcf65b40ce6a819d2a09beccb5c/pylint/checkers/variables.py#L393-L397
We know there are issues with this message. There are false positives and negatives, and due to the lack of control-flow analysis we are not able to solve this (now).
If we make it so the primer tests fail whenever an `Error`-level message is found in a project, there cannot be any `undefined-variable` messages. This is (currently) unfeasible. There are many more `Error`-level messages where we know there are issues, which we cannot always fix so easily.

We can (sort of) avoid this by only including projects that use `pylint`, as they are likely to have already disabled current false positives, but that doesn't fully help us.

What if a commit changes the way `undefined-variable` behaves and the project emits 10 new warnings? We would need to investigate whether any of these are correct or false positives. For larger projects this number might be much higher. Even if we identify 1 false positive and fix it, the primer will still fail because of 9 `undefined-variable` messages. We know these messages are correct (the project just hasn't updated to the WIP-pylint version) and can merge the commit, but from then on every primer CI job will fail because of those 9 messages.

By only checking for `Fatal` and `Crash` we make sure that `pylint` can parse any type of code (`pylint`-enforced and non-enforced) without breaking/crashing. For reference, the only `Fatal` messages are: `method-check-failed`, `fatal`, `astroid-error` and `parse-error`.

So you would want to allow the use of `--use-all-extensions` and `disable-plugin` at the same time?