Fix parsing of long lines when missing-final-newline is enabled #5786
The first pattern does not allow: However, do we really want to support |
Seems like we're using the second pattern ourselves in: I think it makes sense to disallow this, but this would probably need to be deprecated. That would also mean that the performance fix can only go in
@@ -129,6 +129,10 @@ Release date: TBA

Closes #5569

* Fix parsing of long lines when ``missing-final-newline`` is enabled.
- * Fix parsing of long lines when ``missing-final-newline`` is enabled.
+ * Optimize parsing of long lines when ``missing-final-newline`` is enabled.
@@ -180,6 +180,10 @@ Other Changes

Closes #5177, #5212

* Fix parsing of long lines when ``missing-final-newline`` is enabled.
- * Fix parsing of long lines when ``missing-final-newline`` is enabled.
+ * Optimize parsing of long lines when ``missing-final-newline`` is enabled.
\# # Beginning of comment
.*? # Anything (as little as possible)
\s*? # Any whitespaces (as little as possible)
This is a breaking change, because every change is a breaking change (heavy sigh), but it's probably worth it.
We could do:
- \s*? # Any whitespaces (as little as possible)
+ .*? # Anything (as little as possible)
Until we release 3.0, but then we need a way to make the deprecation known. So should we warn the user if we match a pylint comment with `.*?` but not with `\s*?`? That would actually decrease performance in 2.X, possibly by a lot [citation required].
Another solution is to consider that it was an unintended feature and just treat this as a performance fix. How many users are going to be pissed off by this decision...? I wish I knew. We already know that the users affected by the catastrophic backtracking aren't happy 😄
I don't think there is a real performance change if we make `.*?` a capturing group and check if it contains anything other than spaces whenever we use this regex pattern. That should be relatively straightforward.
Wouldn't that solve the breaking change issue?
I don't think so. The breaking change would be to remove `.*` completely and replace it with nothing or with `\s*`. To deprecate this we can use a capturing group and emit a warning when we match anything other than spaces.
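For illustration, the capturing-group idea could look roughly like the sketch below. This is not pylint's code: the simplified pattern, the `find_pragma` helper, and the warning text are all hypothetical, and the prefix group excludes `#` so that a separate preceding comment is not counted as a prefix.

```python
import re
import warnings

# Hypothetical, simplified pattern (NOT pylint's OPTION_RGX): the text between
# "#" and "pylint:" is captured so that it can be inspected after a match.
PRAGMA_RGX = re.compile(
    r"""
    \#              # Beginning of comment
    ([^#\n]*?)      # Captured prefix: anything before "pylint:" except another hash
    \bpylint:       # pylint word and colon
    \s*             # Any number of whitespaces
    ([^;#\n]+)      # The option text
    """,
    re.VERBOSE,
)


def find_pragma(comment: str):
    """Return the option text, warning if non-space text precedes ``pylint:``."""
    match = PRAGMA_RGX.search(comment)
    if match is None:
        return None
    prefix, option = match.group(1), match.group(2)
    if prefix.strip():
        # The behaviour stays unchanged; the user is only told it may go away.
        warnings.warn(
            "Text between '#' and 'pylint:' is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return option.strip()
```

With this sketch, `find_pragma("# pylint: disable=foo")` returns the option silently, while `find_pragma("# noqa pylint: disable=foo")` returns the same option and also emits a `DeprecationWarning`.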
But why would we make the breaking change if there are no performance issues when we keep the current behavior?
I would expect that the backtracking issue might be solved without incompatible changes.
Please feel free to see if you can find a pattern that satisfies the current tests. I thought it should be quite easy as well, but stumbled upon some roadblocks...
This works for me:
--- pragma_parser.py.old 2022-02-11 15:35:07.864685538 +0300
+++ pragma_parser.py 2022-02-11 15:35:14.156481194 +0300
@@ -9,11 +9,8 @@
# so that an option can be continued with the reasons
# why it is active or disabled.
OPTION_RGX = r"""
- \s* # Any number of whithespace
- \#? # One or zero hash
- .* # Anything (as much as possible)
- (\s* # Beginning of first matched group and any number of whitespaces
- \# # Beginning of comment
+ (?:\s*|\s*\#.*(?=\#.*?\bpylint:))
+ (\# # Beginning of comment
.*? # Anything (as little as possible)
\bpylint: # pylint word and column
\s* # Any number of whitespaces |
Have you tested this against |
Hmm, I see. Probably, we should have another alternative to match comments, i.e. |
diff --git a/pylint/utils/pragma_parser.py b/pylint/utils/pragma_parser.py
index 5ef4ef481..482e295ef 100644
--- a/pylint/utils/pragma_parser.py
+++ b/pylint/utils/pragma_parser.py
@@ -12,8 +12,8 @@ OPTION_RGX = r"""
- \s* # Any number of whitespace
- \#? # One or zero hash
- .* # Anything (as much as possible)
- (\s* # Beginning of first matched group and any number of whitespaces
- \# # Beginning of comment
+ (?:^\s*\#.*|\s*|\s*\#.*(?=\#.*?\bpylint:))
+ (\# # Beginning of comment
.*? # Anything (as little as possible)
\bpylint: # pylint word and column
\s* # Any number of whitespaces
Nice job! I think that actually works. If you want you can create a PR yourself @skirpichev, you are the one that came up with it!
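A quick way to try the proposed pattern is sketched below. The head of the regex is copied from the diff; the tail (the second capture group and the optional terminator) is not shown in the diff and is an assumption here, as is the `extract_option` helper.

```python
import re

# Head of the pattern as proposed in the diff. The last two pattern lines are
# NOT shown in the diff; they are assumed here purely for illustration.
OPTION_RGX = r"""
    (?:^\s*\#.*|\s*|\s*\#.*(?=\#.*?\bpylint:))
    (\#                # Beginning of comment
    .*?                # Anything (as little as possible)
    \bpylint:          # pylint word and colon
    \s*                # Any number of whitespaces
    ([^;\#\n]+))       # ASSUMED tail: option text up to a semicolon, hash or newline
    [;\#]{0,1}         # ASSUMED tail: optional terminating semicolon or hash
"""
OPTION_PO = re.compile(OPTION_RGX, re.VERBOSE)


def extract_option(line: str):
    """Return the text following ``pylint:`` in ``line``, or None if absent."""
    match = OPTION_PO.search(line)
    return match.group(2).strip() if match else None


# A trailing pragma, a pragma after another comment, and a long line with no
# comment at all (the kind of input that has to stay cheap to scan).
print(extract_option("x = 1  # pylint: disable=invalid-name"))
print(extract_option("# see the note # pylint: disable=line-too-long # reason"))
print(extract_option("x = '" + "a" * 100_000 + "'"))
```

The long line contains no `#`, so every alternative fails immediately at each start position and the search stays linear in the line length. Lines made up of many `#` characters are a different case and are not covered by this sketch.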
No, I think you did almost everything. BTW, probably different conditions in the |
We're using... nothing, I guess. We have a global timeout for the tests in CI, and a benchmark using pytest-benchmark that we do not look at (so it's basically a time waster in our CI right now, one that we hope to use later). Is pytest-timeout something that would solve this issue?
Yes, that's how I address such issues (local timeouts for tests). It's a little tricky, because you need to find a timeout that triggers on the problem across different machines (e.g. locally for you and on the GitHub Actions CI).
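As a rough sketch of what a local timeout guards against, the same idea can be tried with only the standard library by running the regex search in a child process that is killed when it exceeds a budget. This is not how pytest-timeout is implemented; the `regex_finishes_within` helper and the pathological `(a+)+$` pattern are hypothetical and unrelated to pylint's own regex.

```python
import subprocess
import sys

# Script run in the child process: exit code 0 if the pattern is found, else 1.
CHILD = """
import re, sys
sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 1)
"""


def regex_finishes_within(pattern: str, text: str, timeout: float) -> bool:
    """Return True if ``re.search(pattern, text)`` completes within ``timeout`` seconds."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, pattern, text],
            timeout=timeout,  # the child is killed once the budget is exceeded
            check=False,      # a "no match" exit code is not an error here
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# A deliberately pathological pattern (not from pylint) that backtracks
# exponentially, followed by a cheap search with a generous budget.
print(regex_finishes_within(r"(a+)+$", "a" * 40 + "b", timeout=1.0))
print(regex_finishes_within(r"\bpylint:", "# pylint: disable=foo", timeout=30.0))
```

A child process is used because a regex search stuck inside the `re` engine cannot be interrupted from another thread. Choosing the budget is the tricky part mentioned above: it has to be short enough to catch the regression yet long enough for slow CI machines.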
We have a similar system in place, but don't use |
Superseded by #5925, thanks to @skirpichev 😄 |
…ine' (#5925)
* Fix parsing of long lines when ``missing-final-newline`` is enabled
* Adapt fa31b6b to be backward-compatible
Fixes #5724
Also address comments from the PR #5786
Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>
Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>
Description
Closes #5724.
The stuff at the beginning of this regex pattern isn't actually needed for us to extract our disable comments. Removing it massively speeds up the parsing and unsticks us on the code from the issue.
I haven't added a test case as it is a bit difficult to test "don't look for things we don't need"...