Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of long lines when missing-final-newline is enabled #5786

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions ChangeLog
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,10 @@ Release date: TBA

Closes #5569

* Fix parsing of long lines when ``missing-final-newline`` is enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Fix parsing of long lines when ``missing-final-newline`` is enabled.
* Optimize parsing of long lines when ``missing-final-newline`` is enabled.


Closes #5724

* Fix false positives for ``used-before-assignment`` from using named
expressions in a ternary operator test and using that expression as
a call argument.
Expand Down
4 changes: 4 additions & 0 deletions doc/whatsnew/2.13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,10 @@ Other Changes

Closes #5177, #5212

* Fix parsing of long lines when ``missing-final-newline`` is enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Fix parsing of long lines when ``missing-final-newline`` is enabled.
* Optimize parsing of long lines when ``missing-final-newline`` is enabled.


Closes #5724

* Fix ``unnecessary_dict_index_lookup`` false positive when deleting a dictionary's entry.

Closes #4716
Expand Down
7 changes: 2 additions & 5 deletions pylint/utils/pragma_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,9 @@
# so that an option can be continued with the reasons
# why it is active or disabled.
OPTION_RGX = r"""
\s* # Any number of whitespace
\#? # One or zero hash
.* # Anything (as much as possible)
(\s* # Beginning of first matched group and any number of whitespaces
( # Beginning of first matched group and any number of whitespaces
\# # Beginning of comment
.*? # Anything (as little as possible)
\s*? # Any whitespaces (as little as possible)
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change, because every change is a breaking change (heavy sigh), but it's probably worth it.

We could do:

Suggested change
\s*? # Any whitespaces (as little as possible)
.*? # Anything (as little as possible)

Until we release 3.0, but then we need a way to make the deprecation known. So should we warn the user if we match a pylint comment with .*? but not with \s*? ? it would actually decrease performance in 2.X, possibly by a lot [citation required].

Another solution is to consider that it was an unintended feature and just consider this a performance fix. How many users are going to be pissed off by this decision... ? I wish I knew. We already know that the user affected by the catastrophic backtracking aren't happy 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a real performance change if we make .*? a capturing group and check if it contains anything other than spaces whenever we use this regex pattern. That should be relatively straightforward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it solve the breaking change issue ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The breaking change would be to remove .* completely and replace it with nothing or with \s*. To deprecate this we can use a capturing group and emit a warning when we match anything other than spaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would we be doing the breaking change if there is no performance issues if we keep the current behavior ?

\bpylint: # pylint word and column
\s* # Any number of whitespaces
([^;#\n]+)) # Anything except semicolon or hash or newline (it is the second matched group)
Expand Down