Track top-level module imports in the semantic model #9775
76203ea to 508aec6
CodSpeed Performance Report
Merging #9775 will improve performance by 14.55%.
2c94285 to 747f8e0
I suspect the gains here are actually a little underestimated, because most of our benchmark files import NumPy, and files that don't import NumPy should get a huge boost from this change.
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
S101 | 117 | 0 | 117 | 0 | 0 |
ANN202 | 70 | 0 | 70 | 0 | 0 |
FA100 | 55 | 0 | 55 | 0 | 0 |
AIR001 | 40 | 0 | 40 | 0 | 0 |
ANN201 | 38 | 0 | 38 | 0 | 0 |
ANN001 | 28 | 0 | 28 | 0 | 0 |
D103 | 24 | 0 | 24 | 0 | 0 |
T201 | 23 | 0 | 23 | 0 | 0 |
PLC0415 | 18 | 0 | 18 | 0 | 0 |
COM812 | 14 | 0 | 14 | 0 | 0 |
SIM117 | 12 | 0 | 12 | 0 | 0 |
FBT001 | 6 | 0 | 6 | 0 | 0 |
FBT002 | 5 | 0 | 5 | 0 | 0 |
PT012 | 5 | 0 | 5 | 0 | 0 |
PTH118 | 5 | 0 | 5 | 0 | 0 |
PTH123 | 4 | 0 | 4 | 0 | 0 |
FIX002 | 4 | 0 | 4 | 0 | 0 |
TD002 | 4 | 0 | 4 | 0 | 0 |
TD003 | 4 | 0 | 4 | 0 | 0 |
D106 | 4 | 0 | 4 | 0 | 0 |
RET505 | 3 | 0 | 3 | 0 | 0 |
D101 | 3 | 0 | 3 | 0 | 0 |
D205 | 3 | 0 | 3 | 0 | 0 |
ANN401 | 3 | 0 | 3 | 0 | 0 |
B015 | 2 | 0 | 2 | 0 | 0 |
CPY001 | 2 | 0 | 2 | 0 | 0 |
D202 | 2 | 0 | 2 | 0 | 0 |
D212 | 2 | 0 | 2 | 0 | 0 |
PLW1514 | 2 | 0 | 2 | 0 | 0 |
D107 | 2 | 0 | 2 | 0 | 0 |
D400 | 2 | 0 | 2 | 0 | 0 |
D415 | 2 | 0 | 2 | 0 | 0 |
C408 | 2 | 0 | 2 | 0 | 0 |
PERF401 | 1 | 0 | 1 | 0 | 0 |
A002 | 1 | 0 | 1 | 0 | 0 |
A001 | 1 | 0 | 1 | 0 | 0 |
D100 | 1 | 0 | 1 | 0 | 0 |
D401 | 1 | 0 | 1 | 0 | 0 |
D404 | 1 | 0 | 1 | 0 | 0 |
PTH100 | 1 | 0 | 1 | 0 | 0 |
PTH120 | 1 | 0 | 1 | 0 | 0 |
D209 | 1 | 0 | 1 | 0 | 0 |
PLR0913 | 1 | 0 | 1 | 0 | 0 |
PLR0917 | 1 | 0 | 1 | 0 | 0 |
C901 | 1 | 0 | 1 | 0 | 0 |
PLR0915 | 1 | 0 | 1 | 0 | 0 |
PLR1702 | 1 | 0 | 1 | 0 | 0 |
PLR6201 | 1 | 0 | 1 | 0 | 0 |
PLR2004 | 1 | 0 | 1 | 0 | 0 |
TD004 | 1 | 0 | 1 | 0 | 0 |
RET504 | 1 | 0 | 1 | 0 | 0 |
PLR0914 | 1 | 0 | 1 | 0 | 0 |
ARG001 | 1 | 0 | 1 | 0 | 0 |
The ecosystem changes are because I gated some of the Pandas rules to require a Pandas import. I can roll that back. It honestly might be an improvement... It's trading false positives for false negatives.
Very clever, and a nice perf improvement, especially for projects that have framework-specific rules enabled without using the framework.
I struggle a bit with the "see" and "seen" terminology. It's very generic, and to me it's unclear what `semantic.see(module)` would mean.
Should we also consider `builtins`? For example, if pandas is globally imported, run the pandas rules.
I fear that this otherwise breaks some workflows. For example, I recently recommended the use of `builtins` to cover the use case where they use `%run other_module.py`, where `other_module.py` imports pandas globally.
    if str::is_lowercase(name) {
        return;
    }
What if the entire performance win is because of this unrelated lint rule change 😄
The ecosystem changes all look like previous false positives. +1 from me.
747f8e0 to e09bf0a
@zanieb - I decided to roll back that specific change. We can definitely consider it, but it felt wrong to include in this optimization PR. (Some, but not all, of the changes were false negatives.)
Summary
This is a simple idea to avoid unnecessary work in the linter, especially for rules that run on all name and/or all attribute nodes. Imagine a rule like the NumPy deprecation check. If the user never imported `numpy`, we should be able to skip that rule entirely, whereas today we do a `resolve_call_path` check on every name in the file. It turns out that there's basically a finite set of modules that we care about, so we now track imports on those modules as explicit flags on the semantic model. In rules that can only ever trigger if those modules were imported, we add a dedicated and extremely cheap check to the top of the rule.

We could consider generalizing this to all modules, but I would expect that not to be much faster than `resolve_call_path`, which is just a hash map lookup on `TextSize` anyway.

It would also be nice to make this declarative, such that rules could declare the modules they care about and the analyzers could call the rules as appropriate. But I don't think such a design should block merging this.