-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per directory configs - preliminary changes #9550
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9550 +/- ##
==========================================
- Coverage 95.81% 95.79% -0.03%
==========================================
Files 173 173
Lines 18825 18851 +26
==========================================
+ Hits 18038 18058 +20
- Misses 787 793 +6
|
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This makes it much easier to review. Left some comments!
requirements_test.txt
Outdated
@@ -9,3 +9,4 @@ six | |||
# Type packages for mypy | |||
types-pkg_resources==0.1.3 | |||
tox>=3 | |||
pre-commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave this out? pre-commit
is not necessary to run the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll move it out of this branch.
But do you have any advice - how can I get rid of formatting: failed with pre-commit is not allowed, use allowlist_externals to allow it
without adding this in requirements? Is there some global config for tox where I can add pre-commit dependency or set allowlist_externals
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really use tox
, didn't know we still recommend it.
You can probably just use the CI for this? That's what I always do 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved pre-commit
dependency directly to the tox.ini config - does it seem like better solution?
if Path(".").resolve() not in linter._directory_namespaces: | ||
linter._directory_namespaces[Path(".").resolve()] = (linter.config, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not have the if
I think? _config_initialization
should be called once? Even for multi-dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_config_initialization
has some non-trivial logic for parsing several possible variants of configs into namespace, merging it with command-line arguments, configuring plugins and reporting errors during this process. So it was convenient to reuse all this logic for parsing additional configs, and _config_initialization
is going to be called for each new config in subsequent changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm but shouldn't we pass the path of the current config file to this function then? On its own this if statement doesn't make a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly - the path of the current config file is passed to _config_initialization
in Run.__init__
, and paths of new config files will be passed there in register_local_config
.
The idea of this if
is that first time when _config_initialization
is called, it parses config from working directory, and this config should be saved to linter._directory_namespaces
to avoid additional processing of special cases. But next times when _config_initialization
is called, it shouldn't overwrite config for working directory with values from new files.
@@ -66,7 +66,7 @@ | |||
ModuleDescriptionDict, | |||
Options, | |||
) | |||
from pylint.utils import ASTWalker, FileState, LinterStats, utils | |||
from pylint.utils import ASTWalker, FileState, LinterStats, merge_stats, utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why we need the merging of stats for the multi-dir config option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current state is that some stat counters are reset during linter.open()
, some are reset in linter.set_current_module
-> init_single_module()
, and some are not reset at all (error, warning etc in 1st group, all counters in stats.by_module in 2nd group, statement in 3rd). It leads to incorrect score calculation when linter (i.e. main checker) is opened per file, or when it is opened after getting asts.
So I decided to reset all possible counters for each new module by creating new LinterStats object in set_current_module
.
If stats reset is omitted entirely, then another problem arises:
When jobs>1, the same linter object can be used for checking several modules, stats after each module are copied and then merged.
It leads to a situation when some stats are accounted several times in final result (it's checked in test_subconfigs_score
in my 1st PR).
Explicit stats reset and merge in single process can be avoided, but it will require additional changes in code for parallel checks. I'd suggest to leave it as a possible optimization in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you probably already explain it but I don't fully understand. This explanation seems to point to a general issue with stats merging, not something that has to do with multi-directory configs. Or am I misunderstanding you? If it is just a general issue we should tackle it in a separate PR.
Hey @0nf and @DanielNoord, I've been running the benches on this branch, and these changes seem to significantly impact the baseline benchmarks and from their definition, those seem important: pylint/tests/benchmark/test_baseline_benchmarks.py Lines 118 to 124 in 67bfab4
However, there is a big regression on the runs: Curious to know if you expected this performance change. |
Thanks for that @art049. It is probably related to the moving of |
@DanielNoord yes it seems from the differential profile the regressions is mainly located in A lot of new code(in blue) is executed here. |
There is also a report based on branch with full changes for per-directory configs. |
- Add opportunity to open checkers per-file, so they can use values from local config during opening - Save command line arguments to apply them on top of each new config - More accurate verbose messages about config files - Enable finding config files in arbitrary directories - Add opportunity to call linter._astroid_module_checker per file in single-process mode - Collect stats from several calls of linter._astroid_module_checker in single-process mode - Extend linter._get_namespace_for_file to return the path from which namespace was created
42bbb5a
to
2f6c087
Compare
This comment has been minimized.
This comment has been minimized.
- Responses to review comments - Add test for calling _astroid_module_checker on different levels - Move Path.resolve() out of _get_namespace_for_file recursive calls
2f6c087
to
9b3576f
Compare
@jacobtylerwalls I think you have done some regression testing in the past. Can you comment on whether you see a performance regression with these changes/ |
🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉 This comment was generated for commit 9b3576f |
config_path, namespace = self._get_namespace_for_file( | ||
Path(filepath).resolve(), self._directory_namespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked report traces the regression to the call to resolve()
.
I wonder if we can guard it under not path.is_absolute()
:
>>> from timeit import timeit
>>> timeit('p.is_absolute()', setup='from pathlib import Path; p=Path(".")')
0.1643185840221122
>>> timeit('p.resolve()', setup='from pathlib import Path; p=Path(".")')
10.929103292000946
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: Seems like is_absolute() will always be false as things stand, so we probably need to look higher up the stack for a place to do some sort of conversion.
Hi @DanielNoord ! I haven't marked some of your review notes as resolved because I don't know if my answers to them were sufficient. Could you check if my comments actually answer your questions? 🙂 |
Also, I'm a bit confused about what to do with performance drop in test_baseline_benchmark_j1.
|
Just letting you know that this is on my TODO list but just haven't found the time yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for continuing with this @0nf
If you don't mind I could also split off some of the things I think we can easily merge into separate PRs to get them reviewed by other maintainers to make this PR a little bit more manageable.
if len(linter._directory_namespaces) == 0: | ||
linter._directory_namespaces[Path(".").resolve()] = (linter.config, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I'd prefer to revert the changes in these two lines.
They are really rightly coupled to the final implementation of the per directory configs and the performance impact is hard to judge on its own. As far as I can see, all other changes in this PR are somewhat sensible on their own. This one isn't.
"""Iterate over the default config file names and see if they exist.""" | ||
basedir = Path(basedir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basedir = Path(basedir) |
That should not be needed.
@@ -66,7 +66,7 @@ | |||
ModuleDescriptionDict, | |||
Options, | |||
) | |||
from pylint.utils import ASTWalker, FileState, LinterStats, utils | |||
from pylint.utils import ASTWalker, FileState, LinterStats, merge_stats, utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you probably already explain it but I don't fully understand. This explanation seems to point to a general issue with stats merging, not something that has to do with multi-directory configs. Or am I misunderstanding you? If it is just a general issue we should tackle it in a separate PR.
with augmented_sys_path(extra_packages_paths): | ||
# 2) Get the AST for each FileItem | ||
ast_per_fileitem = self._get_asts(fileitems, data) | ||
# 3) Lint each ast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be down below
with augmented_sys_path(extra_packages_paths): | ||
# 2) Get the AST for each FileItem | ||
ast_per_fileitem = self._get_asts(fileitems, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we have unintended effects from not getting these within the context manager..
|
||
def _lint_file( | ||
self, | ||
file: FileItem, | ||
module: nodes.Module, | ||
check_astroid_module: Callable[[nodes.Module], bool | None], | ||
check_astroid_module: Callable[[nodes.Module], bool | None] | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is really touching some of the core behaviour of this behemoth of a class so it is a bit hard to review. Sorry in advance.
Why is this now optional? I don't really like that design as it further complicates this function body. Could you explain why this is needed? And could that perhaps be a separate PR?
Type of Changes
Description
Modifications in existing code base that are needed for per-directory configs. This PR does not introduce new functionality itself, but contains part of the changes from #9395.
The only new behavior from this PR is slightly modified messages in verbose mode.
Refs #618