Per directory configs - preliminary changes #9550

0nf · 2024-04-15T14:50:58Z

Type of Changes

	Type
✓	🔨 Refactoring

Description

Modifications to the existing code base that are needed for per-directory configs. This PR does not introduce new functionality itself, but contains part of the changes from #9395.

The only new behavior from this PR is slightly modified messages in verbose mode.

Refs #618

codecov · 2024-04-16T12:10:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.79%. Comparing base (67bfab4) to head (9b3576f).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9550      +/-   ##
==========================================
- Coverage   95.81%   95.79%   -0.03%     
==========================================
  Files         173      173              
  Lines       18825    18851      +26     
==========================================
+ Hits        18038    18058      +20     
- Misses        787      793       +6

Files	Coverage Δ
pylint/config/arguments_manager.py	`99.46% <100.00%> (+0.01%)`	⬆️
pylint/config/config_file_parser.py	`100.00% <100.00%> (ø)`
pylint/config/config_initialization.py	`98.91% <100.00%> (+0.02%)`	⬆️
pylint/config/find_default_config_files.py	`91.48% <100.00%> (+0.18%)`	⬆️
pylint/lint/pylinter.py	`96.56% <100.00%> (+0.12%)`	⬆️

... and 1 file with indirect coverage changes

DanielNoord

Thanks! This makes it much easier to review. Left some comments!

DanielNoord · 2024-04-17T07:15:12Z

requirements_test.txt

@@ -9,3 +9,4 @@ six
 # Type packages for mypy
 types-pkg_resources==0.1.3
 tox>=3
+pre-commit


Can we leave this out? pre-commit is not necessary to run the tests.

Ok, I'll move it out of this branch.
But do you have any advice on how I can get rid of the error "formatting: failed with pre-commit is not allowed, use allowlist_externals to allow it" without adding pre-commit to the requirements? Is there some global config for tox where I can add the pre-commit dependency or set allowlist_externals?

I don't really use tox, didn't know we still recommend it.

You can probably just use the CI for this? That's what I always do 😄

Moved the pre-commit dependency directly to the tox.ini config. Does that seem like a better solution?

DanielNoord · 2024-04-17T07:16:35Z

pylint/config/config_initialization.py

+    if Path(".").resolve() not in linter._directory_namespaces:
+        linter._directory_namespaces[Path(".").resolve()] = (linter.config, {})


This should not have the if I think? _config_initialization should be called once? Even for multi-dir?

_config_initialization has some non-trivial logic for parsing several possible variants of configs into a namespace, merging it with command-line arguments, configuring plugins and reporting errors during this process. So it was convenient to reuse all this logic for parsing additional configs, and _config_initialization is going to be called for each new config in subsequent changes.

Hmm but shouldn't we pass the path of the current config file to this function then? On its own this if statement doesn't make a lot of sense.

Exactly - the path of the current config file is passed to _config_initialization in Run.__init__, and paths of new config files will be passed there in register_local_config.

The idea of this if is that the first time _config_initialization is called, it parses the config from the working directory, and this config should be saved to linter._directory_namespaces to avoid additional processing of special cases. On subsequent calls, _config_initialization shouldn't overwrite the config for the working directory with values from new files.
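To make the intent of the guard concrete, here is a minimal sketch of the "store once, never overwrite" behaviour described above. It is not the code from this PR: register_working_dir_config and the DirectoryNamespaces alias are hypothetical names, and the configs are plain dicts standing in for the real namespace objects.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Hypothetical stand-in for the linter's registry:
# resolved directory -> (config, extra data).
DirectoryNamespaces = dict[Path, tuple[Any, dict[str, Any]]]


def register_working_dir_config(
    namespaces: DirectoryNamespaces, config: Any, cwd: Path | None = None
) -> bool:
    """Store ``config`` for the working directory unless one is already stored.

    Returns True if this call stored the config.
    """
    key = (cwd or Path(".")).resolve()
    if key in namespaces:
        # A later call (made for a config found in another directory) must not
        # overwrite the config that was parsed first for the working directory.
        return False
    namespaces[key] = (config, {})
    return True


namespaces: DirectoryNamespaces = {}
first = register_working_dir_config(namespaces, {"source": "working directory"})
second = register_working_dir_config(namespaces, {"source": "subdirectory config"})
```

In this sketch the second call is a no-op, which is the behaviour the if statement is meant to guarantee once the initialization function is called more than once.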

DanielNoord · 2024-04-17T07:18:14Z

pylint/lint/pylinter.py

@@ -66,7 +66,7 @@
    ModuleDescriptionDict,
    Options,
 )
-from pylint.utils import ASTWalker, FileState, LinterStats, utils
+from pylint.utils import ASTWalker, FileState, LinterStats, merge_stats, utils


Could you explain why we need the merging of stats for the multi-dir config option?

Currently, some stat counters are reset during linter.open(), some are reset in linter.set_current_module -> init_single_module(), and some are not reset at all (error, warning etc. are in the 1st group, all counters in stats.by_module are in the 2nd, statement is in the 3rd). This leads to incorrect score calculation when the linter (i.e. the main checker) is opened per file, or when it is opened after getting the ASTs.
So I decided to reset all possible counters for each new module by creating a new LinterStats object in set_current_module.

If the stats reset is omitted entirely, another problem arises:
When jobs>1, the same linter object can be used for checking several modules; stats after each module are copied and then merged.
This leads to a situation where some stats are counted several times in the final result (this is checked in test_subconfigs_score in my 1st PR).

The explicit stats reset and merge in a single process can be avoided, but that would require additional changes in the code for parallel checks. I'd suggest leaving it as a possible optimization for another PR.
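The double counting described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions: Stats, merge and check_module are hypothetical stand-ins, not pylint's LinterStats or merge_stats, and only two counters are modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stats:
    """Hypothetical, much-simplified stand-in for a linter statistics object."""

    statement: int = 0
    error: int = 0
    by_module: dict[str, int] = field(default_factory=dict)


def merge(all_stats: list[Stats]) -> Stats:
    """Sum a list of stats objects into one total."""
    total = Stats()
    for stats in all_stats:
        total.statement += stats.statement
        total.error += stats.error
        total.by_module.update(stats.by_module)
    return total


def check_module(stats: Stats, name: str, statements: int, errors: int) -> None:
    """Pretend to lint one module by adding its counts to ``stats``."""
    stats.statement += statements
    stats.error += errors
    stats.by_module[name] = errors


modules = [("a", 10, 1), ("b", 20, 2)]

# Shared object, copied after each module: the second copy already contains
# the first module's counts, so merging the copies counts them twice.
shared = Stats()
snapshots = []
for name, statements, errors in modules:
    check_module(shared, name, statements, errors)
    snapshots.append(Stats(shared.statement, shared.error, dict(shared.by_module)))
double_counted = merge(snapshots)

# Fresh object per module: each one holds only that module's counts.
per_module = []
for name, statements, errors in modules:
    fresh = Stats()
    check_module(fresh, name, statements, errors)
    per_module.append(fresh)
correct = merge(per_module)
```

With the shared object the merged total reports 40 statements for 30 real ones; with a fresh object per module the merge gives the expected 30.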

Sorry, you probably already explained it but I don't fully understand. This explanation seems to point to a general issue with stats merging, not something that has to do with multi-directory configs. Or am I misunderstanding you? If it is just a general issue we should tackle it in a separate PR.

art049 · 2024-04-17T21:14:05Z

Hey @0nf and @DanielNoord, I've been running the benches on this branch, and these changes seem to significantly impact the baseline benchmarks, which seem important judging from their definition:

pylint/tests/benchmark/test_baseline_benchmarks.py

Lines 118 to 124 in 67bfab4

    def test_baseline_benchmark_j1(self, benchmark: BenchmarkFixture) -> None:
        """Establish a baseline of pylint performance with no work.

        We will add extra Checkers in other benchmarks.

        Because this is so simple, if this regresses something very serious has happened
        """

However, there is a big regression on the runs:

Curious to know if you expected this performance change.
For some explanation, I installed CodSpeed on a fork synced with this repo. You can look at the full report here.

DanielNoord · 2024-04-18T07:03:04Z

Thanks for that @art049.

It is probably related to the moving of ast_per_fileitem = self._get_asts(fileitems, data). Does the report also point to what is making the performance slower?

art049 · 2024-04-18T19:17:26Z

@DanielNoord yes, from the differential profile it seems the regression is mainly located in PyLinter._get_namespace_for_file:

A lot of new code (in blue) is executed here.

0nf · 2024-04-19T09:58:44Z

Path.resolve() is not connected to the moving of self._get_asts(fileitems, data); it was added only for correct identification of parent directories, including situations where paths contain symlinks.

There is also a report based on the branch with the full changes for per-directory configs. _get_namespace_for_file is behind the new feature flag there, so in the end the performance of test_baseline_benchmark_j1 is affected to a much lesser extent.

- Add opportunity to open checkers per-file, so they can use values from local config during opening
- Save command line arguments to apply them on top of each new config
- More accurate verbose messages about config files
- Enable finding config files in arbitrary directories
- Add opportunity to call linter._astroid_module_checker per file in single-process mode
- Collect stats from several calls of linter._astroid_module_checker in single-process mode
- Extend linter._get_namespace_for_file to return the path from which namespace was created

- Responses to review comments
- Add test for calling _astroid_module_checker on different levels
- Move Path.resolve() out of _get_namespace_for_file recursive calls

DanielNoord · 2024-04-24T07:43:16Z

@jacobtylerwalls I think you have done some regression testing in the past. Can you comment on whether you see a performance regression with these changes?

github-actions · 2024-04-24T07:56:44Z

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 9b3576f

jacobtylerwalls · 2024-05-01T12:14:11Z

pylint/lint/pylinter.py

+            config_path, namespace = self._get_namespace_for_file(
+                Path(filepath).resolve(), self._directory_namespaces


The linked report traces the regression to the call to resolve().

I wonder if we can guard it under not path.is_absolute():

>>> from timeit import timeit
>>> timeit('p.is_absolute()', setup='from pathlib import Path; p=Path(".")')
0.1643185840221122
>>> timeit('p.resolve()', setup='from pathlib import Path; p=Path(".")')
10.929103292000946

Edit: Seems like is_absolute() will always be false as things stand, so we probably need to look higher up the stack for a place to do some sort of conversion.
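One possible shape for a conversion done "higher up the stack" is to resolve each directory only once and reuse the result for every file inside it. The sketch below is an illustration, not a proposal taken from this PR: resolve_dir and absolute_file_path are hypothetical names, and it assumes that symlink resolution is only needed for the containing directory.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def resolve_dir(directory: str) -> Path:
    """Resolve a directory once; repeated lookups hit the cache."""
    return Path(directory).resolve()


def absolute_file_path(filepath: str) -> Path:
    """Return an absolute path for ``filepath`` with its parent directory resolved.

    Assumption: only the containing directory needs symlink resolution,
    so a symlinked file itself is NOT followed here.
    """
    path = Path(filepath)
    return resolve_dir(str(path.parent)) / path.name


first = absolute_file_path("pkg/module_a.py")
second = absolute_file_path("pkg/module_b.py")
```

Both calls share one resolve() for the pkg directory, so the cost grows with the number of distinct directories rather than the number of files. Whether that trade-off is acceptable for symlinked files would need to be checked against the real lookup.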

0nf · 2024-05-06T07:48:11Z

Hi @DanielNoord ! I haven't marked some of your review notes as resolved because I don't know if my answers to them were sufficient. Could you check if my comments actually answer your questions? 🙂

0nf · 2024-05-06T08:04:15Z

Also, I'm a bit confused about what to do with the performance drop in test_baseline_benchmark_j1.

Is it critical, given that the time difference is less than 5ms in all cases, which is <1% in all test_baseline_lots_of_files* benchmarks?
If yes, would it be ok to hide Path.resolve() behind an analog of the use-local-configs feature flag? I was thinking about a condition like len(self._directory_namespaces) > 1
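A sketch of what such a condition could look like, assuming the registry is a plain dict keyed by resolved directories. lookup_key is a hypothetical helper and the string values stand in for real namespaces.

```python
from pathlib import Path
from typing import Any


def lookup_key(filepath: str, directory_namespaces: dict[Path, Any]) -> Path:
    """Return the path to use for the namespace lookup.

    Resolve only when more than one namespace is registered, i.e. when
    per-directory configs could actually change the result.
    """
    path = Path(filepath)
    if len(directory_namespaces) > 1:
        return path.resolve()
    return path


single = {Path(".").resolve(): "root config"}
several = {**single, Path("sub").resolve(): "sub config"}

cheap = lookup_key("a.py", single)       # no resolve() call on this branch
resolved = lookup_key("a.py", several)   # pays for resolve() only when needed
```

With a single registered namespace the expensive call is skipped entirely, which is the case the baseline benchmark exercises.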

DanielNoord · 2024-05-07T21:35:50Z

Just letting you know that this is on my TODO list but just haven't found the time yet.

DanielNoord

Thanks again for continuing with this @0nf

If you don't mind I could also split off some of the things I think we can easily merge into separate PRs to get them reviewed by other maintainers to make this PR a little bit more manageable.

DanielNoord · 2024-05-11T08:15:19Z

pylint/config/config_initialization.py

+    if len(linter._directory_namespaces) == 0:
+        linter._directory_namespaces[Path(".").resolve()] = (linter.config, {})


For now I'd prefer to revert the changes in these two lines.

They are really tightly coupled to the final implementation of the per directory configs and the performance impact is hard to judge on its own. As far as I can see, all other changes in this PR are somewhat sensible on their own. This one isn't.

DanielNoord · 2024-05-11T08:16:22Z

pylint/config/find_default_config_files.py

    """Iterate over the default config file names and see if they exist."""
+    basedir = Path(basedir)


Suggested change

basedir = Path(basedir)

That should not be needed.

DanielNoord · 2024-05-11T08:18:59Z

pylint/lint/pylinter.py

        with augmented_sys_path(extra_packages_paths):
+            # 2) Get the AST for each FileItem
+            ast_per_fileitem = self._get_asts(fileitems, data)
+            # 3) Lint each ast


This line should be down below

DanielNoord · 2024-05-11T08:19:19Z

pylint/lint/pylinter.py

        with augmented_sys_path(extra_packages_paths):
+            # 2) Get the AST for each FileItem
+            ast_per_fileitem = self._get_asts(fileitems, data)


I wonder if we have unintended effects from not getting these within the context manager..
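The concern can be illustrated with a stand-in context manager. This is a sketch only: extra_sys_path is a hypothetical helper written for this example and may differ from how augmented_sys_path behaves in pylint, and the path is made up.

```python
import sys
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def extra_sys_path(paths: list[str]) -> Iterator[None]:
    """Hypothetical stand-in: prepend ``paths`` to sys.path, restore on exit."""
    original = list(sys.path)
    sys.path[:] = paths + original
    try:
        yield
    finally:
        sys.path[:] = original


extra = "/hypothetical/extra/packages"

with extra_sys_path([extra]):
    # Import resolution performed here can see the extra path.
    visible_inside = extra in sys.path

# Work done here no longer sees it.
visible_outside = extra in sys.path
```

Anything that resolves imports while building the ASTs would therefore behave differently depending on which side of the with block it runs on.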

DanielNoord · 2024-05-11T08:21:22Z

pylint/lint/pylinter.py


    def _lint_file(
        self,
        file: FileItem,
        module: nodes.Module,
-        check_astroid_module: Callable[[nodes.Module], bool | None],
+        check_astroid_module: Callable[[nodes.Module], bool | None] | None,


This PR is really touching some of the core behaviour of this behemoth of a class so it is a bit hard to review. Sorry in advance.

Why is this now optional? I don't really like that design as it further complicates this function body. Could you explain why this is needed? And could that perhaps be a separate PR?

DanielNoord reviewed Apr 17, 2024

Aleksey Petryankin added 2 commits April 19, 2024 13:48

Fix formatting check when running tox

e1a88bf

0nf force-pushed the per_directory_configs_preliminary branch from 42bbb5a to 2f6c087 Compare April 19, 2024 11:48

Preliminary changes 2

9b3576f

0nf force-pushed the per_directory_configs_preliminary branch from 2f6c087 to 9b3576f Compare April 20, 2024 08:25

jacobtylerwalls self-requested a review April 28, 2024 23:51

jacobtylerwalls reviewed May 1, 2024

DanielNoord reviewed May 11, 2024
