Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy 2.0.0 takes forever to lint #8150

Closed
chaen opened this issue Feb 1, 2023 · 17 comments · Fixed by pylint-dev/astroid#2209 or pylint-dev/astroid#2303
Closed

SQLAlchemy 2.0.0 takes forever to lint #8150

chaen opened this issue Feb 1, 2023 · 17 comments · Fixed by pylint-dev/astroid#2209 or pylint-dev/astroid#2303
Labels
Needs astroid update Needs an astroid update (probably a release too) before being mergable performance
Milestone

Comments

@chaen
Copy link

chaen commented Feb 1, 2023

Bug description

Linting against sqlalchemy 2.0.0 is infinitely slower that against 1.4 series.
This was reported here: sqlalchemy/sqlalchemy#9181

# pylint: disable=missing-module-docstring

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("")
session = sessionmaker(engine)

Configuration

No response

Command used

time pylint slow.py

Pylint output

With sqlalchemy 2.0.0


time pylint slow.py 

-------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 7.50/10, +2.50)


real    0m11.233s
user    0m11.080s
sys     0m0.112s

with 1.4.46

time pylint slow.py 

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real    0m4.149s
user    0m4.032s
sys     0m0.100s

On real life use cases, it goes from 6 seconds to 90 !



### Expected behavior

I would expect things to run faster

### Pylint version

```shell
pylint 2.15.10
astroid 2.13.4
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03) 
[GCC 10.4.0]

OS / Environment

Fedora 34, conda environment

Additional dependencies

sqlalchemy

@chaen chaen added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Feb 1, 2023
@Pierre-Sassoulas Pierre-Sassoulas added performance Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Feb 1, 2023
@clavedeluna
Copy link
Contributor

Can easily reproduce on macOS Python 3.11.0.

For some variety, I also ran it with --enable=all --enable-all-extensions and it's very similar results (so it's not
like it's adding more checkers that's an issue).

Running with -verbose doesn't add much info to this, other than it does hang longer at Using config file ....

Next step, use a profiler.

@clavedeluna
Copy link
Contributor

clavedeluna commented Feb 1, 2023

python -m cProfile -m -o sql14 pylint test.py
python -m cProfile -m -o sql200 pylint test.py

import pstats
from pstats import SortKey
fast = pstats.Stats('sql14')
slow = pstats.Stats('sql200')
fast.sort_stats(SortKey.CUMULATIVE).print_stats(10)
slow.sort_stats(SortKey.CUMULATIVE).print_stats(10)

Check Cumulative time

>>> fast.sort_stats(SortKey.CUMULATIVE).print_stats(10)
Wed Feb  1 09:13:35 2023    sql14

         9883958 function calls (8618495 primitive calls) in 7.963 seconds

   Ordered by: cumulative time
   List reduced from 4010 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    423/1    0.010    0.000    7.971    7.971 {built-in method builtins.exec}
        1    0.000    0.000    7.971    7.971 <frozen runpy>:201(run_module)
        1    0.000    0.000    7.965    7.965 <frozen runpy>:65(_run_code)
        1    0.000    0.000    7.965    7.965 /Users/dani/pylint/pylint/__main__.py:1(<module>)
        1    0.000    0.000    7.965    7.965 /Users/dani/pylint/pylint/__init__.py:28(run_pylint)
        1    0.000    0.000    7.072    7.072 /Users/dani/pylint/pylint/lint/run.py:119(__init__)
        1    0.000    0.000    6.660    6.660 /Users/dani/pylint/pylint/lint/pylinter.py:653(check)
        1    0.000    0.000    6.641    6.641 /Users/dani/pylint/pylint/lint/pylinter.py:753(_lint_files)
        1    0.000    0.000    6.641    6.641 /Users/dani/pylint/pylint/lint/pylinter.py:776(_lint_file)
        1    0.000    0.000    6.641    6.641 /Users/dani/pylint/pylint/lint/pylinter.py:1054(check_astroid_module)
>>> slow.sort_stats(SortKey.CUMULATIVE).print_stats(10)
Wed Feb  1 09:16:29 2023    sql200

         28488458 function calls (22226628 primitive calls) in 22.853 seconds

   Ordered by: cumulative time
   List reduced from 4054 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    423/1    0.012    0.000   22.864   22.864 {built-in method builtins.exec}
        1    0.000    0.000   22.864   22.864 <frozen runpy>:201(run_module)
        1    0.000    0.000   22.856   22.856 <frozen runpy>:65(_run_code)
        1    0.000    0.000   22.856   22.856 /Users/dani/pylint/pylint/__main__.py:1(<module>)
        1    0.000    0.000   22.856   22.856 /Users/dani/pylint/pylint/__init__.py:28(run_pylint)
        1    0.000    0.000   21.888   21.888 /Users/dani/pylint/pylint/lint/run.py:119(__init__)
        1    0.000    0.000   21.438   21.438 /Users/dani/pylint/pylint/lint/pylinter.py:653(check)
        1    0.000    0.000   21.418   21.418 /Users/dani/pylint/pylint/lint/pylinter.py:753(_lint_files)
        1    0.000    0.000   21.418   21.418 /Users/dani/pylint/pylint/lint/pylinter.py:776(_lint_file)
        1    0.000    0.000   21.418   21.418 /Users/dani/pylint/pylint/lint/pylinter.py:1054(check_astroid_module)

Pylint is spending 3x more cumulative time analyzing slqalchemy 2.xx than 1.4.xx

Check Total Time

>>> fast.sort_stats(SortKey.TIME).print_stats(10)
Wed Feb  1 09:13:35 2023    sql14

         9883958 function calls (8618495 primitive calls) in 7.963 seconds

   Ordered by: internal time
   List reduced from 4010 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
119729/90906    0.609    0.000    3.199    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/transforms.py:30(_transform)
   928556    0.510    0.000    0.706    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:64(looks_like_numpy_member)
      299    0.480    0.002    0.480    0.002 {built-in method builtins.compile}
3270090/3265211    0.383    0.000    0.388    0.000 {built-in method builtins.isinstance}
246628/120    0.235    0.000    3.667    0.031 /Users/dani/env311/lib/python3.11/site-packages/astroid/transforms.py:59(_visit_generic)
119729/57    0.220    0.000    3.672    0.064 /Users/dani/env311/lib/python3.11/site-packages/astroid/transforms.py:50(_visit)
      479    0.168    0.000    0.168    0.000 {method 'read' of '_io.BufferedReader' objects}
 93104/40    0.167    0.000    6.574    0.164 /Users/dani/env311/lib/python3.11/site-packages/astroid/nodes/node_ng.py:140(infer)
   171437    0.138    0.000    0.202    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/brain/brain_builtin_inference.py:138(_builtin_filter_predicate)
    39689    0.132    0.000    0.368    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/rebuilder.py:1542(visit_name)


<pstats.Stats object at 0x108040a10>
>>> slow.sort_stats(SortKey.TIME).print_stats(10)
Wed Feb  1 09:16:29 2023    sql200

         28488458 function calls (22226628 primitive calls) in 22.853 seconds

   Ordered by: internal time
   List reduced from 4054 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1429694/440051    2.132    0.000    9.367    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/nodes/scoped_nodes/scoped_nodes.py:2353(ancestors)
1303975/60    1.936    0.000   21.368    0.356 /Users/dani/env311/lib/python3.11/site-packages/astroid/nodes/node_ng.py:140(infer)
155553/80170    0.810    0.000   16.587    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/transforms.py:30(_transform)
      504    0.734    0.001    0.734    0.001 {built-in method builtins.compile}
  1256124    0.693    0.000    0.958    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:64(looks_like_numpy_member)
1057198/529003    0.688    0.000    0.765    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/nodes/scoped_nodes/mixin.py:33(qname)
4968283/4962037    0.649    0.000    0.711    0.000 {built-in method builtins.isinstance}
261643/64    0.645    0.000   21.365    0.334 /Users/dani/env311/lib/python3.11/site-packages/astroid/bases.py:146(_infer_stmts)
294845/51    0.526    0.000   21.368    0.419 /Users/dani/env311/lib/python3.11/site-packages/astroid/decorators.py:99(wrapped)
   300812    0.524    0.000    0.841    0.000 /Users/dani/env311/lib/python3.11/site-packages/astroid/context.py:132(clone)

**The functions that are adding a significant amount more in sqlalchemy 2.xx are **:

  • astroid/nodes/scoped_nodes/scoped_nodes.py:2353(ancestors)
  • /astroid/nodes/node_ng.py:140(infer)

@Zahlii
Copy link

Zahlii commented Feb 8, 2023

Is there a temporary workaround for this? Is there a way to tell astroid / pylint to ignore this module entirely? Or is the best bet to uninstall sqlalchemy during builds, run pylint, re-install, and ignore import errors?

If it helps, I modified the NodeNG to count how often it's called.

W/O sqlalchemy, my linting finished with:

defaultdict(<class 'int'>, {'Name': 67804, 'ClassDef': 15893, 'AssignName': 13069, 'Call': 4780, 'ImportFrom': 13757, 'Attribute': 9146, 'FunctionDef': 6131, 'AssignAttr': 358, 'Const': 1832, 'Subscript': 7318, 'ListComp': 22, 'Import': 1261, 'Tuple': 309, 'Arguments': 18, 'AsyncFunctionDef': 1, 'Dict': 233, 'List': 102, 'Starred': 1, 'Module': 484, 'BoolOp': 9, 'IfExp': 26, 'Compare': 163, 'EmptyNode': 9, 'DictComp': 84, 'BinOp': 130, 'AugAssign': 19, 'GeneratorExp': 11, 'UnaryOp': 27, 'Slice': 2, 'JoinedStr': 1})

out of which following ClassDef counters:

[('builtins.object', 2929), ('builtins.property', 919), ('pandas.core.base.SelectionMixin', 874), ('pandas.core.indexing.IndexingMixin', 531), ('numba.core.types.abstract.Type', 439), ('pandas.core.arraylike.OpsMixin', 429), ('pandas.core.generic.NDFrame', 405), ('pandas.core.base.PandasObject', 369), ('pandas.core.accessor.DirNamesMixin', 356), ('pandas.core.groupby.groupby.BaseGroupBy', 312)]

WITH sqlalchemy, after around 1-2 minutes of work: The big overhead seems to come from Name and ClassDef.

defaultdict(<class 'int'>, {'Name': 1440542, 'ClassDef': 421554, 'AssignName': 84403, 'Call': 2117, 'ImportFrom': 91390, 'Attribute': 82226, 'FunctionDef': 1228, 'AssignAttr': 93, 'Const': 728, 'Subscript': 720155, 'ListComp': 6, 'Import': 599, 'Tuple': 1173, 'Arguments': 10, 'AsyncFunctionDef': 1, 'Dict': 54, 'List': 14, 'Starred': 1, 'Module': 640, 'BoolOp': 5, 'IfExp': 2, 'Compare': 7, 'EmptyNode': 8, 'BinOp': 17, 'UnaryOp': 1, 'Lambda': 10, 'DictComp': 11, 'AugAssign': 2, 'GeneratorExp': 3})

out of which following ClassDef counters:

[('sqlalchemy.sql.roles.SQLRole', 78069), ('sqlalchemy.sql.roles.TypedColumnsClauseRole', 46186), ('sqlalchemy.sql.roles.ExpressionElementRole', 16204), ('sqlalchemy.sql.operators.Operators', 13965), ('sqlalchemy.sql.elements.SQLCoreOperations', 9458), ('sqlalchemy.sql.roles.AllowsLambdaRole', 4433), ('sqlalchemy.sql.roles.StructuralRole', 4049), ('sqlalchemy.sql.roles.ColumnListRole', 2934), ('sqlalchemy.sql.elements.SQLColumnExpression', 2908), ('sqlalchemy.sql.roles.BinaryElementRole', 2611)]

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Feb 8, 2023

The best we have right now is:

# Control the amount of potential inferred values when inferring a single object.
# This can help the performance when dealing with large functions or complex,
# nested conditions.
limit-inference-results = 100

# This flag controls whether pylint should warn about no-member and similar
# checks whenever an opaque object is returned when inferring. The inference
# can return multiple potential results while evaluating a Python object, but
# some branches might not be evaluated, which results in partial inference. In
# that case, it might be useful to still emit no-member and other checks for
# the rest of the inferred objects.
ignore-on-opaque-inference=yes

I'm not sure that it's going to help in this case because you might need to infer a single value but the recursion involved to do that is catastrophically costly. Might be worth a try though.

(Sorry for the semi-related joke but I'd like to introduce a new concept that I would call "design by ChatGPT": You know that your lib is not perfect and you want to make it better with intuitive configuration, so you ask chatGPT: What's the way to prevent catastrophic inference recursion in astroid using configuration ? Then ChatGPT tell you that:

from astroid import conf

conf.MAX_INFERENCE_STEPS = 1000
conf.TYPING_HEURISTICS = False
conf.INFERENCE_TIMEOUT = 10

By using these configuration options, you can prevent catastrophic inference recursion in Astroid and improve the performance and accuracy of your static analysis.

And bam, here's your design. Joke aside we talked about imposing a limit on inference by timeout or recursion limit, but it's not done right now, because then some messages would silently fail and be impossible to reproduce because the inference timed out or was cut unexpectedly.)

@Zahlii
Copy link

Zahlii commented Feb 9, 2023

image

None of these changes seem to have any impact on runtime and/or counter behavior above.

@Pierre-Sassoulas
Copy link
Member

Opened #8248 to discuss a long term solution.

@Zahlii
Copy link

Zahlii commented Feb 9, 2023

As a short-term workaround, I was able to pin sqlalchemy<2.0.0 via a constraints file and making sure to set the future=True argument (see also https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-to-2-0-step-four-use-the-future-flag-on-engine )

@mbyrnepr2
Copy link
Member

I am seeing the Astroid module brain_namedtuple_enum.py has a big affect on the results:

Standard Pylint run:

time pylint example.py 
pylint example.py  20.82s user 0.68s system 95% cpu 22.510 total

With brain_namedtuple_enum.py removed from a local install of astroid:

time pylint example.py 
pylint example.py  8.83s user 0.48s system 93% cpu 9.977 total
SQLAlchemy==2.0.0
Darwin Kernel Version 20.6.0

@jacobtylerwalls jacobtylerwalls self-assigned this Jun 7, 2023
@jacobtylerwalls jacobtylerwalls added Needs astroid update Needs an astroid update (probably a release too) before being mergable and removed Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning labels Jun 11, 2023
@jacobtylerwalls jacobtylerwalls added this to the 3.0.0a7 milestone Jun 11, 2023
@kuba-lilz
Copy link

kuba-lilz commented Jun 13, 2023

Sure happy to hear that root cause was fixed, but until fix is released I don't think it's correct to close this issue. I can install latest astroid and pylint (astroid==2.15.5, pylint==2.17.4), and sqlalchemy 2.x linting is still slow.

@jacobtylerwalls
Copy link
Member

We close issues when merging fixes, not when cutting releases.

@jacobtylerwalls
Copy link
Member

You can choose to install bleeding-edge astroid or wait for an alpha or a stable version, it's up to each user.

@jacobtylerwalls
Copy link
Member

The expected release milestone is marked in this issue.

@POD666
Copy link

POD666 commented Aug 6, 2023

@jacobtylerwalls, I have been waiting for releases for a while, seems to be the latest

astroid==2.15.6
pylint==2.17.5
sqlalchemy==2.0.5

but linting time is still extremely huge

> time pylint [single-file]
...
pylint   33.78s

Do I miss anything? Which version of astroid/pylint is expected to contain the fix?

@POD666
Copy link

POD666 commented Aug 6, 2023

Ah, looks like astroid==2.15.6 doesn't contain your fix. Right?

Is it expected in >3 and not sooner?

@POD666
Copy link

POD666 commented Aug 6, 2023

Looks like pylint==3.0.0a6 and astroid==3.0.0a8 doesn't work together 🥲

File ".../lib/python3.10/site-packages/pylint/checkers/imports.py", line 1038, in ImportsChecker
    @astroid.decorators.cached
AttributeError: module 'astroid.decorators' has no attribute 'cached'

@jacobtylerwalls
Copy link
Member

Yep, support for the latest astroid improvements only exists on the main branch until we can get a release out. You can install from the main branch with pip, though.

@jacobtylerwalls
Copy link
Member

You'll notice there's only one issue left on the 3.0.0a7 milestone, which another maintainer regards as a release blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs astroid update Needs an astroid update (probably a release too) before being mergable performance
Projects
None yet
8 participants