
Add common caching class and global cache flushing function #1242

Closed
wants to merge 16 commits

Conversation

keichi
Contributor

@keichi keichi commented Nov 16, 2021

Description

This PR adds an LRU cache class astroid.cache.LRUCache and modifies the inference cache, inference tip cache, generator cache and @lru_cache to use this new cache. LRUCache is bounded by default and can also be manually flushed by calling astroid.cache.clear_caches() to address the memory leak issue discussed in #792.
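For readers unfamiliar with the pattern, here is a minimal sketch of the idea only — not this PR's actual code; the registry name `_caches` and the constructor signature are illustrative:

```python
# Minimal sketch of the idea, not the PR's actual implementation.
from collections import OrderedDict

_caches = []  # illustrative registry of every live cache instance

LRU_CACHE_CAPACITY = 128


class LRUCache:
    """A bounded mapping that evicts the least recently used entry."""

    def __init__(self, capacity=LRU_CACHE_CAPACITY):
        self._capacity = capacity
        self._data = OrderedDict()
        _caches.append(self)

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __contains__(self, key):
        return key in self._data

    def clear(self):
        self._data.clear()


def clear_caches():
    """Flush every registered cache in one call."""
    for cache in _caches:
        cache.clear()
```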

Type of Changes

  • 🐛 Bug fix
  • ✨ New feature
  • 🔨 Refactoring
  • 📜 Docs

Related Issue

Closes #792

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.9.0 milestone Nov 16, 2021
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component Needs review 🔍 Needs to be reviewed by one or multiple more persons labels Nov 16, 2021
Collaborator

@DanielNoord DanielNoord left a comment

I have not looked at everything yet and I am by no means a caching expert, but I'd like to make an effort to get some of the stale astroid PRs merged.

These are just some things I was thinking about before this could be merged.

I also wonder if we should add some form of tests. We have LruCacheModelTest right now. Does that suffice by itself for this change?

ChangeLog (resolved review comment)

import wrapt

LRU_CACHE_CAPACITY = 128
Collaborator

Do we want this to be settable somehow? I can imagine instances where somebody would want the cache to be larger.

Contributor Author

@DanielNoord I'm not sure what you mean by "set-able"; it is settable. Would you like to add a setter method? Or make it overridable by instances?

Member

We probably need to document the variable and explain explicitly how to modify it (assign a new value to it, I suppose). We can add the option to pylint's config later, but astroid users will need the info.
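For example, something along these lines in the docs (a sketch — it assumes the module-level constant added by this PR and that it is read when a cache is created):

```python
import astroid.cache

# Override the default bound before astroid builds its caches;
# caches created afterwards will use the new capacity.
astroid.cache.LRU_CACHE_CAPACITY = 1024
```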

Collaborator

Preferably we could change this setting via an option, but now that I think of it I don't think we really have a good way to do so.

I'll create a "create option-setting system" project after this gets merged and add this global to the "to be configurable" list.

astroid/cache.py (2 resolved review comments)
@keichi
Contributor Author

keichi commented Dec 30, 2021

@DanielNoord Thank you for your review. I am addressing your comments one by one.

I also wonder if we should add some form of tests. We have LruCacheModelTest right now. Does that suffice by itself for this change?

Sure, I will add tests for LRUCache. LruCacheModelTest tests whether @functools.lru_cache is inferred correctly, so it does not (directly) test LRUCache.
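Roughly the kind of test I have in mind (a sketch; the exact LRUCache constructor and dict-style API are assumptions):

```python
def test_lru_cache_evicts_least_recently_used():
    cache = LRUCache(capacity=2)
    cache["a"] = 1
    cache["b"] = 2
    _ = cache["a"]   # touch "a" so "b" becomes the least recently used entry
    cache["c"] = 3   # exceeds capacity: "b" should be evicted
    assert "a" in cache
    assert "c" in cache
    assert "b" not in cache
```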

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

This looks pretty good already, thank you @keichi !

astroid/cache.py (resolved review comment)
Collaborator

@DanielNoord DanielNoord left a comment

Thanks @keichi! This is becoming a very nice addition.

The tests are also well written.



def __init__(self):
self.transforms = collections.defaultdict(list)

@lru_cache(maxsize=TRANSFORM_MAX_CACHE_SIZE)
@lru_cache_astroid
Collaborator

How does the 10000 from lru_cache translate to the 128 of LRUCache? Is it similar in size?

Contributor Author

Nope, the current LRU_CACHE_CAPACITY is arbitrary. Should we change it to 10000?

Collaborator

I'm not sure. 10000 seems a bit excessive, but the difference from 128 is so large that I wonder whether this has unforeseen effects.

Contributor Author

In principle, a smaller cache size means more recomputation, so it could lower performance. However, when I actually measure how long the unit tests take, I don't see any slowdown:

main (d2a5b3c):
pytest 19.51s user 0.96s system 108% cpu 18.893 total

this branch:
pytest 17.93s user 0.82s system 109% cpu 17.171 total

So at least for the unit tests, I don't see any negative side effects.

Member

The real problem will appear on a very large code base where the RAM won't be enough to cache everything. The tests are supposed to run fast, so they use small examples with only a few nodes and won't cause caching problems. Since this can create a cache entry for every node, and the number of nodes is practically unbounded on a large code base, I think we need to get a sense of the median weight of a node in the cache. Then we should probably limit the default to a number of cache entries corresponding to a low amount of RAM, say 1 GB, to be safe. (RAM is what is expensive in the cloud, so I suppose that by default jobs run on low-RAM machines.)

Contributor Author

I can parse a large project and find out what the distribution of node sizes looks like. Are there any projects you usually use for benchmarking?
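Something like this is what I would run (a rough sketch; sys.getsizeof only reports shallow sizes, so the numbers would be a lower bound, and the file path is a placeholder):

```python
import statistics
import sys

import astroid


def iter_nodes(node):
    """Yield a node and all of its descendants."""
    yield node
    for child in node.get_children():
        yield from iter_nodes(child)


with open("some_large_module.py") as f:  # placeholder path
    module = astroid.parse(f.read())

sizes = [sys.getsizeof(node) for node in iter_nodes(module)]
print("nodes:", len(sizes))
print("median shallow size:", statistics.median(sizes), "bytes")
```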

Member

We have some projects in pylint's test suite that we call primer tests; django is pretty huge, for example. See https://pylint.pycqa.org/en/latest/development_guide/testing.html#primer-tests

astroid/cache.py (2 resolved review comments)
tests/unittest_cache.py (6 resolved review comments)
Collaborator

@DanielNoord DanielNoord left a comment

Thanks @keichi

I think I have done all I can with respect to this PR. Sadly, I'm not proficient enough with caching and some of the intricacies of typing generics to confidently say that this is now 100% perfect, though. Hopefully a more experienced developer finds time to look at this soon. 😄

astroid/cache.py (3 resolved review comments)
@Pierre-Sassoulas
Member

Hello @keichi, sorry for taking so long to check this merge request. We currently have flake8-bugbear, which (rightfully, I think) warns us of a potential cache issue on the main branch, because we cache methods that will never be garbage collected. So merging this would solve two problems in one. I rebased the code on the latest main branch, but there's a recursion error in one of the tests. Do you think you would have some time to check the issue in the near future?
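For context, the flake8-bugbear warning (B019) is about functools.lru_cache on methods: the cache keys include `self`, so cached instances stay reachable and are never garbage collected. A minimal illustration (the class and method names are made up):

```python
import functools


class Checker:
    @functools.lru_cache(maxsize=None)  # B019: the cache lives on the class
    def expensive(self, arg):
        return arg * 2


checker = Checker()
checker.expensive(21)
# The cache entry's key contains `checker`, so the instance remains
# reachable through Checker.expensive even after we drop our reference.
del checker
```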

@coveralls

coveralls commented Apr 10, 2022

Pull Request Test Coverage Report for Build 2143911660

  • 75 of 77 (97.4%) changed or added relevant lines in 10 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 91.516%

Files with changes missing coverage:
  • astroid/cache.py — 44 of 46 changed/added lines covered (95.65%)

Totals:
  • Change from base Build 2111457493: 0.03%
  • Covered Lines: 9126
  • Relevant Lines: 9972

💛 - Coveralls

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

LGTM, thank you @keichi

ChangeLog (resolved review comment)
@DanielNoord
Collaborator

I haven't completed my review yet, but I see that two added lines are uncovered. Is there a way to cover these? Or did we already decide not to do so?

@keichi
Contributor Author

keichi commented Apr 11, 2022

@DanielNoord How can I locate those two lines? The Coveralls page does not show the source code for some reason.
[Screenshot: Coveralls coverage report]

@DanielNoord
Collaborator

@DanielNoord How can I locate those two lines? The Coveralls page does not show the source code for some reason.
[Screenshot: Coveralls coverage report]

They are L70 and L83 in astroid/cache.py. Coveralls doesn't show source code when you aren't logged in (to reduce API calls on their own account), so they require you to log in or create an account to see annotated source code.

Member

@jacobtylerwalls jacobtylerwalls left a comment

Thanks for working on pylint/astroid's performance. It's really important. :)

I'm wondering what the motivation was for reimplementing the basic LRU cache functionality. If it was chiefly to have a central call to flush all caches, we could just have the lru_cache_astroid decorator do that one thing: collect a reference to every function using functools.lru_cache so they can all be flushed.

Speaking of, we have a documented method clear_cache() already, so I'm worried we're expanding the API for no benefit and causing confusion. Better for that to call `for c in cached_methods: c.clear_cache()`, maybe? Just one flusher?

The reimplementation looks clean, so it shouldn't be hard to maintain. Still, I'm thinking we should be able to avoid merging it in. Especially if someday there comes a C-version in the stdlib that we would get for free by staying on the functools one.

If I'm missing something, let me know!
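To make the suggestion concrete, here is a rough sketch of that decorator approach (illustrative only; it reuses the `lru_cache_astroid` name from this PR's diff but is not what the PR implements):

```python
import functools

_cached_functions = []  # every decorated function is remembered here


def lru_cache_astroid(func=None, *, maxsize=128):
    """functools.lru_cache, plus registration so all caches can be flushed."""
    def decorator(f):
        cached = functools.lru_cache(maxsize=maxsize)(f)
        _cached_functions.append(cached)
        return cached
    if func is not None:
        return decorator(func)      # used as a bare @lru_cache_astroid
    return decorator                # used as @lru_cache_astroid(maxsize=...)


def clear_caches():
    """One flusher for everything."""
    for cached in _cached_functions:
        cached.cache_clear()
```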


👀 cache by cache

  • LookupMixin.lookup(): I checked again on django: even a maxsize of 128 still generates 79K hits (instead of 85K), so I'm good with 128. This was the only lru_cache that we left unbounded after #1503 (Update pylint requirement from ~=2.13.4 to ~=2.13.5), so bounding it seems important.
  • TransformVisitor._transform(): we realized in #1503 (Update pylint requirement from ~=2.13.4 to ~=2.13.5) that this was generating 0 hits in typical scenarios and removed caching. So let's be sure not to add it back :-)
  • _cache_normalize_path(): honestly this doesn't look very important, 128 is fine
  • ObjectModel.attributes: same
  • infer_functiondef() and _inference_tip_cached(): these two look like the ballgame. If we already had these factored out to use functools.lru_cache, then we could already be using cache_info() to reason about them (see the sketch below). I think that's my final argument in favor of not reimplementing this -- we're kind of losing some tooling by losing cache_info().

Happy to hear any thoughts on these directions! And thanks again for pushing this forward, as I said, it's really important.
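For reference, the kind of tooling meant above is standard functools behaviour (shown on a made-up function):

```python
import functools


@functools.lru_cache(maxsize=128)
def lookup(name):
    return name.upper()


for name in ["a", "b", "a", "a"]:
    lookup(name)

print(lookup.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)
```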

def __init__(self):
self.transforms = collections.defaultdict(list)

@lru_cache(maxsize=TRANSFORM_MAX_CACHE_SIZE)
@lru_cache_astroid
Member

@jacobtylerwalls jacobtylerwalls Apr 20, 2022

We discovered in #1503 (comment) that in normal runs of pylint/astroid, this cache is never even hit in a single CLI run, so we removed it. Maybe an IDE has a use case for caching here, though, I don't know.

@Pierre-Sassoulas
Member

Regarding the use of functools, we discussed it here: #1242 (comment)

The main reason I couldn't reuse functools.lru_cache is that it holds the cache as a closure variable, and thus there is no way to clear it from outside.

Maybe it's not an issue if we choose a low cache size and it's enough for big projects.

@jacobtylerwalls
Member

Oh, I was thinking some light refactors might be involved to make them all reachable. I'll try to put together a little proof of concept to see if I'm suggesting something impossible.

@jacobtylerwalls
Member

Wow, infer_functiondef() is only cached as part of a workaround for an issue very similar to #1490. With a workaround we can remove caching completely and still pass that unit test, as in 17a5ee6. We may eventually want a keyword argument set_parent_local or something if we end up with three places where we have to avoid doing that.

@jacobtylerwalls
Member

Thanks for pushing this discussion forward. We decided to go with #1521 in order to hew more closely to the current documented way to clear caches. Always eager to hear more proposals for improving performance. Thanks again.

@jacobtylerwalls jacobtylerwalls removed this from the 2.12.0 milestone May 6, 2022
@jacobtylerwalls jacobtylerwalls removed the Needs review 🔍 Needs to be reviewed by one or multiple more persons label May 6, 2022
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Continuously increasing memory usage when pylint is run via its API
6 participants