
Remove leaky lru_cache #2277

Merged: 2 commits into marshmallow-code:dev on Jun 6, 2024
Conversation

@mrcljx (Contributor) commented Jun 5, 2024

Problem

After calling the endpoint, the db_session instance was not freed, causing a warning to be emitted.

def endpoint(request):
  # The schema (and the db_session in its context) outlives this request:
  # a global LRU cache keeps a reference to the schema instance.
  data = MySchema(context={"db_session": create_db_session()}).load(request.json)
  return "OK"

The reason is that the MySchema instance is kept alive in a global LRU cache. Calling endpoint multiple times will eventually evict and release the older db_session instances, though.

Analysis

The @lru_cache(maxsize=8) decorator was used on a method, causing instances of self to be cached and outlive their intended scope. This led to unexpected behavior where instances persisted beyond a single web request, retaining references and consuming memory unnecessarily.

This problem was called out by PyCQA/flake8-bugbear#310 but ignored via noqa.
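
A minimal standalone repro of the mechanism (not marshmallow's actual code): functools.lru_cache on a method stores self inside the cache keys, so the instance survives even after all external references are gone.

import functools
import gc
import weakref

class Leaky:
  @functools.lru_cache(maxsize=8)  # noqa: B019 - the pattern bugbear warns about
  def compute(self):
    return 42

obj = Leaky()
obj.compute()
ref = weakref.ref(obj)
del obj
gc.collect()
print(ref())  # not None: the class-level cache still holds `self` in a cache key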

Options

  1. Use cachetools.cachedmethod (requires a third-party dependency; a sketch follows this list)
  2. Implement manually
  3. Remove LRU cache
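
A minimal sketch of option 1 (hypothetical names; marshmallow's real method differs): cachetools.cachedmethod stores the cache on the instance itself, so it is garbage-collected together with the schema instead of leaking globally.

import operator

from cachetools import LRUCache, cachedmethod

class CachedSchema:
  def __init__(self):
    # Per-instance cache: freed together with the schema instance.
    self._cache = LRUCache(maxsize=8)

  @cachedmethod(operator.attrgetter("_cache"))
  def _has_processors(self, tag):
    # Hypothetical stand-in for the real processor lookup.
    return False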

Proposal

The LRU cache was introduced in #1309 (the PR reported a 3% performance increase on the test base). I'm not sure the LRU cache actually has that much of an effect in real-world scenarios:

  1. The cost of two tuple constructions and two dict lookups is small in the first place.
  2. If more than 2 schema instances are used, maxsize is exhausted, leading to cache misses (the cache keys include the schema instance, so entries are never shared across instances).
    • A cold load causes 3 misses, 0 hits (warm -> 3 hits).
    • A cold dump causes 2 misses, 0 hits (warm -> 2 hits).
  3. A more complex schema guarantees 100% misses, as too many instances are involved:
import marshmallow as ma

class Child(ma.Schema):
  pass

class Parent(ma.Schema):
  a = ma.fields.Nested(Child)
  b = ma.fields.Nested(Child)
  c = ma.fields.Nested(Child)

ma.Schema._has_processors.cache_clear()
p = Parent()
p.load({"a": {}, "b": {}, "c": {}})
p.load({"a": {}, "b": {}, "c": {}})
# Every lookup misses: keys include the schema instance and maxsize is only 8.
ma.Schema._has_processors.cache_info()  # CacheInfo(hits=0, misses=24, maxsize=8, currsize=8)

@sloria (Member) left a comment
good catch! i hadn't considered that side effect of context objects remaining in the global LRU cache. and i generally agree that the CPU cost in typical usage scenarios is small.

i'm good with this, but i'll give a moment to @lafrech @deckar01 to look if they want. otherwise i'll plan to merge and release this over the weekend

@sloria (Member) commented Jun 5, 2024

in the meantime, do you mind adding yourself to AUTHORS.rst please?

@lafrech (Member) left a comment

I agree with the rationale.

Thanks.

@deckar01 (Member) commented Jun 5, 2024

TLDR: 👍

I tested this using our dump benchmark. Indexing a dict with a tuple is pretty slow, but that's mostly what lru_cache does under the hood, so it can't help much. lru_cache is actually significantly slower in pypy. I also benchmarked a hacky refactor that replaces the tuple keys with separate dicts for many-hooks and tag-only keys.

(usec/dump)               cpython (3.10)   pypy (3.10)
mrcljx:remove-lru-cache   475              46
marshmallow-code:dev      475              51
deckar01:fast-hook-key    470              36

The benchmark shows ~30% of the hot path is tuple hashing, at least in pypy. I'm not sure the 1% cpython difference is worth the increased code complexity of my current refactor. Something to punt to a follow-up issue, maybe.
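
A rough sketch of the key-splitting idea (hypothetical names; the actual fast-hook-key branch may differ): instead of hashing a (tag, many) tuple on every lookup, keep one plain dict per many flag so each lookup only hashes the tag string.

def compute_has_processors(tag, many):
  # Hypothetical stand-in for the real processor lookup.
  return False

_hooks_many = {}    # tag -> bool, for many=True
_hooks_single = {}  # tag -> bool, for many=False

def has_processors(tag, many):
  cache = _hooks_many if many else _hooks_single
  try:
    return cache[tag]
  except KeyError:
    result = cache[tag] = compute_has_processors(tag, many)
    return result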

@mrcljx (Contributor, Author) commented Jun 5, 2024

Thanks for the benchmark! I updated AUTHORS.rst.

@sloria enabled auto-merge (squash) June 6, 2024 01:50
@sloria merged commit 99103a6 into marshmallow-code:dev June 6, 2024 (7 of 8 checks passed)
@mrcljx deleted the remove-lru-cache branch June 6, 2024 06:51
@deckar01 mentioned this pull request June 6, 2024