fixed size python string cache #59
Conversation
Codecov Report
Additional details and impacted files:

@@           Coverage Diff            @@
##            main      #59      +/-  ##
=========================================
- Coverage   96.35%   96.27%   -0.08%
=========================================
  Files           8        9       +1
  Lines        1234     1289      +55
=========================================
+ Hits         1189     1241      +52
- Misses         45       48       +3
CodSpeed Performance Report: merging #59 will not alter performance.
force-pushed from 3c76b61 to 06480b2
force-pushed from 2c851fd to 6c47afd
Pushed a tiny commit to use […]
All looks great except for one concern about stack usage. It's a shame I can't see a way to allocate a boxed array on stable.
impl Default for PyStringCache {
    fn default() -> Self {
        Self {
            // the array is built on the stack before Box::new moves it to the heap
            entries: Box::new([ARRAY_REPEAT_VALUE; CAPACITY]),
        }
    }
}
As discussed offline, have a worry here that this line can easily trigger stack overflows. Think we want to avoid accidentally causing `import pydantic` to cause a stack overflow on some platform with less stack space.
Initial solution sounds like we need to reduce capacity and see how that goes.
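
For reference, one stable-Rust way to get a large array onto the heap without first building it on the stack is to fill a Vec and convert its boxed slice into a boxed array. This is a sketch of the general technique, not code from this PR:

// Sketch: heap-allocate a [T; N] without constructing it on the stack.
// vec! fills the heap allocation directly; the conversion only checks the length.
fn boxed_array<T: Clone, const N: usize>(value: T) -> Box<[T; N]> {
    let slice: Box<[T]> = vec![value; N].into_boxed_slice();
    match slice.try_into() {
        Ok(array) => array,
        Err(_) => unreachable!("the slice has exactly N elements"),
    }
}

(The simpler mitigation discussed above was to reduce the capacity.)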
Use a fully associative cache for strings, rather than a "simple" hashmap. When the number of strings exceeds the hashmap size (currently 500k), this gives a 48% performance improvement according to the 1m_strings python benchmark.

It was suffering from a failure in the python tests (this was fixed by @davidhewitt). The failure can be reproduced locally and goes away if you remove the GILOnceCell, but that makes things slower (or less secure if you used constant seeds for the hasher).
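
To illustrate the idea, here is a minimal sketch of a fixed-size string cache; the slot layout, the capacity, and the overwrite-on-collision policy are assumptions for the example (a single hash-indexed slot stands in for the associative lookup), not the PR's exact design:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const CAPACITY: usize = 1 << 16; // power of two, so indexing is a cheap mask

// Each slot stores the key's hash and the cached value; a colliding key
// overwrites the slot, so memory stays bounded no matter how many distinct
// strings are seen (unlike a growing hashmap).
struct StringCache {
    entries: Box<[Option<(u64, String)>]>,
}

impl StringCache {
    fn new() -> Self {
        // vec! allocates the slots directly on the heap (cf. the review thread above)
        Self { entries: vec![None; CAPACITY].into_boxed_slice() }
    }

    fn get_or_insert(&mut self, key: &str) -> &str {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        let slot = &mut self.entries[hash as usize & (CAPACITY - 1)];
        // treat a matching hash as a hit; a real cache would also compare keys
        if !matches!(*slot, Some((h, _)) if h == hash) {
            *slot = Some((hash, key.to_owned()));
        }
        match slot {
            Some((_, cached)) => cached,
            None => unreachable!("the slot was just filled"),
        }
    }
}

The point is that lookups stay O(1) and the memory footprint is fixed, which is what helps once the number of distinct strings passes the hashmap size mentioned above.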