fixed size python string cache #59
Conversation
Codecov Report
Additional details and impacted files:

@@           Coverage Diff            @@
##            main      #59      +/-  ##
=========================================
- Coverage   96.35%   96.27%   -0.08%
=========================================
  Files           8        9       +1
  Lines        1234     1289      +55
=========================================
+ Hits         1189     1241      +52
- Misses         45       48       +3
CodSpeed Performance Report: merging #59 will not alter performance.
force-pushed from 3c76b61 to 06480b2
force-pushed from 2c851fd to 6c47afd
Pushed a tiny commit to use […]
All looks great except for one concern about stack usage. It's a shame I can't see a way to allocate a boxed array on stable.
impl Default for PyStringCache {
    fn default() -> Self {
        Self {
            // the array is built on the stack before Box::new moves it to the heap
            entries: Box::new([ARRAY_REPEAT_VALUE; CAPACITY]),
        }
    }
}
As discussed offline, have a worry here that this line can easily trigger stack overflows. Think we want to avoid accidentally causing `import pydantic` to cause a stack overflow on some platform with less stack space.
Initial solution sounds like we need to reduce capacity and see how that goes.
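
For reference, one stable-Rust way to get a large array onto the heap without first building it on the stack is to fill a Vec and convert its boxed slice into a boxed array. This is a sketch of the general technique, not code from this PR:

// Sketch: heap-allocate a [T; N] without constructing it on the stack.
// vec! fills the heap allocation directly; the conversion only checks the length.
fn boxed_array<T: Clone, const N: usize>(value: T) -> Box<[T; N]> {
    let slice: Box<[T]> = vec![value; N].into_boxed_slice();
    match slice.try_into() {
        Ok(array) => array,
        Err(_) => unreachable!("the slice has exactly N elements"),
    }
}

(The simpler mitigation discussed above was to reduce the capacity.)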
Use a fully associative cache for strings, rather than a "simple" hashmap. When the number of strings exceeds the hashmap size (currently 500k), this gives a 48% performance improvement according to the 1m_strings python benchmark.

It was suffering from a failure in the python tests (this was fixed by @davidhewitt). The failure can be reproduced locally and goes away if you remove the GILOnceCell, but that makes things slower (or less secure if you used constant seeds for the hasher).
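
To illustrate the idea, here is a minimal sketch of a fixed-size string cache; the slot layout, the capacity, and the overwrite-on-collision policy are assumptions for the example (a single hash-indexed slot stands in for the associative lookup), not the PR's exact design:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const CAPACITY: usize = 1 << 16; // power of two, so indexing is a cheap mask

// Each slot stores the key's hash and the cached value; a colliding key
// overwrites the slot, so memory stays bounded no matter how many distinct
// strings are seen (unlike a growing hashmap).
struct StringCache {
    entries: Box<[Option<(u64, String)>]>,
}

impl StringCache {
    fn new() -> Self {
        // vec! allocates the slots directly on the heap (cf. the review thread above)
        Self { entries: vec![None; CAPACITY].into_boxed_slice() }
    }

    fn get_or_insert(&mut self, key: &str) -> &str {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        let slot = &mut self.entries[hash as usize & (CAPACITY - 1)];
        // treat a matching hash as a hit; a real cache would also compare keys
        if !matches!(*slot, Some((h, _)) if h == hash) {
            *slot = Some((hash, key.to_owned()));
        }
        match slot {
            Some((_, cached)) => cached,
            None => unreachable!("the slot was just filled"),
        }
    }
}

The point is that lookups stay O(1) and the memory footprint is fixed, which is what helps once the number of distinct strings passes the hashmap size mentioned above.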