Prevent timing issues causing inconsistencies between concurrent hash table and policy data structures #348
data structures caused by timing issues
- Replace the boolean `is_dirty` with two atomic u16 counters `entry_gen` and `policy_gen` to track the number of times a cached entry has been updated and the number of times its write log has been applied to the policy data structures.
- Update an internal `handle_upsert` method to check the `entry_gen` counter when removing an entry from the hash table.
data structures caused by timing issues Update the test case for checking the byte size of `EntryInfo`.
data structures caused by timing issues
- Address some corner cases.
- Add and update test cases.
data structures caused by timing issues Fix a test for Linux 32-bit platforms.
data structures caused by timing issues Fix a test for Linux 32-bit platforms.
data structures caused by timing issues Update some source code comments.
To clarify the remaining tasks, added the tasks section with check boxes to the description of this PR.
FYI, I will be unable to work on Moka and other products for a couple of weeks from now. I will be in a hospital for medical treatments. I will have some chances to check GitHub Issues and Discussions from there, but will not actively respond to them. Sorry for the inconvenience. I will resume work on this PR when I return from the hospital and everything is settled.
Take your time, health is the most important thing in life.
data structures caused by timing issues Update internal eviction methods to utilize `is_dirty()` method.
data structures caused by timing issues Fix compile errors when only the `sync` feature is enabled by making some internal methods available for the feature.
data structures caused by timing issues Increment the entry generation when invalidating an entry.
data structures caused by timing issues Update an internal `admit` method to skip if a victim entry `is_dirty`.
Updated the description of this issue.
data structures caused by timing issues Brush up the change log and source code comments.
data structures caused by timing issues
- Fix a corner-case issue found in the `handle_upsert` method by checking whether the `EntryInfo` from the `WriteOp` and the one in the concurrent hash table are the same.
- Add a test case.
Merging.
Thank you for reporting. That is great news! I didn't realize it until you mentioned it, but they should be related. The entries affected by this bug are counted in the cache usage even though they have no value (and thus cause cache misses). Since they will never be evicted, the hit rate gradually decreases as the cache fills up with these broken entries.
Descriptions
Fixes #345.
This pull request (PR) prevents timing issues in writes that cause inconsistencies between the cache's internal data structures such as the concurrent hash table and access-order queue.
Example
The following new test case reproduces this issue in `v0.12.0` and `v0.12.1`: `test_race_between_updating_entry_and_processing_its_write_op`
1. Create a `Cache` with `max_capacity = 2` and `time_to_idle = 1 second`.
2. Insert `a`.
3. Insert `b`.
4. Insert `c`.
   - `c` will be evicted as the cache is full.
5. [...]
   - `a` and `b` should expire.
6. Insert `c` again.
   - [...] `Cache`.
7. [...] `c`.
8. [...]
9. Call `run_pending_tasks`.
10. Call `entry_count`.

Expected
`0` should be returned.
Actual
`1` was returned.
Cause
When processing pending tasks, `Cache` did not check whether more than one pending task exists for the same key.

In the above example, key `c` had two pending tasks at step 6: the first from step 4 (when the cache was full) and the second from step 6 (when the cache had enough room).

At step 6, pending tasks up to step 4 should be processed:
- The pending task for `a` is processed:
  - `a` is admitted and added to the access-order queue (the LRU deque).
- The pending task for `b` is processed:
  - `b` is admitted and added to the LRU.
- The pending task for `c` (at step 4) is processed:
  - [...] `Cache` is full, and removed from the concurrent hash table (CHT).
  - [...] `c` (second insertion at step 6) from the CHT (wrong).
- `a` is considered expired: [...]
- `b` is considered expired: [...]

Then, at step 9, pending tasks from step 6 and 7 should be processed:
- The pending task for `c` (at step 6) is processed:
  - `"c"` is admitted and added to the LRU.
  - ([...] `c` in the CHT)
- [...] `c` (at step 7).
  - [...] `c` did not exist at step 7.
- `c` is considered expired:
  - `c` is supposed to be removed from both the CHT and LRU, but it will not be, because it does not exist in the CHT.

This issue was already present in `v0.11.x` and older versions, but was less likely to occur there: those versions used background threads to periodically process pending tasks, so the timing window was much smaller than in `v0.12.0` and `v0.12.1`.
Fixes
When processing write operation logs in pending tasks, check whether more than one log exists for the same key:
- Add two atomic `u16` counters to each key in the CHT to track the following generations (versions):
  - `entry_gen`: Will be incremented on every insertion, update and removal of the key.
    - [...] `entry_gen` after the increment.
  - `policy_gen`: Will be updated every time the write operation log is processed for the key.
    - [...] `entry_gen` in the log equals the current `entry_gen` of the key.

After processing write operation logs, `Cache` will process other pending tasks such as evicting expired entries. This PR also includes various small improvements in these steps by checking whether each key still has pending write operation logs. This is done by calling the `is_dirty` method, which returns `true` when `entry_gen != policy_gen`.
Checklist
- Add a check that the `entry_gen` in the write operation log equals the current `entry_gen` of the key:
  - `future::base_cache::Internal` struct
    - `handle_upsert` method
  - `sync::base_cache::Internal` struct
    - `handle_upsert` method
- Add `is_dirty` checks to the following internal methods:
  - `future::base_cache::Internal` struct
    - `evict_expired_entries_using_timers` method
    - `remove_expired_ao` method
    - `remove_expired_wo` method
    - `invalidate_entries` method
    - `evict_lru_entries` method
  - `sync::base_cache::Internal` struct
    - `evict_expired_entries_using_timers` method
    - `remove_expired_ao` method
    - `remove_expired_wo` method
    - `invalidate_entries` method
    - `evict_lru_entries` method
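
The generation check described under Fixes can be illustrated with a small, std-only toy model. This is only a sketch under stated assumptions, not Moka's implementation: the names `entry_gen`, `policy_gen`, `is_dirty` and `EntryInfo` come from this PR, while `ToyCache`, the layout of `WriteOp`, `incr_entry_gen`, `apply`, `race_scenario` and the `check_gen` flag are hypothetical and exist only for this example. It compresses the example to a capacity of one: `c` is written twice before the log is drained, the first log is rejected because the cache is full, and room becomes available before the second log is applied.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;

/// Per-key metadata. Only the two counter names come from the PR.
#[derive(Default)]
struct EntryInfo {
    entry_gen: AtomicU16,
    policy_gen: AtomicU16,
}

impl EntryInfo {
    /// Bumps `entry_gen` (wrapping on overflow) and returns the new value.
    fn incr_entry_gen(&self) -> u16 {
        self.entry_gen.fetch_add(1, Ordering::AcqRel).wrapping_add(1)
    }

    /// True while some write has not yet been applied to the policy structures.
    fn is_dirty(&self) -> bool {
        self.entry_gen.load(Ordering::Acquire) != self.policy_gen.load(Ordering::Acquire)
    }
}

/// Hypothetical write-operation log: records the generation it was created at.
struct WriteOp {
    key: &'static str,
    info: Arc<EntryInfo>,
    entry_gen: u16,
}

/// Hypothetical single-threaded stand-in for the cache internals.
#[derive(Default)]
struct ToyCache {
    cht: HashMap<&'static str, Arc<EntryInfo>>, // stand-in for the concurrent hash table
    lru: Vec<&'static str>,                     // stand-in for the access-order queue
    log: Vec<WriteOp>,                          // pending write operation logs
    max_capacity: usize,
}

impl ToyCache {
    fn insert(&mut self, key: &'static str) {
        let info = Arc::clone(self.cht.entry(key).or_default());
        let entry_gen = info.incr_entry_gen();
        self.log.push(WriteOp { key, info, entry_gen });
    }

    /// Applies one log. With `check_gen`, a rejected log removes the key from
    /// the table only if no newer write for that key is still pending.
    fn apply(&mut self, op: WriteOp, check_gen: bool) {
        if self.lru.len() < self.max_capacity {
            self.lru.push(op.key); // admitted
        } else {
            let is_latest = op.info.entry_gen.load(Ordering::Acquire) == op.entry_gen;
            if !check_gen || is_latest {
                self.cht.remove(op.key); // rejected: drop from the table
            }
        }
        op.info.policy_gen.store(op.entry_gen, Ordering::Release);
    }
}

/// Returns (`c` is in the table, `c` is in the access-order queue).
fn race_scenario(check_gen: bool) -> (bool, bool) {
    let mut cache = ToyCache { max_capacity: 1, ..Default::default() };
    cache.insert("a");
    cache.insert("c"); // first write of c (generation 1)
    cache.insert("c"); // second write of c (generation 2), log not yet drained
    let mut ops = std::mem::take(&mut cache.log).into_iter();
    cache.apply(ops.next().unwrap(), check_gen); // a: admitted
    cache.apply(ops.next().unwrap(), check_gen); // c, generation 1: cache is full
    cache.lru.clear(); // a expires and is evicted, so there is room again
    cache.cht.remove("a");
    cache.apply(ops.next().unwrap(), check_gen); // c, generation 2: admitted
    (cache.cht.contains_key("c"), cache.lru.contains(&"c"))
}

fn main() {
    println!("without check: {:?}", race_scenario(false));
    println!("with check:    {:?}", race_scenario(true));
}
```

Without the check, the stale generation-1 log removes `c` from the table even though a newer write is pending, so `c` ends up in the queue but not in the table. With the check, the stale log leaves the table untouched and both structures agree.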