feat: flat storage cache #8540
Conversation
@Longarithm I discussed with @akhi3030 that it seems a bit odd that we need to build this cache on top of RocksDB, when RocksDB allows for efficient caching. I would like to test whether enabling the RowCache in RocksDB (a cache that stores key-value pairs instead of blocks; it is currently disabled for all columns) could achieve the same goal. I think it would make the code and its maintenance easier. But it will take me at least a week to get this benchmarked, given current priorities. What do you think, should we wait with this PR until we know the results?
It's funny that when I googled RowCache, this was the first link https://groups.google.com/g/rocksdb/c/YxdRryNVTyw/m/ZobfhrO8AAAJ
If it actually helps, that would be great; I'm interested in the new results. If I remember correctly, the RocksDB block cache didn't help with that.
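For reference, enabling the row cache through the rust-rocksdb crate would look roughly like the sketch below. It assumes a recent crate version where `Cache::new_lru_cache` returns a `Cache` directly, and the capacity is illustrative, not a benchmarked value.

```rust
use rocksdb::{Cache, Options, DB};

fn open_with_row_cache(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // The row cache stores whole key-value pairs, unlike the block
    // cache, which stores uncompressed data blocks. 128 MiB is an
    // illustrative size, not a tuned recommendation.
    let row_cache = Cache::new_lru_cache(128 * 1024 * 1024);
    opts.set_row_cache(&row_cache);
    DB::open(&opts, path)
}
```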
core/store/src/flat_state.rs
```rust
if guard.value_ref_cache.len() == guard.value_ref_cache.cap() {
    if let Some((key, _)) = guard.value_ref_cache.pop_lru() {
        guard.metrics.value_ref_cache_total_key_size.sub(key.len() as i64);
    }
}
if guard.value_ref_cache.len() < guard.value_ref_cache.cap() {
    guard.metrics.value_ref_cache_total_key_size.add(key.len() as i64);
    guard.value_ref_cache.put(key.clone(), value.clone());
}
```
You may be able to simplify this code with the lru cache `push` method: it checks the capacity by itself and returns a popped item, if any. Not sure how it would handle 0 capacity though.
https://docs.rs/lru/latest/lru/struct.LruCache.html#method.push
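A minimal standalone sketch of the suggested pattern, using a plain counter in place of the Prometheus gauge. It relies on `push` returning the displaced entry, whether that is the evicted LRU pair or the old entry under the same key, which keeps the key-size accounting balanced in both cases (the zero-capacity question raised above would still need checking):

```rust
use lru::LruCache;
use std::num::NonZeroUsize;

fn main() {
    let mut cache: LruCache<Vec<u8>, u64> =
        LruCache::new(NonZeroUsize::new(2).unwrap());
    let mut total_key_size: i64 = 0;

    for (key, value) in [(b"aa".to_vec(), 1), (b"bbb".to_vec(), 2), (b"cccc".to_vec(), 3)] {
        // `push` returns the displaced entry: the evicted LRU pair when
        // the cache is full, or the previous entry if `key` already
        // existed. Subtract-then-add keeps the counter equal to the
        // total size of the keys currently resident.
        if let Some((old_key, _)) = cache.push(key.clone(), value) {
            total_key_size -= old_key.len() as i64;
        }
        total_key_size += key.len() as i64;
    }
    assert_eq!(total_key_size, 7); // "bbb" and "cccc" remain
}
```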
```rust
pub static FLAT_STORAGE_VALUE_REF_CACHE_TOTAL_KEY_SIZE: Lazy<IntGaugeVec> = Lazy::new(|| {
    try_create_int_gauge_vec(
        "flat_storage_value_ref_cache_total_key_size",
        "Total size of all keys in flat storage cache for its head",
```
Out of curiosity, why do you only measure the size of the keys and not both keys and values?
Because the size of values is fixed: we store `ValueRef`s instead of values.
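For context, `ValueRef` is a small fixed-size reference to a value rather than the value itself, roughly as below (field names follow near-primitives from memory, not this PR's diff):

```rust
use near_primitives::hash::CryptoHash;

/// A fixed-size reference to a value stored in state: the value's
/// length plus its hash. Caching this instead of the value itself
/// makes the value part of every cache entry constant-size, so only
/// the variable-length keys are worth metering.
pub struct ValueRef {
    /// Length of the underlying value, in bytes.
    pub length: u32,
    /// Hash of the value, used to fetch it from storage.
    pub hash: CryptoHash,
}
```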
Should we measure both since we might implement inlining in the future?
core/store/src/flat_state.rs
```diff
@@ -991,6 +1011,9 @@ impl FlatStorageState {
         };
     }
 
+        if let Some(value_ref) = guard.get_cached_ref(key) {
+            return Ok(value_ref);
+        }
         Ok(store_helper::get_ref(&guard.store, key)?)
```
Should you also push the just-read value back to the cache? Currently you only update values in the cache when writing new values there. I suppose either can be fine, but the cache hit rate will heavily depend on the usage pattern.
Right!
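For illustration, populating the cache on the read path could look like the sketch below; `put_cached_ref` is a hypothetical counterpart to `get_cached_ref`, not a method from this PR:

```rust
if let Some(value_ref) = guard.get_cached_ref(key) {
    return Ok(value_ref);
}
let value_ref = store_helper::get_ref(&guard.store, key)?;
// Hypothetical helper: remember the result (including `None` for
// missing keys) so repeated reads of the same key hit the cache.
guard.put_cached_ref(key, value_ref.clone());
Ok(value_ref)
```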
To understand the row cache, I found this blog post (from January this year) helpful: https://betterprogramming.pub/navigating-the-minefield-of-rocksdb-configuration-options-246af1e1d3f9
lgtm but I'll let someone familiar with flat storage review it too
```rust
/// Get cached `ValueRef` for flat storage head.
#[cfg(feature = "protocol_feature_flat_state")]
fn get_cached_ref(&mut self, key: &[u8]) -> Option<Option<ValueRef>> {
```
Explain why the returned value is `Option<Option<>>`, since it could be confusing.
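For example, a doc comment along these lines would spell out the two layers (the method body here is a guess at the underlying cache lookup):

```rust
/// Looks up `key` in the cache for the flat storage head.
///
/// The outer `Option` is the cache dimension: `None` means the key is
/// not cached, so the caller must read from storage. The inner
/// `Option` is the cached storage answer itself: `Some(None)` records
/// that the key is known to have no value, so negative lookups are
/// cached as well.
#[cfg(feature = "protocol_feature_flat_state")]
fn get_cached_ref(&mut self, key: &[u8]) -> Option<Option<ValueRef>> {
    self.value_ref_cache.get(key).cloned()
}
```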
LGTM. Previously I implemented an empty struct called FlatStateCache, with the intention of adding a cache implementation later; it is part of FlatState. Could you remove that code? You can do it either in this PR or in a separate PR. Thank you!
We don't have time to research the RocksDB RowCache for now; I added a todo to the code: #8649
Based on @jakmeier's estimations, we need to cache `ValueRef`s for the flat storage head (see #8006). RocksDB's internal implementation and block cache don't help with this, and we need to make flat storage performance at least comparable to trie performance in the MVP, in order not to make the undercharging issue worse.

This cache lives inside `FlatStorageState`, can be accessed in `get_ref` before attempting to read a value ref from the flat storage head, and must be updated when we apply a delta.

I think it makes sense to make the cache capacity configurable, and this config fits into `StoreConfig`. I don't like that it is propagated to `FlatStorageState` from `ShardTries`; it mixes trie storage and flat storage even more. Perhaps it needs to be fully moved inside `FlatStateFactory`, but I am not sure. A sketch of the configuration knob follows the testing notes below.

Testing

- `flat_storage_state_sanity` to check both cached and non-cached versions;
- `flat_storage_state_cache_eviction` to check that the eviction strategy is applied correctly.
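That configuration knob could look roughly like this (a sketch; the field name, unit, and default are illustrative, not taken from this PR):

```rust
/// Sketch of a flat storage cache setting in the store configuration.
#[derive(Clone, Debug)]
pub struct StoreConfig {
    /// Maximum number of entries in the per-shard flat storage
    /// `ValueRef` cache; 0 would disable caching.
    pub flat_storage_cache_capacity: usize,
}

impl Default for StoreConfig {
    fn default() -> Self {
        Self { flat_storage_cache_capacity: 100_000 }
    }
}
```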