-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: simple flat storage #7345
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. It will be good to get some more eyes on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks reasonable!
The only non-nit comment from me is about the logic to enable flat_state.
It seems like the code makes flat_state a property of a trie, but, if I understand correctly (which I might not), it semes to be a property of a particular state_root, rather than trie.
core/store/src/trie/shard_tries.rs
Outdated
Trie::new(storage, flat_state) | ||
} | ||
|
||
pub fn get_trie_with_optional_flat_state_for_shard( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this private? There are two reasons why pub
doesn't sit well with me:
- functions with a bunch of switches (like two bool parametres here) are usually brittle public interface
- before, we didn't expose
is_view
as a switch -- the caller always statically knew whether they need view trie or plain trie. It feels better not to expose this, as this is bug prone. In fact, it seems to me that, at the call-site, we actually pass wrongis_view
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it was a mistake.
We need tries (states) of three types: view trie, trie, trie with flat state, so now I create 3 separate methods for getting them.
nearcore/src/runtime/mod.rs
Outdated
fn get_trie_for_shard(&self, shard_id: ShardId, prev_hash: &CryptoHash) -> Result<Trie, Error> { | ||
let shard_uid = self.get_shard_uid_from_prev_hash(shard_id, prev_hash)?; | ||
Ok(self.tries.get_trie_for_shard(shard_uid)) | ||
let use_flat_state = cfg!(feature = "protocol_feature_flat_state"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the actual logic we want to use to enable flat_store? I would like to avoid every call-site passing an extra parameter, and somehow design the API such that it just works.
I also wonder if use_flat_state
should be a parameter of the trie at all? The way I understand it, trie is a persistent (functional) data structure, which stores all historical evolution of the state. To get a particular snapshot, you need a trie and a state root.
If flat_state is what we store for the tip, then "use_flat_state" seems to be a property of a particular state-root, rather than a property of the trie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the protocol logic, Trie represents the state, e.g. we give trie (state) keys to it and ask for values. Flat state has the same purpose, and serves as trie optimization for roots close to the tip. That's why I think they should live in the same structure, possibly I need to rename Trie
to State
, TrieUpdate
to StateUpdate
, etc.
As for use_flat_state
, I see two ways to handle decision whether to use flat state or not:
- On
Trie
creation - flat state struct will be eitherNone
orSome(FlatState)
- On each
get
operation - make decision based onstate_root
I prefer the first one, because once Trie is created, we always query it with the same root, and I don't see future use-cases when it won't be the case. Moreover, I think that root
has to become a field of the Trie
and it should be removed from get
and get_ref
parameters. So if we know a root, we also know if we use flat state or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, I think that root has to become a field of the Trie and it should be removed from get and get_ref parameters.
Yeah, this clear my concern! If we "bind" trie to the specific state root, the API makes perfect sense!
#[cfg(feature = "protocol_feature_flat_state")] | ||
return self | ||
.store | ||
.get(DBCol::FlatState, key) | ||
.map_err(|_| StorageError::StorageInternalError); | ||
#[cfg(not(feature = "protocol_feature_flat_state"))] | ||
unreachable!(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(feature = "protocol_feature_flat_state")] | |
return self | |
.store | |
.get(DBCol::FlatState, key) | |
.map_err(|_| StorageError::StorageInternalError); | |
#[cfg(not(feature = "protocol_feature_flat_state"))] | |
unreachable!(); | |
self | |
.store | |
.get(DBCol::FlatState, key) | |
.map_err(|_| StorageError::StorageInternalError) |
As we now gate constructor, there's no need to cfg-gate anything else (but it probably makes sense to add a comment to new
pointing out that #[cfg(feature = "protocol_feature_flat_state")]
is load-bearing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do it because I cfg-gate DBCol::FlatState
as well - I don't want regular nodes to fail because they don't store this column in db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't fail on regular nodes, b/c regular nodes won't have self because the new is gated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that I gate enum option DBCol::FlatState
. If I don't do it, it automatically occur in DBCol::iter
and cause issues. It is simpler to gate this option than deal with correct store opening, etc.
@@ -234,6 +261,34 @@ impl ShardTries { | |||
) -> (StoreUpdate, StateRoot) { | |||
self.apply_all_inner(trie_changes, shard_uid, true) | |||
} | |||
|
|||
#[cfg(feature = "protocol_feature_flat_state")] | |||
pub fn apply_changes_to_flat_state( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be pub(crate)
perhaps? or even just private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best way to do this is to unite it with apply_all
, which accepts only TrieChanges so far. Seems doable, but requires more refactoring work
@@ -53,6 +53,7 @@ protocol_feature_chunk_only_producers = [ | |||
"near-chain-configs/protocol_feature_chunk_only_producers", | |||
"near-primitives/protocol_feature_chunk_only_producers", | |||
] | |||
protocol_feature_flat_state = ["near-store/protocol_feature_flat_state"] | |||
|
|||
nightly = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we intentionally not including protocol_feature_flat_state
in nightly
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is not ready for nightly yet.
First step towards flat storage: #7327 Continue work here because #7295 was broken. Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet. Now, everyone should be able to build neard with this feature, set `max_block_production_delay` to something big and run localnet. I currently couldn't make all tests work with enabled feature, e.g. `test_care_about_shard`, but I open it for review because I don't think it affects the idea. ## Testing * Check that encoding and decoding value reference gives expected results. `ValueRef` can be reused in other places in code * Check that flat state is used in regular trie handler and not used in view trie handler * Check that after block processing, getting value from regular trie and view trie gives the same result
First step towards flat storage: #7327
Continue work here because #7295 was broken.
Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet.
Now, everyone should be able to build neard with this feature, set
max_block_production_delay
to something big and run localnet.I currently couldn't make all tests work with enabled feature, e.g.
test_care_about_shard
, but I open it for review because I don't think it affects the idea.Testing
ValueRef
can be reused in other places in code