Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: simple flat storage #7345

Merged
merged 41 commits into from
Aug 9, 2022
Merged

feat: simple flat storage #7345

merged 41 commits into from
Aug 9, 2022

Conversation

Longarithm
Copy link
Member

@Longarithm Longarithm commented Aug 4, 2022

First step towards flat storage: #7327
Continue work here because #7295 was broken.

Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet.

Now, everyone should be able to build neard with this feature, set max_block_production_delay to something big and run localnet.

I currently couldn't make all tests work with enabled feature, e.g. test_care_about_shard, but I open it for review because I don't think it affects the idea.

Testing

  • Check that encoding and decoding value reference gives expected results. ValueRef can be reused in other places in code
  • Check that flat state is used in regular trie handler and not used in view trie handler
  • Check that after block processing, getting value from regular trie and view trie gives the same result

@Longarithm Longarithm self-assigned this Aug 4, 2022
@Longarithm Longarithm marked this pull request as ready for review August 4, 2022 22:33
@Longarithm Longarithm requested a review from a team as a code owner August 4, 2022 22:33
@Longarithm Longarithm requested a review from mina86 August 4, 2022 22:33
@Longarithm Longarithm mentioned this pull request Aug 4, 2022
@Longarithm Longarithm requested a review from matklad August 4, 2022 22:35
Copy link
Collaborator

@akhi3030 akhi3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. It will be good to get some more eyes on it.

core/primitives/src/state.rs Outdated Show resolved Hide resolved
core/primitives/src/state_record.rs Show resolved Hide resolved
Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable!

The only non-nit comment from me is about the logic to enable flat_state.

It seems like the code makes flat_state a property of a trie, but, if I understand correctly (which I might not), it semes to be a property of a particular state_root, rather than trie.

core/primitives/src/state.rs Outdated Show resolved Hide resolved
core/primitives/src/state.rs Outdated Show resolved Hide resolved
core/store/src/flat_state.rs Outdated Show resolved Hide resolved
core/store/src/flat_state.rs Outdated Show resolved Hide resolved
core/store/src/flat_state.rs Show resolved Hide resolved
core/store/src/flat_state.rs Outdated Show resolved Hide resolved
core/store/src/lib.rs Outdated Show resolved Hide resolved
Trie::new(storage, flat_state)
}

pub fn get_trie_with_optional_flat_state_for_shard(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private? There are two reasons why pub doesn't sit well with me:

  • functions with a bunch of switches (like two bool parametres here) are usually brittle public interface
  • before, we didn't expose is_view as a switch -- the caller always statically knew whether they need view trie or plain trie. It feels better not to expose this, as this is bug prone. In fact, it seems to me that, at the call-site, we actually pass wrong is_view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was a mistake.
We need tries (states) of three types: view trie, trie, trie with flat state, so now I create 3 separate methods for getting them.

fn get_trie_for_shard(&self, shard_id: ShardId, prev_hash: &CryptoHash) -> Result<Trie, Error> {
let shard_uid = self.get_shard_uid_from_prev_hash(shard_id, prev_hash)?;
Ok(self.tries.get_trie_for_shard(shard_uid))
let use_flat_state = cfg!(feature = "protocol_feature_flat_state");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the actual logic we want to use to enable flat_store? I would like to avoid every call-site passing an extra parameter, and somehow design the API such that it just works.

I also wonder if use_flat_state should be a parameter of the trie at all? The way I understand it, trie is a persistent (functional) data structure, which stores all historical evolution of the state. To get a particular snapshot, you need a trie and a state root.

If flat_state is what we store for the tip, then "use_flat_state" seems to be a property of a particular state-root, rather than a property of the trie.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the protocol logic, Trie represents the state, e.g. we give trie (state) keys to it and ask for values. Flat state has the same purpose, and serves as trie optimization for roots close to the tip. That's why I think they should live in the same structure, possibly I need to rename Trie to State, TrieUpdate to StateUpdate, etc.

As for use_flat_state, I see two ways to handle decision whether to use flat state or not:

  • On Trie creation - flat state struct will be either None or Some(FlatState)
  • On each get operation - make decision based on state_root

I prefer the first one, because once Trie is created, we always query it with the same root, and I don't see future use-cases when it won't be the case. Moreover, I think that root has to become a field of the Trie and it should be removed from get and get_ref parameters. So if we know a root, we also know if we use flat state or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, I think that root has to become a field of the Trie and it should be removed from get and get_ref parameters.

Yeah, this clear my concern! If we "bind" trie to the specific state root, the API makes perfect sense!

core/store/src/trie/mod.rs Outdated Show resolved Hide resolved
@Longarithm Longarithm requested a review from matklad August 8, 2022 14:32
Comment on lines +39 to +45
#[cfg(feature = "protocol_feature_flat_state")]
return self
.store
.get(DBCol::FlatState, key)
.map_err(|_| StorageError::StorageInternalError);
#[cfg(not(feature = "protocol_feature_flat_state"))]
unreachable!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(feature = "protocol_feature_flat_state")]
return self
.store
.get(DBCol::FlatState, key)
.map_err(|_| StorageError::StorageInternalError);
#[cfg(not(feature = "protocol_feature_flat_state"))]
unreachable!();
self
.store
.get(DBCol::FlatState, key)
.map_err(|_| StorageError::StorageInternalError)

As we now gate constructor, there's no need to cfg-gate anything else (but it probably makes sense to add a comment to new pointing out that #[cfg(feature = "protocol_feature_flat_state")] is load-bearing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do it because I cfg-gate DBCol::FlatState as well - I don't want regular nodes to fail because they don't store this column in db

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't fail on regular nodes, b/c regular nodes won't have self because the new is gated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that I gate enum option DBCol::FlatState. If I don't do it, it automatically occur in DBCol::iter and cause issues. It is simpler to gate this option than deal with correct store opening, etc.

core/store/src/trie/shard_tries.rs Outdated Show resolved Hide resolved
core/store/src/trie/shard_tries.rs Outdated Show resolved Hide resolved
core/store/src/trie/shard_tries.rs Outdated Show resolved Hide resolved
@@ -234,6 +261,34 @@ impl ShardTries {
) -> (StoreUpdate, StateRoot) {
self.apply_all_inner(trie_changes, shard_uid, true)
}

#[cfg(feature = "protocol_feature_flat_state")]
pub fn apply_changes_to_flat_state(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be pub(crate) perhaps? or even just private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to do this is to unite it with apply_all, which accepts only TrieChanges so far. Seems doable, but requires more refactoring work

@@ -53,6 +53,7 @@ protocol_feature_chunk_only_producers = [
"near-chain-configs/protocol_feature_chunk_only_producers",
"near-primitives/protocol_feature_chunk_only_producers",
]
protocol_feature_flat_state = ["near-store/protocol_feature_flat_state"]

nightly = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we intentionally not including protocol_feature_flat_state in nightly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is not ready for nightly yet.

core/primitives/src/state.rs Show resolved Hide resolved
@near-bulldozer near-bulldozer bot merged commit 1d2a9e3 into master Aug 9, 2022
@near-bulldozer near-bulldozer bot deleted the fs-simple branch August 9, 2022 15:44
nikurt pushed a commit that referenced this pull request Aug 11, 2022
First step towards flat storage: #7327
Continue work here because #7295 was broken.

Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet.

Now, everyone should be able to build neard with this feature, set `max_block_production_delay` to something big and run localnet.

I currently couldn't make all tests work with enabled feature, e.g. `test_care_about_shard`, but I open it for review because I don't think it affects the idea.

## Testing

* Check that encoding and decoding value reference gives expected results. `ValueRef` can be reused in other places in code
* Check that flat state is used in regular trie handler and not used in view trie handler
* Check that after block processing, getting value from regular trie and view trie gives the same result
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants