Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure TrieStorage and implement correct semantics for state proofs #9350

Merged
merged 4 commits into from
Jul 28, 2023

Conversation

robin-near
Copy link
Contributor

@robin-near robin-near commented Jul 26, 2023

Background

For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that

  • During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof
  • When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical.

Problem

The tricky part is storage gas accounting.

For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value.

When doing a lookup of either a trie node or a value, there can be three cases:

  • If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing)
  • Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access.
  • Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.)

Each of these cases incurs a different storage operation cost. Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access.

Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent:

  • Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect.

What this PR does

This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record:

  • In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not.
  • If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything.
  • If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording.

This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere.

Testing

Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation.

Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too.

Nayduck: https://nayduck.near.org/#/run/3150

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, that's a great step forward to improving trie structure and #8741.
Leaving comments, mostly about naming stuff.

/// Note that in general, it is NOT true that all storage access is either a
/// db read or mem read. It can also be a flat storage read, which is not
/// tracked via TrieChunkCache.
pub struct TrieChunkCache {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline - please consider TrieAccountingCache because it handles both deterministic cache and computing access metrics.

self.enable = enabled;
}

pub fn retrieve_trie_node_with_cache(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrieve_trie_node_with_storage? Maybe worth a little comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see the confusion. I'll just simplify the name and add a comment.

/// toggled on the fly), trie nodes that have been looked up once will be
/// guaranteed to be cached, and further reads to these nodes will
/// encounter less gas cost.
chunk_cache: RefCell<TrieChunkCache>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you work on this, it would be really great if you also could do additional renaming and help with #9054. We are considering shard/chunk cache as a poor naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps let's focus on the accounting cache name for now since I'm not touching the shard cache yet. And then I can follow up with another rename.

@@ -321,9 +326,31 @@ impl std::fmt::Debug for TrieNode {
}

pub struct Trie {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated: do you think we can easily achieve #6308 (comment)?

Copy link
Contributor Author

@robin-near robin-near Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it's easy - the problem is that reading from the trie also requires mutating the cache. And we provide the vm runtime with the ability to read from the trie, which obviously cannot carry a &mut in there or else we'd have to propagate a lifetime parameter all the way in which is going to be completely infeasible, so if the vm runtime is going to start with a & but somehow end up calling a &mut then somewhere in the middle we need a RefCell no matter what. And so putting the RefCell on the cache (which is a common thing to do) seems more reasonable.

fn internal_retrieve_trie_node(
&self,
hash: &CryptoHash,
count_cost: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming it with cache_node[|s] here and all the way, up to renaming force_use_flat_storage_accounting with skip_cache_nodes. I think it explains the behaviour better and doesn't require understanding of flat storage from the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "cache" is overloaded - but how about use_accounting_cache, and skip_accounting_cache_for_trie_nodes?

@robin-near robin-near requested a review from Longarithm July 27, 2023 19:53
Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@near-bulldozer near-bulldozer bot merged commit ab7e238 into near:master Jul 28, 2023
nikurt pushed a commit that referenced this pull request Aug 24, 2023
…ofs (#9350)

## Background
For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that
 - During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof
 - When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical.

## Problem
The tricky part is storage gas accounting. 

For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value.

When doing a lookup of either a trie node or a value, there can be three cases:
 - If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing)
 - Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access.
 - Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.)

**Each of these cases incurs a different storage operation cost.** Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access.

Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent:

* Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect.

## What this PR does
This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record:

* In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not.
* If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything.
* If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording.

This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere.

## Testing
Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation.

Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too.

Nayduck: https://nayduck.near.org/#/run/3150
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Aug 24, 2023
…ofs (near#9350)

## Background
For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that
 - During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof
 - When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical.

## Problem
The tricky part is storage gas accounting. 

For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value.

When doing a lookup of either a trie node or a value, there can be three cases:
 - If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing)
 - Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access.
 - Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.)

**Each of these cases incurs a different storage operation cost.** Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access.

Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent:

* Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect.

## What this PR does
This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record:

* In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not.
* If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything.
* If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording.

This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere.

## Testing
Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation.

Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too.

Nayduck: https://nayduck.near.org/#/run/3150
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants