Resharding V3 - state witness, implementation #577

Merged
merged 3 commits into from
Nov 22, 2024
Changes from 2 commits
91 changes: 66 additions & 25 deletions neps/nep-0568.md
@@ -163,6 +163,16 @@ supporting smooth transitions without altering storage structures directly.

### Stateless Validation

### State Witness
Contributor:

Can you add a quick preface with a very high-level description? It feels like you're jumping right into the nitty-gritty details.


The resharding state transition becomes one of the `implicit_transitions` in `ChunkStateWitness`. It must be validated between processing the last chunk (potentially missing) in the old epoch and the first chunk (potentially missing) in the new epoch. The `ChunkStateTransition` fields map directly onto the resharding state transition: `block_hash` stores the hash of the last block of the parent shard, `base_state` stores the resharding proof, and `post_state_root` stores the proposed state root.

Note that it leads to **two** state transitions corresponding to the same block hash. On the chunk producer side, the first transition is stored for the `(block_hash, parent_shard_uid)` pair and the second one is stored for the `(block_hash, child_shard_uid)` pair.
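To illustrate the two entries sharing one block hash, here is a minimal sketch in Python. The `StoredTransition` type, the `transitions` dictionary and the `save_transition` helper are hypothetical stand-ins for the chunk producer's storage, not nearcore APIs, and the byte strings are placeholders. Only the `(block_hash, shard_uid)` keying comes from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class StoredTransition:
    base_state: bytes       # placeholder for the proof needed to replay the transition
    post_state_root: bytes  # placeholder for the state root after the transition


# Hypothetical in-memory stand-in for the chunk producer's transition storage.
transitions: Dict[Tuple[bytes, str], StoredTransition] = {}


def save_transition(block_hash: bytes, shard_uid: str, transition: StoredTransition) -> None:
    # The shard UID is part of the key, so one block hash can hold
    # both the parent-shard and the child-shard transition.
    transitions[(block_hash, shard_uid)] = transition


last_block = b"last-block-of-old-epoch"
save_transition(last_block, "parent", StoredTransition(b"chunk-proof", b"root-parent"))
save_transition(last_block, "child-0", StoredTransition(b"resharding-proof", b"root-child-0"))
```

Keying by the pair rather than by the block hash alone is what keeps the second write from overwriting the first.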

The chunk validator has all the blocks, so it can independently determine whether an implicit transition corresponds to applying a missing chunk or to resharding. This is implemented in `get_state_witness_block_range`, which iterates from `state_witness.chunk_header.prev_block_hash()` back to the block which includes the last chunk for the (parent) shard, if it exists.
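The backward walk can be sketched roughly as follows. This is a simplified illustration, not the actual `get_state_witness_block_range` code: the `Block` fields, the `classify_implicit_transitions` function and the `"missing_chunk"` / `"resharding"` labels are all assumptions made for the example, and shard identifiers are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    hash: str
    prev_hash: Optional[str]
    has_new_chunk: bool   # the shard has a new (non-missing) chunk in this block
    ends_epoch: bool      # this is the last block of its epoch
    layout_changes: bool  # the next epoch uses a different shard layout


def classify_implicit_transitions(
    blocks: Dict[str, Block], prev_block_hash: str
) -> List[Tuple[str, str]]:
    """Walk back from prev_block_hash to the block holding the last chunk.

    Returns (block_hash, kind) pairs in execution order (oldest first).
    """
    found: List[Tuple[str, str]] = []
    current = blocks.get(prev_block_hash)
    while current is not None:
        # Resharding is applied after everything else tied to this block,
        # so it is recorded first while walking backwards.
        if current.ends_epoch and current.layout_changes:
            found.append((current.hash, "resharding"))
        if current.has_new_chunk:
            break  # reached the block that includes the last chunk
        found.append((current.hash, "missing_chunk"))
        current = blocks.get(current.prev_hash) if current.prev_hash else None
    found.reverse()
    return found


chain = {
    "b1": Block("b1", None, has_new_chunk=True, ends_epoch=False, layout_changes=False),
    "b2": Block("b2", "b1", has_new_chunk=False, ends_epoch=True, layout_changes=True),
    "b3": Block("b3", "b2", has_new_chunk=False, ends_epoch=False, layout_changes=False),
}
order = classify_implicit_transitions(chain, "b3")
```

In this toy chain, block `b2` contributes two implicit transitions under the same block hash: the missing chunk first, then the resharding.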

Then, in `validate_chunk_state_witness`, if an implicit transition corresponds to resharding, the chunk validator calls `retain_split_shard` and proves the state transition from the parent shard to the child shard.
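The validation step can be pictured with a toy model. The sketch below is not the trie-based implementation: it uses a plain dictionary instead of a partial trie built from the proof, a flat hash instead of a Merkle root, and it assumes that the boundary account itself belongs to the right child. The `state_root` and `validate_resharding` helpers and the `side` argument are hypothetical names for the example.

```python
import hashlib
from typing import Dict


def state_root(state: Dict[str, bytes]) -> bytes:
    """Toy state root: a hash over the sorted key-value pairs (not a real trie root)."""
    digest = hashlib.sha256()
    for account in sorted(state):
        digest.update(account.encode() + b"\x00" + state[account] + b"\x00")
    return digest.digest()


def retain_split_shard(state: Dict[str, bytes], boundary_account: str, side: str) -> Dict[str, bytes]:
    """Keep only the accounts of one child shard (assumption: boundary goes right)."""
    if side == "left":
        return {a: v for a, v in state.items() if a < boundary_account}
    if side == "right":
        return {a: v for a, v in state.items() if a >= boundary_account}
    raise ValueError(f"unknown side: {side}")


def validate_resharding(
    base_state: Dict[str, bytes], boundary_account: str, side: str, proposed_root: bytes
) -> bool:
    """Recompute the child state root and compare it with the proposed one."""
    child_state = retain_split_shard(base_state, boundary_account, side)
    return state_root(child_state) == proposed_root


parent = {"alice.near": b"1", "bob.near": b"2", "carol.near": b"3"}
right_child = retain_split_shard(parent, "bob.near", "right")
ok = validate_resharding(parent, "bob.near", "right", state_root(right_child))
```

The validator never trusts the proposed root: it recomputes the root from the supplied base state and rejects the witness on any mismatch.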

### State Sync

Changes to the state sync protocol aren't typically considered protocol changes requiring a version bump, since it's concerned with downloading state that isn't present locally, rather than with the rules of execution of blocks and chunks. But it might still be helpful to outline some planned changes to state sync intended to make the resharding implementation easier to work with.
@@ -187,18 +197,39 @@ In this NEP, we propose updating the ShardId semantics to allow for arbitrary id

## Reference Implementation

```text
[This technical section is required for Protocol proposals but optional for other categories. A draft implementation should demonstrate a minimal implementation that assists in understanding or implementing this proposal. Explain the design in sufficient detail that:
```
Contributor:

Can you add a short comment to each block? The pseudo code is nice but I think it would be good to also have something in human words ;)

And please also add a subsection heading if necessary?

# Resharding is triggered on the last block of an epoch whose successor
# epoch uses a different shard layout.
should_split_shard(block):
    shard_layout = epoch_manager.shard_layout(block.epoch_id())
    next_shard_layout = epoch_manager.shard_layout(block.next_epoch_id())
    return epoch_manager.is_next_block_epoch_start(block) && shard_layout != next_shard_layout

* Its interaction with other features is clear.
* Where possible, include a Minimum Viable Interface subsection expressing the required behavior and types in a target programming language. (ie. traits and structs for rust, interfaces and classes for javascript, function signatures and structs for c, etc.)
* It is reasonably clear how the feature would be implemented.
* Corner cases are dissected by example.
* For protocol changes: A link to a draft PR on nearcore that shows how it can be integrated in the current code. It should at least solve the key technical challenges.
# After a block is processed, start the split if this block ends the epoch.
on chain.postprocess_block(block):
    next_shard_layout = epoch_manager.shard_layout(block.next_epoch_id())
    if should_split_shard(block):
        resharding_manager.split_shard(split_shard_event, next_shard_layout)

The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
on resharding_manager.split_shard(split_shard_event, next_shard_layout):
    set State mapping
    start FlatState resharding
    process MemTrie resharding:
        freeze MemTrie, create HybridMemTries
        for each child shard:
            mem_tries[shard].retain_split_shard(boundary_account)

mem_trie.retain_split_shard(boundary_account):
    split shard by path as described above while generating the proof
    save the proof as state transition for pair (block, new_shard_uid)

then, the proof is sent as one of implicit transitions in ChunkStateWitness

then, on the chunk validation path, the chunk validator determines whether
resharding is a part of the state transition, using the same
should_split_shard condition

and then it calls
Trie(state_transition_proof).retain_split_shard(boundary_account),
which should succeed if the proof is sufficient, and generates the new state root

finally, it checks that the new state root matches the state root proposed in
ChunkStateWitness. If the whole ChunkStateWitness is valid, the chunk
validator sends an endorsement, which also endorses the resharding.


We may want to take out these long text lines from the code block, or newline them manually

```


### State Storage - MemTrie

The current implementation of MemTrie uses a pool of memory (`STArena`) to allocate and deallocate nodes and internal pointers in this pool to reference child nodes. MemTries, unlike the State representation of Trie, do not work with the hash of the nodes but internal memory pointers directly. Additionally, MemTries are not thread safe and one MemTrie exists per shard.
@@ -296,7 +327,7 @@ Elements inherited by both children:

Elements inherited only by the lowest index child:

* `BUFFERED_RECEIPT_INDICES `
* `BUFFERED_RECEIPT_INDICES`
* `BUFFERED_RECEIPT`

#### Bring children shards up to date with the chain's head
@@ -410,15 +441,30 @@ The state sync algorithm defines a `sync_hash` that is used in many parts of the

## Security Implications

```text
[Explicitly outline any security concerns in relation to the NEP, and potential ways to resolve or mitigate them. At the very least, well-known relevant threats must be covered, e.g. person-in-the-middle, double-spend, XSS, CSRF, etc.]
```
### Fork Handling

In theory, there can be more than one candidate block which finishes the last epoch with the old shard layout. For previous implementations this didn't matter, because the resharding decision was made at the beginning of the previous epoch. Now the decision is made on the epoch boundary, so the new implementation handles this case as well.
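Because the decision is evaluated per block, every candidate block that ends the epoch triggers its own split. The sketch below mirrors the `should_split_shard` pseudocode from the Reference Implementation section in runnable form. `ToyEpochManager`, the `Block` fields and the tuple-of-boundary-accounts layout are assumptions made for this illustration, not nearcore types.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

ShardLayout = Tuple[str, ...]  # toy layout: just the boundary accounts


@dataclass(frozen=True)
class Block:
    hash: str
    epoch_id: str
    next_epoch_id: str
    is_last_in_epoch: bool


class ToyEpochManager:
    """Hypothetical stand-in exposing only the calls used by the pseudocode."""

    def __init__(self, layouts: Dict[str, ShardLayout]):
        self.layouts = layouts

    def shard_layout(self, epoch_id: str) -> ShardLayout:
        return self.layouts[epoch_id]

    def is_next_block_epoch_start(self, block: Block) -> bool:
        return block.is_last_in_epoch


def should_split_shard(epoch_manager: ToyEpochManager, block: Block) -> bool:
    shard_layout = epoch_manager.shard_layout(block.epoch_id)
    next_shard_layout = epoch_manager.shard_layout(block.next_epoch_id)
    return epoch_manager.is_next_block_epoch_start(block) and shard_layout != next_shard_layout


manager = ToyEpochManager({"e1": ("m.near",), "e2": ("g.near", "m.near")})
fork_a = Block("a", "e1", "e2", is_last_in_epoch=True)      # first candidate last block
fork_b = Block("b", "e1", "e2", is_last_in_epoch=True)      # competing candidate on a fork
mid_epoch = Block("c", "e1", "e2", is_last_in_epoch=False)  # ordinary block inside the epoch
decisions = {b.hash: should_split_shard(manager, b) for b in (fork_a, fork_b, mid_epoch)}
```

Both fork candidates satisfy the condition, so the resharding logic has to be able to run once per candidate rather than once per epoch.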

### Proof Validation

With single shard tracking, nodes can't independently validate new state roots after resharding, because they don't have the state of the shard being split. That's why we generate resharding proofs, whose generation and validation may be a new weak point. However, `retain_split_shard` is equivalent to a constant number of lookups in the trie, so its overhead is negligible. Even if a proof is invalid, it only means that `retain_split_shard` fails early, similarly to other state transitions.

## Alternatives

```text
[Explain any alternative designs that were considered and the rationale for not choosing them. Why your design is superior?]
```
In the solution space which would keep the blockchain stateful, we also considered handling resharding through the `Receipts` mechanism. The workflow would be to:
* create an empty `target_shard`,
* require `source_shard` chunk producers to create a special `ReshardingReceipt(source_shard, target_shard, data)`, where `data` would be an interval of key-value pairs in `source_shard` along with the proof,
* then, `target_shard` trackers and validators would process that receipt, validate the proof and insert the key-value pairs into the new shard.

However, `data` would occupy most of the state witness capacity and introduce the overhead of proving every single interval in `source_shard`. Moreover, syncing the target shard "dynamically" also requires some form of catchup, which makes this approach much less feasible than the chosen one.

Another question is whether resharding should be tied to epoch boundaries at all. Untying it would allow going from the resharding decision to completion much faster. But for that, we would need to:
* agree whether we should reshard in the middle of the epoch or allow "fast epoch completion", which would have to be implemented,
* keep chunk producers tracking "spare shards" ready to receive items from split shards,
* on a resharding event, implement a specific form of state sync, in which source and target chunk producers would agree on new state roots offline,
* then, new state roots would be validated by chunk validators in the same fashion.

While this is much closer to Dynamic Resharding (below), it requires many more changes to the protocol. And the considered idea works very well as an intermediate step towards that, if needed.

## Future possibilities

@@ -428,27 +474,22 @@ The state sync algorithm defines a `sync_hash` that is used in many parts of the

## Consequences

```text
[This section describes the consequences, after applying the decision. All consequences should be summarized here, not just the "positive" ones. Record any concerns raised throughout the NEP discussion.]
```

### Positive

* p1
* The protocol is able to execute resharding even while only a fraction of nodes track the split shard.
* New resharding can happen in a matter of minutes instead of hours.

Should we say the resharding happens "instantaneously" from the point of view of NEAR users?

Btw, is this section supposed to be filled by us or the NEP reviewers? I've seen it empty for a few NEPs.

Member Author:

Well, it happened instantaneously before as well. It's just that we need less "background compute power" now. I clarified it.

For NEP reviewers we have "Benefits" and "Concerns" sections below. It seems very reasonable to provide our initial view of the pros and cons before review.


### Neutral

* n1
N/A

### Negative

* n1
* The storage components need to handle additional complexity of controlling the shard layout change.

### Backwards Compatibility

```text
[All NEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. Author must explain a proposes to deal with these incompatibilities. Submissions without a sufficient backwards compatibility treatise may be rejected outright.]
```
The approach is fully backwards compatible: it only adds a new protocol upgrade on top of the existing implementation. We were also able to completely remove the previous resharding logic, as it was already approved by validators; to process chunks from any layout, it is enough to take the state for that layout from an archival node.

We retained the capability to replay reshardingV2 in archive nodes, not sure if it's worth mentioning

Member Author:

I just deleted mention of this, as I don't have enough understanding.


## Unresolved Issues (Optional)
