diff --git a/neps/nep-0509.md b/neps/nep-0509.md
index 735332c8e..a2f29ed26 100644
--- a/neps/nep-0509.md
+++ b/neps/nep-0509.md
@@ -33,6 +33,7 @@ As a result, the team sought alternative approaches and concluded that stateless

 ### Assumptions

+* No more than 1/3 of validators are corrupted.
 * In memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)
 * State sync is enabled (so that nodes can track different shards across epochs)
 * Merkle Patricia Trie continues to be the state trie implementation
@@ -48,52 +49,34 @@ As a result, the team sought alternative approaches and concluded that stateless
 * The cost of additional network and compute should be acceptable.
 * Validator rewards should not be reduced.

-### Out of scope
-
-* Resharding support.
-* Data size optimizations such as compression, for both chunk data and state witnesses, except basic optimizations that are practically necessary.
-* Separation of consensus and execution, where consensus runs independently from execution, and validators asynchronously perform state transitions after the transactions are proposed on the consensus layer, for the purpose of amortizing the computation and network transfer time.
-* ZK integration.
-* Underlying data structure change (e.g. verkle tree).
-
+### Current design

-## High level flow
-
-The current high-level chunk production flow, if we drop details and edge cases, is as follows:
-* Block producer at height H BP(H) produces block B(H) with chunks accessible to it and distributes it.
-* Chunk producer for shard S at height H+1 CP(S, H+1) produces chunk C(S, H+1) based on B(H) and distributes it.
-* BP(H+1) collects all chunks at height H+1 until certain timeout is reached.
-* BP(H+1) produces block B(H+1) with chunks C(*, H+1) accessible to it and distributes it, etc.
+The current high-level chunk production flow, excluding details and edge cases, is as follows (a schematic sketch follows below):
+* Block producer at height `H` `BP(H)` produces block `B(H)` with chunks accessible to it and distributes it.
+* Chunk producer for shard `S` at height `H+1` `CP(S, H+1)` produces chunk `C(S, H+1)` based on `B(H)` and distributes it.
+* `BP(H+1)` collects all chunks at height `H+1` until a certain timeout is reached.
+* `BP(H+1)` produces block `B(H+1)` with chunks `C(*, H+1)` accessible to it and distributes it, etc.

 The "induction base" is at genesis height, where genesis block with default chunks is accessible to everyone, so chunk producers can start right away from genesis height + 1.
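+
+Below is a deliberately simplified sketch of this per-height loop. The `Block`/`Chunk` types and the function names are illustrative placeholders for this document only, not nearcore APIs; networking, timeouts, and all edge cases are omitted.
+
+```rust
+struct Chunk { shard_id: u64, height: u64 }
+struct Block { height: u64, chunks: Vec<Chunk> }
+
+// CP(S, H+1): a chunk producer builds its chunk for shard S on top of the previous block.
+fn produce_chunk(shard_id: u64, prev_block: &Block) -> Chunk {
+    Chunk { shard_id, height: prev_block.height + 1 }
+}
+
+// BP(H+1): after collecting chunks (until a timeout), the block producer makes the block.
+fn produce_block(prev_block: &Block, chunks: Vec<Chunk>) -> Block {
+    Block { height: prev_block.height + 1, chunks }
+}
+
+fn main() {
+    let num_shards: u64 = 4;
+    // "Induction base": the genesis block with default chunks is known to everyone.
+    let mut head = Block {
+        height: 0,
+        chunks: (0..num_shards).map(|s| Chunk { shard_id: s, height: 0 }).collect(),
+    };
+    for _ in 0..3 {
+        // In reality chunks come from distinct chunk producers over the network, and only
+        // those received before the timeout make it into the block.
+        let chunks = (0..num_shards).map(|s| produce_chunk(s, &head)).collect();
+        head = produce_block(&head, chunks);
+        println!("block at height {} with {} chunks", head.height, head.chunks.len());
+    }
+}
+```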

-One can observe that there is no "chunk validation" step here.
-To simplify explanation how this happens right now, let's say that certain validator considers chunk C(S, H+1) valid iff **post state root** it computed by executing C(S, H) is the same as **pre state root** proposed in `ChunkHeader` `prev_state_root` field of C(S, H+1).
-BP(H+1), in fact, includes all received chunks in B(H+1), even invalid ones. But their rejection still will happen because currently **block producers are required to track all shards**, which implies that they execute all the chunks.
-So, each block producer locally has **post state roots** for all C(S, H) and can check validity of every chunk in B(H+1).
-If some C(S, H+1) is invalid, the whole B(H+1) is ignored.
-
-As we can see, requirement for block producers to track all shards is **crucial** for the current design.
-To achieve phase 2 of sharding, we want to drop it. To achieve that, we introduce new role of a **chunk validator** and propose the following changes to the flow:
-
-* Chunk producer, in addition to producing a chunk, produces new `ChunkStateWitness` message.
-  * The `ChunkStateWitness` contains data which is enough to prove validity of the chunk's header what is being produced:
-    * As it is today, all fields of the `ShardChunkHeaderInnerV3`, except `tx_root`, are uniquely determined by the blockchain's history based on where the chunk is located (i.e. its parent block and shard ID).
-    * The `tx_root` is based on the list of transactions proposed, which is at the discretion of the chunk producer. However, these transactions must be valid (i.e. the sender accounts have enough balance and the correct nonce, etc.).
-  * This `ChunkStateWitness` proves to anyone, including those who track only block data and no shards, that this chunk header is correct, meaning that the uniquely determined fields are exactly what should be expected, and the discretionary `tx_root` field corresponds to a valid set of transactions.
-  * The `ChunkStateWitness` is not part of the chunk itself; it is distributed separately and is considered transient data.
-* The chunk producer distributes the `ChunkStateWitness` to a subset of **chunk validators** assigned for this shard. This is in addition to, and independent of, the existing chunk distribution logic (implemented by `ShardsManager`) today.
+One can observe that there is no "chunk validation" step here. In fact, validity of chunks is implicitly guaranteed by the **requirement for all block producers to track all shards**.
+To achieve phase 2 of sharding, we want to drop this requirement. For that, we propose the following changes to the flow:
+
+### New design
+
+* The chunk producer, in addition to producing a chunk, produces a new `ChunkStateWitness` message. The `ChunkStateWitness` contains enough data to prove the validity of the chunk header that is being produced.
+  * `ChunkStateWitness` proves to anyone, including those who track only block data and no shards, that this chunk header is correct.
+  * `ChunkStateWitness` is not part of the chunk itself; it is distributed separately and is considered transient data.
+* The chunk producer distributes the `ChunkStateWitness` to a subset of **chunk validators** assigned to this shard. This is in addition to, and independent of, the existing chunk distribution logic (implemented by `ShardsManager`) today.
   * Chunk Validator selection and assignment are described below.
 * A chunk validator, upon receiving a `ChunkStateWitness`, validates the state witness and determines if the chunk header is indeed correctly produced. If so, it sends a `ChunkEndorsement` to the current block producer.
-  * A `ChunkEndorsement` contains the chunk hash along with a signature proving the endorsement by the chunk validator. It implicitly carries a weight equal to the amount of the chunk validator's stake that is assigned to this shard for this block. (See Chunk Validator Shuffling).
 * As the existing logic is today, the block producer for this block waits until either all chunks are ready, or a timeout occurs, and then proposes a block containing whatever chunks are ready. Now, the notion of readiness here is expanded to also having more than 2/3 of chunk endorsements by weight.
   * This means that if a chunk does not receive enough chunk endorsements by the timeout, it will not be included in the block. In other words, the block only contains chunks for which there is already a consensus of validity. **This is the key reason why we will no longer need fraud proofs / tracking all shards**.
-  * The 2/3 fraction has the denominator being the total stake assigned to validate this shard, *not* the total stake of all validators. See Chunk Validator Shuffling.
+  * The denominator of the 2/3 fraction is the total stake assigned to validate this shard, *not* the total stake of all validators (see the sketch after this list).
 * The block producer, when producing the block, additionally includes the chunk endorsements (at least 2/3 needed for each chunk) in the block's body. The validity of the block is expanded to also having valid 2/3 chunk endorsements for each chunk included in the block.
-  * This necessitates a new block format.
 * If a block fails validation because of not having the required chunk endorsements, it is considered a block validation failure for the purpose of Doomslug consensus, just like any other block validation failure. In other words, nodes will not apply the block on top of their blockchain, and (block) validators will not endorse the block.
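+
+To make the readiness rule concrete, here is a minimal sketch of the 2/3-by-weight check. The types, names, and `main` example are illustrative only (the endorsement signature and its verification are omitted); this is not the actual nearcore code.
+
+```rust
+use std::collections::{HashMap, HashSet};
+
+type AccountId = String;
+type Balance = u128;
+
+/// One endorsement; in the real protocol it also carries a signature over the chunk hash,
+/// whose verification is omitted in this sketch.
+struct ChunkEndorsement {
+    validator: AccountId,
+}
+
+/// `assigned_stake` maps each chunk validator assigned to this shard at this height to the
+/// stake it carries there. Endorsements from unassigned validators are ignored, and
+/// duplicates are counted only once.
+fn has_enough_endorsement_stake(
+    endorsements: &[ChunkEndorsement],
+    assigned_stake: &HashMap<AccountId, Balance>,
+) -> bool {
+    let endorsers: HashSet<&AccountId> = endorsements.iter().map(|e| &e.validator).collect();
+    let total: Balance = assigned_stake.values().sum();
+    let endorsed: Balance = endorsers.iter().filter_map(|v| assigned_stake.get(*v)).sum();
+    // Strictly more than 2/3 of the stake assigned to this shard, without floating point.
+    endorsed * 3 > total * 2
+}
+
+fn main() {
+    let assigned: HashMap<AccountId, Balance> = HashMap::from([
+        ("a".to_string(), 50),
+        ("b".to_string(), 30),
+        ("c".to_string(), 20),
+    ]);
+    let endorsements = vec![
+        ChunkEndorsement { validator: "a".to_string() },
+        ChunkEndorsement { validator: "b".to_string() },
+    ];
+    // 80 of 100 assigned stake endorsed, and 80 * 3 > 100 * 2, so the chunk is ready.
+    assert!(has_enough_endorsement_stake(&endorsements, &assigned));
+}
+```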

-Let's formalise a proposed change to the validator roles and responsibilities, with same and new behavior clearly labelled:
+So the high-level specification can be described as the following list of changes to the validator roles and responsibilities:

 * Block producers:
   * (Same as today) Produce blocks, (new) including waiting for chunk endorsements
@@ -117,6 +100,14 @@ Let's formalise a proposed change to the validator roles and responsibilities, w

 See the Validator Structure Change section below for more details.

+### Out of scope
+
+* Resharding support.
+* Data size optimizations such as compression, for both chunk data and state witnesses, except basic optimizations that are practically necessary.
+* Separation of consensus and execution, where consensus runs independently from execution, and validators asynchronously perform state transitions after the transactions are proposed on the consensus layer, for the purpose of amortizing the computation and network transfer time.
+* ZK integration.
+* Underlying data structure change (e.g. verkle tree).
+
 ## Validator Structure Change

 ### Roles
@@ -372,7 +363,7 @@ Based on target number of mandates and total chunk validators stake, [here](http

 All the mandates are stored in new version of `EpochInfo` `EpochInfoV4` in [validator_mandates](https://github.com/near/nearcore/blob/164b7a367623eb651914eeaf1cbf3579c107c22d/core/primitives/src/epoch_manager.rs#L775) field.
 After that, for each height in the epoch, [EpochInfo::sample_chunk_validators](https://github.com/near/nearcore/blob/164b7a367623eb651914eeaf1cbf3579c107c22d/core/primitives/src/epoch_manager.rs#L1224) is called to return `ChunkValidatorStakeAssignment`. It is `Vec>` where s-th element corresponds to s-th shard in the epoch, contains ids of all chunk validator for that height and shard, alongside with its total mandate stake assigned to that shard.
-`sample_chunk_validators` basically just shuffles `validator_mandates` among shards using height-specific seed.
+`sample_chunk_validators` basically just shuffles `validator_mandates` among shards using a height-specific seed, as sketched below. If no more than 1/3 of validators are malicious, then by a Chernoff bound the probability that at least one shard is corrupted is sufficiently small. **This is the reason why we can split validators among shards and still rely on the basic consensus assumption**.
 This way, everyone tracking block headers can compute chunk validator assignment for each height and shard.
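+
+Below is a self-contained sketch of that idea: deterministically shuffle the epoch's mandates with a height-specific seed, deal them out to shards, and aggregate each validator's mandate stake per shard. The PRNG, the seed derivation, the round-robin dealing, and all names are simplifications for illustration (partial mandates are ignored); this is not the actual `sample_chunk_validators` implementation.
+
+```rust
+use std::collections::HashMap;
+
+type ValidatorId = u64;
+type Balance = u128;
+
+/// Tiny deterministic PRNG (splitmix64) so that the example has no external dependencies.
+struct Rng(u64);
+impl Rng {
+    fn next_u64(&mut self) -> u64 {
+        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
+        let mut z = self.0;
+        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
+        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
+        z ^ (z >> 31)
+    }
+}
+
+/// Every full mandate is worth the same stake and names one validator; a validator with
+/// more stake simply holds more mandates.
+fn sample_chunk_validators(
+    mandates: &[ValidatorId],
+    stake_per_mandate: Balance,
+    num_shards: usize,
+    height_seed: u64,
+) -> Vec<Vec<(ValidatorId, Balance)>> {
+    // Fisher-Yates shuffle driven by the height-specific seed.
+    let mut shuffled = mandates.to_vec();
+    let mut rng = Rng(height_seed);
+    for i in (1..shuffled.len()).rev() {
+        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
+        shuffled.swap(i, j);
+    }
+    // Deal the shuffled mandates to shards round-robin and sum stake per (shard, validator).
+    let mut per_shard: Vec<HashMap<ValidatorId, Balance>> = vec![HashMap::new(); num_shards];
+    for (k, validator) in shuffled.iter().enumerate() {
+        *per_shard[k % num_shards].entry(*validator).or_insert(0) += stake_per_mandate;
+    }
+    per_shard.into_iter().map(|m| m.into_iter().collect()).collect()
+}
+
+fn main() {
+    // 12 mandates held by 6 validators, dealt out to 4 shards for one particular height.
+    let mandates: Vec<ValidatorId> = vec![0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 5];
+    let assignment = sample_chunk_validators(&mandates, 100, 4, /* height-specific seed */ 42);
+    for (shard_id, validators) in assignment.iter().enumerate() {
+        println!("shard {shard_id}: {validators:?}");
+    }
+}
+```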