tracking: SMT in trace decoder #275

0xaatif · 2024-06-12T10:58:32Z

Motivation

Support Hermez SMT format in protocol decoder. #93

Previous work

Background

Terminology

Type 1 🥇 is full compatibility ¹ (there should be some more words here, but I don't know what to write)
Type 2 🥈 is almost-compatibility ¹
A witness is a fact about the state of the Ethereum world that a prover wants to create a proof for
MPT (Merkle Patricia Trie) and SMT (Sparse Merkle Tree) are the two data structures used by Type 1 🥇 and Type 2 🥈 respectively.

Story

zero-bin makes RPC calls to an Ethereum node.
It requests special traces for a block,
and assembles them into some trace_decoder structures.
zero-bin asks trace_decoder to assemble some evm_arithmetization structures.
zero-bin makes several calls to proof_gen and evm_arithmetization to actually produce a result.

Current pieces of the puzzle

| Pipeline | Type 1 🥇 | Type 2 🥈 |
| --- | --- | --- |
| Ethereum node | `0xPolygonZero/erigon @ feat/zero` | `0xPolygonHermez/cdk-erigon @ zkevm` |
| `trace_decoder` | `0xPolygonZero/zk_evm/trace_decoder @ develop` | (unimplemented) |
| `zero-bin` | `0xPolygonZero/zero-bin @ develop` | ?? |
| `evm_arithmetization` | `0xPolygonZero/zk_evm/evm_arithmetization @ develop` | `0xPolygonZero/zk_evm/evm_arithmetization @ feat/type2`* |
| `proof_gen` | `0xPolygonZero/zk_evm/proof_gen @ develop` | `0xPolygonZero/zk_evm/proof_gen @ feat/type2`* |

* marks a non-default branch

How `trace_decoder` works

Work

This issue tracks filling in the trace_decoder for Type 2 🥈.

Plan

1. refactor: frontend of trace_decoder #309, which also adds a frontend for the Type 2 🥈 wire format.
2. Refactor the Type 1 🥇 backend and add Type 2 🥈 support.

¹ https://vitalik.eth.limo/general/2022/08/04/zkevm.html


0xaatif · 2024-07-05T13:45:42Z

The backend does a bunch of tree reshaping, editing account balances etc, before spitting out some evm_arithmetization::GenerationInputs

There are two key challenges to face as we add Type 2 🥈 support.

The first is that GenerationInputs::tries uses the MPT (Type 1 🥇) format:

zk_evm/evm_arithmetization/src/generation/mod.rs, lines 82 to 102 in 5315442:

    pub struct TrieInputs {
        /// A partial version of the state trie prior to these transactions. It
        /// should include all nodes that will be accessed by these
        /// transactions.
        pub state_trie: HashedPartialTrie,

        /// A partial version of the transaction trie prior to these transactions.
        /// It should include all nodes that will be accessed by these
        /// transactions.
        pub transactions_trie: HashedPartialTrie,

        /// A partial version of the receipt trie prior to these transactions. It
        /// should include all nodes that will be accessed by these
        /// transactions.
        pub receipts_trie: HashedPartialTrie,

        /// A partial version of each storage trie prior to these transactions. It
        /// should include all storage tries, and nodes therein, that will be
        /// accessed by these transactions.
        pub storage_tries: Vec<(H256, HashedPartialTrie)>,
    }

I'm considering that out of scope for this document: it will require evm_arithmetization changes, and perhaps the solution in trace_decoder ends up in evm_arithmetization.

The second is the data structure used for the tree reshaping. Let's discuss that more below.

The key data structure is this:

zk_evm/trace_decoder/src/decoding.rs, lines 203 to 208 in 5315442:

    struct PartialTrieState {
        state: HashedPartialTrie,
        storage: HashMap<H256, HashedPartialTrie>,
        txn: HashedPartialTrie,
        receipt: HashedPartialTrie,
    }

All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work.
Here's a sketch of what the Type 2 🥈-compatible backend could look like.

Here are the APIs for the two data structures, as I understand them:

zk_evm/trace_decoder/src/hermez_cdk_erigon.rs, lines 45 to 88 in 5315442:

    use mpt_trie::{
        nibbles::Nibbles,
        partial_trie::{HashedPartialTrie, PartialTrie as _},
    };

    fn mpt_api(
        mut it: HashedPartialTrie,
        // this is a bitvec of length <= 260 (based off the comment on NibblesIntern)
        key: Nibbles,
        val: &[u8],
        hash: ethereum_types::U256,
    ) {
        let () = it.insert(key, hash).unwrap(); // set hash
        let () = it.insert(key, val).unwrap(); // set val
        let _: Option<&[u8]> = it.get(key);
        let _: ethereum_types::H256 = it.hash();
    }

    use plonky2::field::goldilocks_field::GoldilocksField;
    use smt_trie::smt::{HashOut, Key};

    type SmtTrie = smt_trie::smt::Smt<smt_trie::db::MemoryDb>;

    fn smt_api(
        mut it: SmtTrie,
        // this is basically U256
        set_key @ Key(
            [GoldilocksField(k1), GoldilocksField(k2), GoldilocksField(k3), GoldilocksField(k4)],
        ): smt_trie::smt::Key,
        set_val: ethereum_types::U256,
        // this is a bitvec of length <= 256
        set_hash_key: smt_trie::bits::Bits,
        // this is basically U256
        set_hash_val @ HashOut {
            elements:
                [GoldilocksField(h1), GoldilocksField(h2), GoldilocksField(h3), GoldilocksField(h4)],
        }: smt_trie::smt::HashOut,
    ) {
        let () = it.set_hash(set_hash_key, set_hash_val); // set hash
        let () = it.set(set_key, set_val); // set val
        let _: ethereum_types::U256 = it.get(set_key); // 0 on empty
        let _: smt_trie::smt::HashOut = it.root;
    }

If you squint, they're very similar.

Important

The rest of this document hinges on them being compatible. I need to know now if they're incompatible in some way I'm missing.

I think there are two viable approaches for refactoring:

1. Wrap them both, so that the backend doesn't know which it's interacting with:
```rust
enum Wrap {
    Mpt(HashedPartialTrie),
    Smt(Smt),
}
```
2. Find some semantic representation, flushing out a hash using the required format when needed.
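A minimal sketch of what option 1 could look like in practice, assuming toy stand-ins for the two tries (`ToyMpt`, `ToySmt`, `set_balance` and `balance` are hypothetical names, not part of `mpt_trie` or `smt_trie`). It also shows one place where the wrapper has to paper over a difference noted above: a missing key is `None` in the MPT but reads as `0` in the SMT.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for `HashedPartialTrie`: values are raw bytes.
#[derive(Default)]
struct ToyMpt {
    leaves: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Toy stand-in for `Smt`: values are integers, and absent keys read as zero.
#[derive(Default)]
struct ToySmt {
    leaves: BTreeMap<[u8; 32], u64>,
}

/// Option 1: the backend only ever talks to this enum.
enum Wrap {
    Mpt(ToyMpt),
    Smt(ToySmt),
}

impl Wrap {
    /// Store a balance under a 32-byte key, encoding it as each trie expects.
    fn set_balance(&mut self, key: [u8; 32], balance: u64) {
        match self {
            Wrap::Mpt(t) => {
                t.leaves.insert(key.to_vec(), balance.to_be_bytes().to_vec());
            }
            Wrap::Smt(t) => {
                t.leaves.insert(key, balance);
            }
        }
    }

    /// Read a balance back. Note the differing "empty" semantics per variant.
    fn balance(&self, key: [u8; 32]) -> Option<u64> {
        match self {
            Wrap::Mpt(t) => t
                .leaves
                .get(key.as_slice())
                // decode big-endian bytes back into an integer
                .map(|bytes| bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))),
            Wrap::Smt(t) => Some(t.leaves.get(&key).copied().unwrap_or(0)),
        }
    }
}
```

Every caller of `balance` now has to know which variant it holds in order to interpret `None` versus `Some(0)`, which is the kind of hidden invariant this approach tends to accumulate.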

I think 2 is going to be more maintainable in the long term.
The current code already has a bunch of hidden invariants that make it difficult to read: sometimes it unwraps, sometimes it's fallible.
Do the Nibbles I'm looking at represent an Address, or a Hash(Address)?
The waters will likely only get muddier with 1.

With 2, we can make illegal states unrepresentable, and I hope to make the backend easier to follow.
Let's explore how the members of PartialTrieState are used in the current backend.
I've prioritised exploring the writes.

`state`

Writes:

We write some RLP-encoded AccountRlp a couple of times, which is a great fit.

zk_evm/trace_decoder/src/decoding.rs, line 424 in 5315442:

    .insert(val_k, updated_account_bytes.to_vec())

zk_evm/trace_decoder/src/decoding.rs, line 597 in 5315442:

    .insert(h_addr_nibs, rlp::encode(&acc_data).to_vec())

This function is a bit more challenging, and is called a couple of times. It implies that plonky2 has deeper knowledge of the tries. Presumably, in the SemanticTrie -> MPT conversion, we can ensure we uphold the required invariants.

zk_evm/trace_decoder/src/decoding.rs, lines 460 to 474 in 5315442:

    /// If a branch collapse occurred after a delete, then we must ensure that
    /// the other single child that remains also is not hashed when passed into
    /// plonky2. Returns the key to the remaining child if a collapse occurred.
    fn delete_node_and_report_remaining_key_if_branch_collapsed(
        trie: &mut HashedPartialTrie,
        delete_k: &Nibbles,
    ) -> TrieOpResult<Option<Nibbles>> {
        let old_trace = get_trie_trace(trie, delete_k);
        trie.delete(*delete_k)?;
        let new_trace = get_trie_trace(trie, delete_k);
        Ok(node_deletion_resulted_in_a_branch_collapse(
            &old_trace, &new_trace,
        ))
    }

Since the MPT "values" are only ever AccountRlp, I think a semantically equivalent version of this member looks like this:

zk_evm/trace_decoder/src/hermez_cdk_erigon.rs, lines 21 to 42 in 5315442:

    pub struct SemanticTrie<S> {
        pub hash2hash: HashMap<U256, U256, S>,
        pub address2account_info: HashMap<Address, AccountInfo, S>,
        // or should this be hash(address)? ~~^
    }

    impl<S> SemanticTrie<S> {
        pub fn root_as_mpt(&self) -> U256 {
            todo!()
        }
        pub fn root_as_smt(&self) -> U256 {
            todo!()
        }
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
    pub struct AccountInfo {
        pub balance: Option<ethereum_types::U256>,
        pub nonce: Option<ethereum_types::U256>,
        pub code_hash: Option<ethereum_types::H256>,
        pub storage_root: Option<ethereum_types::H256>,
    }
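To make "flushing out a hash when needed" concrete, here is a minimal, std-only sketch under loud assumptions: `SemanticState`, `Scheme` and `root` are hypothetical names, balances and nonces are shrunk to `u64`, and `DefaultHasher` is only a placeholder for whatever hashing each real trie format uses. It is not how `root_as_mpt` or `root_as_smt` would actually be computed. The point is only that edits touch a plain map, and a format-specific root is derived on demand.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for an Ethereum address (assumption: raw 20 bytes).
type Address = [u8; 20];

/// Shrunk version of `AccountInfo`, with `u64` instead of `U256`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
struct AccountInfo {
    balance: Option<u64>,
    nonce: Option<u64>,
}

/// Which commitment format the caller wants (hypothetical).
enum Scheme {
    Mpt,
    Smt,
}

/// Semantic state: no trie structure at all until a root is requested.
#[derive(Default)]
struct SemanticState {
    address2account_info: BTreeMap<Address, AccountInfo>,
}

impl SemanticState {
    /// Edits are plain map updates; nothing is hashed here.
    fn set_balance(&mut self, addr: Address, balance: u64) {
        self.address2account_info.entry(addr).or_default().balance = Some(balance);
    }

    /// Flush a root in the requested format.
    /// PLACEHOLDER: a real implementation would build an MPT or SMT here.
    fn root(&self, scheme: Scheme) -> u64 {
        let mut hasher = DefaultHasher::new();
        // Domain-separate the two formats so their roots differ.
        let tag: u8 = match scheme {
            Scheme::Mpt => 1,
            Scheme::Smt => 2,
        };
        tag.hash(&mut hasher);
        // BTreeMap iterates in key order, so the result does not depend
        // on the order in which accounts were inserted.
        for (addr, info) in &self.address2account_info {
            addr.hash(&mut hasher);
            info.hash(&mut hasher);
        }
        hasher.finish()
    }
}
```

With this shape, the delta-application code never sees a trie type, and the question of `Address` versus `hash(address)` is pushed into the one function that produces the root.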

`storage`

I think this is the most challenging member.
I think it is ultimately a mapping from address (or hash(address)?) to AccountRlp:

zk_evm/trace_decoder/src/lib.rs, lines 382 to 391 in 5315442:

    let accounts = state
        .items()
        .filter_map(|(address, leaf)| {
            Some(
                rlp::decode::<AccountRlp>(leaf.as_val()?)
                    .context("expected `state` trie value leaves to consist only of AccountRlp")
                    .map(|account| (H256::from(address), account)),
            )
        })
        .collect::<Result<_, _>>()?;

The write logic, however, is a bit convoluted.

`txn`

Is never written

`receipt`

Is only written in one location:

zk_evm/trace_decoder/src/decoding.rs, lines 296 to 298 in 5315442:

    trie_state
        .receipt
        .insert(txn_k, meta.receipt_node_bytes.as_ref())

Nashtare · 2024-07-09T15:17:10Z

Apologies for not looking at this sooner. A few comments that I hope will clarify some items

> There are two key challenges to face as we add Type 2 🥈 support
>
> GenerationInputs::tries uses the (MPT/Type 1 🥇) format:
> I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization

I'm not sure I understand how this can be considered out of scope, as the trace_decoder must pass "understandable" inputs to the proving backend (in evm_arithmetization), which in the case of the SMT / Type 2 🥈 format would require a change in this GenerationInputs::tries type. Note that the feat/type2 branch currently supports the SMT / Type 2 🥈 specific format on the proving backend side. The goal of #20 is to remove the need for a distinct feat/type2 branch, and to allow upper layers (trace_decoder / zero-bin) to rely on a conditional feature flag to target the Type 1 or Type 2 backend prover. This conditional feature flagging was discussed internally and considered best for future-proofing and for development / maintenance of the proving backend with the two distinct statements.

> The data structure used for the tree reshaping. Let's discuss that more below.
>
> The key data structure is this:
>
> zk_evm/trace_decoder/src/decoding.rs, lines 203 to 208 in 5315442:
>
>     struct PartialTrieState {
>         state: HashedPartialTrie,
>         storage: HashMap<H256, HashedPartialTrie>,
>         txn: HashedPartialTrie,
>         receipt: HashedPartialTrie,
>     }
>
> All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work.

Minor clarification, but the last statement is not accurate. Transaction and Receipt tries do not change, and are still HashedPartialTrie. The state would be replaced by a state_smt field containing serialized SMT (i.e. Vec<U256>), see this code. When forming the final TrieInputs from this PartialTrieState, note that the storage field is removed.
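As a rough illustration of the shape described above, here is a std-only sketch. Only the `state_smt: Vec<U256>` field comes from the comment; `Type2TrieInputs`, `type2_inputs`, the stand-in `U256` and `HashedPartialTrie` types, and the assumption that the transaction and receipt fields keep their Type 1 names are all hypothetical, so the actual struct on feat/type2 may differ.

```rust
/// Stand-in for `ethereum_types::U256` (four 64-bit limbs). Assumption for this sketch.
type U256 = [u64; 4];

/// Stand-in for `mpt_trie::partial_trie::HashedPartialTrie`.
#[derive(Debug, Default, Clone, PartialEq)]
struct HashedPartialTrie {
    leaves: Vec<(Vec<u8>, Vec<u8>)>,
}

/// Hypothetical Type 2 counterpart of `TrieInputs`.
#[derive(Debug, Default, Clone, PartialEq)]
struct Type2TrieInputs {
    /// Serialized SMT, replacing `state_trie`.
    state_smt: Vec<U256>,
    /// Unchanged from Type 1: still an MPT.
    transactions_trie: HashedPartialTrie,
    /// Unchanged from Type 1: still an MPT.
    receipts_trie: HashedPartialTrie,
    // No `storage_tries` field: it is removed when forming the Type 2 inputs.
}

/// Assemble the Type 2 inputs from their three parts.
fn type2_inputs(
    state_smt: Vec<U256>,
    transactions_trie: HashedPartialTrie,
    receipts_trie: HashedPartialTrie,
) -> Type2TrieInputs {
    Type2TrieInputs {
        state_smt,
        transactions_trie,
        receipts_trie,
    }
}
```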

That being said, I agree with you that "2. Find some semantic representation, flushing out a hash using the required format when needed." seems the way to go.

To clarify on your comment "it implies that plonky2 has deeper knowledge of the tries." on delete_node_and_report_remaining_key_if_branch_collapsed: we need the prover to have access to all accounts / storage slots it may try accessing. Some edge cases, like an SSTORE inducing an out-of-gas (OOG) error, would yield unprovable txns if we were to parse the clients' payloads "trivially", as these usually trim off this information (operation not fully executed -> no need for account data in the witness -> missing account in the trie on the prover side). The same thing happens when deleting a branch child results in a collapse, which impacts native tracers; see #237 for more info.

> txn is never written

Yes it is, just above the trie_state.receipt update, from the provided TxnMetaState in update_txn_and_receipt_tries().

BGluth · 2024-07-29T18:36:21Z

> Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc, and the waters will likely only get muddier with 1.

Yeah, I found having redundant type aliases for U256/H256 just to specify if a type is an unhashed/hashed version of something (eg. Address/HashedAddress) helped a good amount with this.
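One possible step beyond aliases is a pair of newtypes, sketched below. An alias documents intent but still lets either value be passed where the other is expected; a newtype turns that mix-up into a compile error. All names here are hypothetical, and `hashed` uses a placeholder byte shuffle rather than the real address hash.

```rust
/// An unhashed 20-byte address.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Address([u8; 20]);

/// The hashed form of an address. Only obtainable via `Address::hashed`
/// (or by constructing it explicitly).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct HashedAddress([u8; 32]);

impl Address {
    /// PLACEHOLDER hash: a real implementation would use the chain's
    /// actual address hash here. This just scrambles bytes deterministically.
    fn hashed(self) -> HashedAddress {
        let mut out = [0u8; 32];
        for (i, byte) in self.0.iter().enumerate() {
            out[i] = byte.wrapping_mul(31).wrapping_add(i as u8);
        }
        HashedAddress(out)
    }
}

/// Accepts only a hashed address; passing an `Address` does not compile.
fn state_trie_key(addr: HashedAddress) -> [u8; 32] {
    addr.0
}
```

Calling `state_trie_key(Address([0; 20]))` is rejected by the compiler, whereas with `type HashedAddress = H256` and `type Address = H256` the equivalent call would be accepted silently.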

> Find some semantic representation, flushing out a hash using the required format when needed.

Hey could you elaborate a bit more on what this might look like? Are you thinking about using a trait to abstract the common operations away (eg. insert, hash, etc.) between the two trie types? This is what my original attempt at adding smt support was attempting to do, as the delta application logic is nearly identical between mpt & smt tries (with the only difference being the trie operations changed and a few steps for setting up mpt tries became irrelevant).
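For comparison, a minimal sketch of the trait-based alternative raised here, where the delta application is written once and is generic over the trie. The trait name echoes the later "refactor: trait StateTrie" PR title, but the methods, the toy implementations and `apply_credits` are assumptions for illustration and not the API that was merged.

```rust
use std::collections::BTreeMap;

/// Hypothetical common interface over the two trie kinds.
trait StateTrie {
    fn insert(&mut self, key: [u8; 32], value: u64);
    fn get(&self, key: [u8; 32]) -> Option<u64>;
}

/// Toy stand-ins; the real types would wrap `HashedPartialTrie` and `Smt`.
#[derive(Default)]
struct ToyMpt(BTreeMap<[u8; 32], u64>);

#[derive(Default)]
struct ToySmt(BTreeMap<[u8; 32], u64>);

impl StateTrie for ToyMpt {
    fn insert(&mut self, key: [u8; 32], value: u64) {
        self.0.insert(key, value);
    }
    fn get(&self, key: [u8; 32]) -> Option<u64> {
        self.0.get(&key).copied()
    }
}

impl StateTrie for ToySmt {
    fn insert(&mut self, key: [u8; 32], value: u64) {
        self.0.insert(key, value);
    }
    fn get(&self, key: [u8; 32]) -> Option<u64> {
        self.0.get(&key).copied()
    }
}

/// Delta application written once: add each credit to the stored balance,
/// treating a missing account as a zero balance.
fn apply_credits<T: StateTrie>(trie: &mut T, credits: &[([u8; 32], u64)]) {
    for &(key, credit) in credits {
        let old = trie.get(key).unwrap_or(0);
        trie.insert(key, old.saturating_add(credit));
    }
}
```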

0xaatif · 2024-10-22T14:32:03Z

MVP for this was shipped in #732

github-project-automation bot added this to Zero EVM Jun 12, 2024

github-project-automation bot moved this to Backlog in Zero EVM Jun 12, 2024

0xaatif changed the title ~~wip: tracking: SMT in trace decoder~~ tracking: SMT in trace decoder Jun 12, 2024

0xaatif self-assigned this Jun 12, 2024

Nashtare added the crate: trace_decoder Anything related to the trace_decoder crate. label Jun 14, 2024

Nashtare added this to the Type 2 - Q2 2024 milestone Jun 14, 2024

This was referenced Jun 19, 2024

fix: only executables should choose a global allocator #301

Merged

refactor: frontend of trace_decoder #309

Merged

Nashtare modified the milestone: Type 2 - Q2 2024 Jun 27, 2024

BGluth mentioned this issue Jul 12, 2024

Add support for SMT & continuations for trace_decoder #198

Closed

4 tasks

This was referenced Jul 12, 2024

Refactor SMT, MPT decoder #166

Closed

refactor: use typed tries in trace_decoder #393

Merged

refactor: simplify mpt_trie's API #400

Closed

This was referenced Aug 5, 2024

fix clippy issues in develop #460

Closed

refactor: trace_decoder::decoding #469

Merged

0xaatif mentioned this issue Aug 27, 2024

refactor: trait StateTrie #542

Merged

0xaatif closed this as completed Oct 22, 2024

github-project-automation bot moved this from Backlog to Done in Zero EVM Oct 22, 2024

0xaatif mentioned this issue Oct 30, 2024

[wip] [tracking] Using reth and revm in our stack #761

Open

5 tasks
