Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracking: SMT in trace decoder #275

Closed
1 task
0xaatif opened this issue Jun 12, 2024 · 4 comments
Closed
1 task

tracking: SMT in trace decoder #275

0xaatif opened this issue Jun 12, 2024 · 4 comments
Assignees
Labels
crate: trace_decoder Anything related to the trace_decoder crate.

Comments

@0xaatif
Copy link
Contributor

0xaatif commented Jun 12, 2024

Motivation

Previous work

Background

Terminology

  • Type 1 🥇 is full compatibility 1 (there should be some more words here, but I don't know what to write)
  • Type 2 🥈 is almost-compatibility 1
  • A witness is a fact about the state of the ethereum world that a prover wants to create a proof for
  • MPT and SMT are two datastructures used by Type 1 🥇 and Type 2 🥈 respectively.

Story

  1. zero-bin makes RPC calls to an Ethereum node.
    It requests special traces for a block,
    and assembles them into some trace_decoder structures.
  2. zero-bin asks trace_decoder to assemble some evm_arithmetization structures.
  3. zero-bin makes
    several
    calls
    to proof_gen and evm_arithmetization to actually produce a result

Current pieces of the puzzle

Pipeline Type 1 🥇 Type 2 🥈
Ethereum node 0xPolygonZero/erigon @ feat/zero 0xPolygonHermez/cdk-erigon @ zkevm
trace_decoder 0xPolygonZero/zk_evm/trace_decoder @ develop (unimplemented)
zero-bin 0xPolygonZero/zero-bin @ develop ??
evm_arithmetization 0xPolygonZero/zk_evm/evm_arithmetization @ develop 0xPolygonZero/zk_evm/evm_arithmetization @ feat/type2*
proof_gen 0xPolygonZero/zk_evm/proof_gen @ develop 0xPolygonZero/zk_evm/proof_gen @ feat/type2*

* marks a non-default branch

How trace_decoder works

  1. Parse a binary format of instructions
  2. Execute those instructions into a trie structure
  3. Reshape that trie into an mpt_trie
  4. "decode" that trie into the format expected by evm_arithmetization

Work

This issue tracks filling in the trace_decoder for Type 2 🥈.

Plan

Footnotes

  1. https://vitalik.eth.limo/general/2022/08/04/zkevm.html 2

@github-project-automation github-project-automation bot moved this to Backlog in Zero EVM Jun 12, 2024
@0xaatif 0xaatif changed the title wip: tracking: SMT in trace decoder tracking: SMT in trace decoder Jun 12, 2024
@0xaatif 0xaatif self-assigned this Jun 12, 2024
@Nashtare Nashtare added the crate: trace_decoder Anything related to the trace_decoder crate. label Jun 14, 2024
@Nashtare Nashtare added this to the Type 2 - Q2 2024 milestone Jun 14, 2024
@Nashtare Nashtare modified the milestone: Type 2 - Q2 2024 Jun 27, 2024
@0xaatif
Copy link
Contributor Author

0xaatif commented Jul 5, 2024

The backend does a bunch of tree reshaping, editing account balances etc, before spitting out some evm_arithmetization::GenerationInputs

There are two key challenges to face as we add Type 2 🥈 support

  1. GenerationInputs::tries uses the (MPT/Type 1 🥇) format:
    pub struct TrieInputs {
    /// A partial version of the state trie prior to these transactions. It
    /// should include all nodes that will be accessed by these
    /// transactions.
    pub state_trie: HashedPartialTrie,
    /// A partial version of the transaction trie prior to these transactions.
    /// It should include all nodes that will be accessed by these
    /// transactions.
    pub transactions_trie: HashedPartialTrie,
    /// A partial version of the receipt trie prior to these transactions. It
    /// should include all nodes that will be accessed by these
    /// transactions.
    pub receipts_trie: HashedPartialTrie,
    /// A partial version of each storage trie prior to these transactions. It
    /// should include all storage tries, and nodes therein, that will be
    /// accessed by these transactions.
    pub storage_tries: Vec<(H256, HashedPartialTrie)>,
    }

    I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization
  2. The data structure used for the tree reshaping. Let's discuss that more below.

The key data structure is this:

struct PartialTrieState {
state: HashedPartialTrie,
storage: HashMap<H256, HashedPartialTrie>,
txn: HashedPartialTrie,
receipt: HashedPartialTrie,
}

All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work.
Here's a sketch of what the Type 2 🥈-compatible backend could look like.

Here are the API's for the two data structures AIUI:

use mpt_trie::{
nibbles::Nibbles,
partial_trie::{HashedPartialTrie, PartialTrie as _},
};
fn mpt_api(
mut it: HashedPartialTrie,
// this is a bitvec of length <= 260 (based off the comment on NibblesIntern)
key: Nibbles,
val: &[u8],
hash: ethereum_types::U256,
) {
let () = it.insert(key, hash).unwrap(); // set hash
let () = it.insert(key, val).unwrap(); // set val
let _: Option<&[u8]> = it.get(key);
let _: ethereum_types::H256 = it.hash();
}
use plonky2::field::goldilocks_field::GoldilocksField;
use smt_trie::smt::{HashOut, Key};
type SmtTrie = smt_trie::smt::Smt<smt_trie::db::MemoryDb>;
fn smt_api(
mut it: SmtTrie,
// this is basically U256
set_key @ Key(
[GoldilocksField(k1), GoldilocksField(k2), GoldilocksField(k3), GoldilocksField(k4)],
): smt_trie::smt::Key,
set_val: ethereum_types::U256,
// this is a bitvec of length <= 256
set_hash_key: smt_trie::bits::Bits,
// this is basically U256
set_hash_val @ HashOut {
elements:
[GoldilocksField(h1), GoldilocksField(h2), GoldilocksField(h3), GoldilocksField(h4)],
}: smt_trie::smt::HashOut,
) {
let () = it.set_hash(set_hash_key, set_hash_val); // set hash
let () = it.set(set_key, set_val); // set val
let _: ethereum_types::U256 = it.get(set_key); // 0 on empty
let _: smt_trie::smt::HashOut = it.root;
}

If you squint, they're very similar.

Important

the rest of this document hinges on them being compatible - I need to know now if they're not in some way I'm missing

I think there are two viable approaches for refactoring:

  1. Wrap them both, so that the backend doesn't know which its interacting with
    enum Wrap {
        Mpt(HashedPartialTrie)
        Smt(Smt)
    }
  2. Find some semantic representation, flushing out a hash using the required format when needed.

I think 2 is going to be more maintainable in the long term.
The current code already has a bunch of hidden invariants that make it difficult to read. Sometimes it unwraps, sometimes it's fallible.
Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc,
and the waters will likely only get muddier with 1.

With 2, we can make illegal states unrepresentable, and I hope the make the backend easier to follow.
Let's explore how the members of PartialTrieState are used in the current backend.
I've prioritised exploring the writes.

state

Writes:

  • We write some AccountRlp RLP a couple of times - this is a great fit.
    .insert(val_k, updated_account_bytes.to_vec())
    .insert(h_addr_nibs, rlp::encode(&acc_data).to_vec())
  • This function is a bit more challenging, and is called a couple of times - it implies that plonky2 has deeper knowledge of the tries. Presumably in the SemanticTrie -> MPT conversion we can ensure we uphold the invariants required.
    /// If a branch collapse occurred after a delete, then we must ensure that
    /// the other single child that remains also is not hashed when passed into
    /// plonky2. Returns the key to the remaining child if a collapse occurred.
    fn delete_node_and_report_remaining_key_if_branch_collapsed(
    trie: &mut HashedPartialTrie,
    delete_k: &Nibbles,
    ) -> TrieOpResult<Option<Nibbles>> {
    let old_trace = get_trie_trace(trie, delete_k);
    trie.delete(*delete_k)?;
    let new_trace = get_trie_trace(trie, delete_k);
    Ok(node_deletion_resulted_in_a_branch_collapse(
    &old_trace, &new_trace,
    ))
    }

Since the mpt "values" are only ever AccountRlp, I think a semantically equivalent version of this member looks like this:

pub struct SemanticTrie<S> {
pub hash2hash: HashMap<U256, U256, S>,
pub address2account_info: HashMap<Address, AccountInfo, S>,
// or should this be hash(address)? ~~^
}
impl<S> SemanticTrie<S> {
pub fn root_as_mpt(&self) -> U256 {
todo!()
}
pub fn root_as_smt(&self) -> U256 {
todo!()
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct AccountInfo {
pub balance: Option<ethereum_types::U256>,
pub nonce: Option<ethereum_types::U256>,
pub code_hash: Option<ethereum_types::H256>,
pub storage_root: Option<ethereum_types::H256>,
}

storage

I think this is the most challenging
I think it ultimately is a mapping from address (or hash(address)?) to AccountRlp:

let accounts = state
.items()
.filter_map(|(address, leaf)| {
Some(
rlp::decode::<AccountRlp>(leaf.as_val()?)
.context("expected `state` trie value leaves to consist only of AccountRlp")
.map(|account| (H256::from(address), account)),
)
})
.collect::<Result<_, _>>()?;
, but the write logic is a bit convoluted.

txn

Is never written

receipt

Is only written in one location:

trie_state
.receipt
.insert(txn_k, meta.receipt_node_bytes.as_ref())

@Nashtare
Copy link
Collaborator

Nashtare commented Jul 9, 2024

Apologies for not looking at this sooner. A few comments that I hope will clarify some items

There are two key challenges to face as we add Type 2 🥈 support

GenerationInputs::tries uses the (MPT/Type 1 🥇) format:
I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization

I'm not sure I understand how this can be considered out of scope, as the trace_decoder must pass "understandable" inputs to the proving backend (in evm_arithmetization) which in the case of the SMT/ Type 2 🥈 format would require a change in this GenerationInputs::tries type. Note that the feat/type2 branch currently supports SMT/ Type 2 🥈 specific format on the proving backend side. The goal of #20 is to remove the need of a distinct feat/type2 branch, and allow upper layers (trace_decoder / zero-bin) to rely on conditional feature flag to target Type 1 / Type 2 backend prover. This conditional feature flagging was discussed internally and considered best for future-proofing and development / maintenance of proving backend with the two distinct statements.

  1. The data structure used for the tree reshaping. Let's discuss that more below.

The key data structure is this:

struct PartialTrieState {
state: HashedPartialTrie,
storage: HashMap<H256, HashedPartialTrie>,
txn: HashedPartialTrie,
receipt: HashedPartialTrie,
}

All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work.

Minor clarification, but the last statement is not accurate. Transaction and Receipt tries do not change, and are still HashedPartialTrie. The state would be replaced by a state_smt field containing serialized SMT (i.e. Vec<U256>), see this code. When forming the final TrieInputs from this PartialTrieState, note that the storage field is removed.

That being said, I agree with you that "2. Find some semantic representation, flushing out a hash using the required format when needed." seems the way to go.

To clarify on your comment "it implies that plonky2 has deeper knowledge of the tries." on delete_node_and_report_remaining_key_if_branch_collapsed: we need the prover to have access to all accounts / storage slots it may try accessing. Some edge cases, like an SSTORE inducing an OOG error would yield unprovable txns if we were to parse "trivially" the clients' payloads, as these usually trim off these info (operation not executed fully -> no need for account data in witness => missing account from trie on the prover side). Same thing happens when deleting a branch child, which results in a collapse, which impacts native tracers, see #237 for more info.

txn is never written

Yes it is, just above the trie_state.receipt update, from the provided TxnMetaState in update_txn_and_receipt_tries().

@BGluth
Copy link
Contributor

BGluth commented Jul 29, 2024

Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc, and the waters will likely only get muddier with 1.

Yeah, I found having redundant type aliases for U256/H256 just to specify if a type is an unhashed/hashed version of something (eg. Address/HashedAddress) helped a good amount with this.

Find some semantic representation, flushing out a hash using the required format when needed.

Hey could you elaborate a bit more on what this might look like? Are you thinking about using a trait to abstract the common operations away (eg. insert, hash, etc.) between the two trie types? This is what my original attempt at adding smt support was attempting to do, as the delta application logic is nearly identical between mpt & smt tries (with the only difference being the trie operations changed and a few steps for setting up mpt tries became irrelevant).

@0xaatif
Copy link
Contributor Author

0xaatif commented Oct 22, 2024

MVP for this was shipped in #732

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate: trace_decoder Anything related to the trace_decoder crate.
Projects
Status: Done
Development

No branches or pull requests

3 participants