Skip to content
This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

[MPT] Misc refactoring #972

Merged
merged 59 commits into from
Apr 5, 2023

Conversation

Brechtpd
Copy link
Collaborator

@Brechtpd Brechtpd commented Dec 7, 2022

Some refactoring which I believe decreases code duplication and increases code readability.

Some TODOs:

  • The code required to get the previous rlc/mult data or the inclusion in the parent check is quite complicated because of the different node types. Instead it may be better to use a fixed location to store this data in so a node can simply use this data directly instead of having to figure out on its own where to find the data. This is cleaner because this way each node can decide on its own how these should be handled.
  • Currently RLP decoding is done using a couple of selectors that are inputs from the prover. Then there are some checks if these are correct, though these are not complete. I think it's easier to think about this if we would just use a lookup to directly verify if these selectors are set correctly so we don't have to worry about edge cases are cases that are hard to constrain using custom gates.
  • There are currently many cases in the main state maching because each row type is it's own state. However there is not really any reuse between custom gates between these rows, except for branches. It'll likely be quite a bit simple to just have a single state for account, storage and extension and just use multiple rows in those states
  • May be a good idea to split up branches and extension nodes. (semi done)
  • There's a couple of circuit tools that were added in this PR to make writing the MPT circuit more manageable. A lot of these tools can still be greatly improved.
  • Number of lookups has been reduced a lot, but there are still many optimization possibilities (have not looked into reducing the expression degree for example).
  • The circuit uses a fixed layout which uses around 100 columns (of which a lot need to be byte constrained so needs a lot of lookups as well). This makes the circuit quite a bit more dense than probably required, a more flexible way to manager the required data so the width/height can be choses would be very useful I think.

@github-actions github-actions bot added the crate-zkevm-circuits Issues related to the zkevm-circuits workspace member label Dec 7, 2022
Copy link
Collaborator

@miha-stopar miha-stopar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! Thanks Brecht!

zkevm-circuits/src/mpt_circuit/selectors.rs Outdated Show resolved Hide resolved
zkevm-circuits/src/mpt_circuit/selectors.rs Outdated Show resolved Hide resolved
@Brechtpd Brechtpd changed the title [MPT] Selectors refactor [MPT] Misc refactoring Dec 13, 2022
@miha-stopar miha-stopar self-requested a review December 14, 2022 16:05
@miha-stopar
Copy link
Collaborator

Wonderful! Thanks Brecht!

Didn't see at first the PR is still a draft - I removed the approval for now, will wait till it's finished. Nonetheless, I checked the first two commits and it really looks great!

// Currently, the extension node S and extension node C both have the same key RLC -
// however, sometimes extension node can be replaced by a shorter extension node
// (in terms of nibbles), this is still to be implemented.
// TODO: extension nodes of different nibbles length
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODOs related to the extension nodes of different lengths can be removed - some additional rows were needed to cover these cases and the constraints are implemented in these rows, no changes of the constraints in the extension node rows will be needed (the additional rows implementation is in #914).

zkevm-circuits/src/mpt_circuit/branch/branch_key.rs Outdated Show resolved Hide resolved
// - hashed branch has RLP_HASH_VALUE at c_rlp2 and hash in c_advices,
// - non-hashed branch has 0 at c_rlp2 and all the bytes in c_advices
// TODO(Brecht): why different layout for hashed values? If for hash detection
// just do == 32?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it was easier to check whether the branch is hashed or not (just using c_rlp2 * c160_inv). But I can change the witness generator quickly if you think it would be better to have the same layout for both cases (but more complex expression).

zkevm-circuits/src/mpt_circuit/branch/extension_node.rs Outdated Show resolved Hide resolved
zkevm-circuits/src/mpt_circuit/branch/extension_node.rs Outdated Show resolved Hide resolved
@leolara leolara removed their request for review February 14, 2023 15:17
) -> Self {
// TODO(Brecht): strangely inconsistent between storage/account (see the need of
// for_placeholder_s). Something more similar to how the drifted key
// works (s and c cases separately makes more sense to me).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be the same for storage and account (both either using S or C). It works as it is, because in the non-existing proof both (S and C) proofs are the same. But we should decide on which one to use and then be consistent.

)
};
// TODO(Brecht): somehow the start index doesn't dependend on the list
// is_short/is_long?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should mention this more in the code - is_long is not fully implemented - it's very unlikely that the number of extension node nibbles would be big enough to make the extension node longer than 55 bytes (having 32 bytes for a branch hash, that would mean around 40 nibbles (20 bytes)), so I decided not to continue with is_long implementation, but didn't want to delete the parts that were already implemented - maybe it would be better to do so?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible we should implement it! It'd be great if there's a test case for this so it can be tested.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed. It should be easier now with all the tooling you provided :). I will prepare a test and then try to prepare a PR with long implementation.

}
(ext_node_rlc.expr(), config.ext_rlp_key.num_bytes(), config.ext_is_not_hashed.expr())
};
// TODO(Brecht): why not if it's a placeholder?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placeholder branch just fills the rows (mostly, except for the drifted_node RLC) to preserve the parallel layout, so the hash that is needed to be checked in the proof with the placeholder branch is the hash of a leaf to be in the branch above the placeholder branch.

@leolara
Copy link
Contributor

leolara commented Mar 1, 2023

I would like to propose this way of querying the columns, I think it is more readable

Brechtpd#8

root_prev: values[3],
root: values[4],
})
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I now understand the approach with store/load lookups. I very much like it! I would only perhaps change the names of the functions:

  • witness_store is intuitive - it stores the values in the array for later use
  • witness_load seems a bit less intuitive to me - what about witness_assign?
  • store adds an entry to the lookup table, so perhaps add_to_lookup_table?
  • load adds a lookup to be executed, so perhaps add_lookup?

But I trust your choices, you can leave it as it is if you disagree with the proposed names.

I have one question also - you added key to the table to enable adding lookups with offset different from 0, right? As it is now (offset always 0), the table could be without key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

halo2 uses assign verb, so I would keep using that one, if it is about assigning a witness.

Copy link
Collaborator Author

@Brechtpd Brechtpd Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named it store/load is because I wanted to make it clear what it does, instead of how it does it (which is more of an implementation detail). Because if you use it you don't really care how it does, just that you have something that behaves like some kind of memory. Ideally this can be used by people even if they don't even know (and don't really have to care) how it works. And if only the latest data needs to be loaded it's actually possible to implement the same thing without lookups so easily switched out with another implementation with the same behavior then.

I have one question also - you added key to the table to enable adding lookups with offset different from 0, right? As it is now (offset always 0), the table could be without key?

The key is still important for the current lookup based implementation even when always loading the last data to make sure the latest data is actually loaded. For example:

row instruction key memory_value
0 store(a) 0
1 1 a
2 load(key.cur(), a) 1
3 1
4 store(b) 1
5 load(key.cur(), b) 2 b

Without the key, on row 5 for example, you could do load(a), and because a is in the lookup table it would be a valid lookup, but it shouldn't be because the latest stored value wouldn't be loaded (which is not what you'd expect from the behavior of something similar to memory).

halo2 uses assign verb, so I would keep using that one, if it is about assigning a witness.

Normally very much agree, but this one is a bit non-standard because on the witness generation it implements the same kind of mechanism. So if you want to load data you actually just call load and it will fill in the correct data automatically while for store you have to specify the values that need to be stored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. Thanks!

@miha-stopar miha-stopar marked this pull request as ready for review April 5, 2023 11:20
@miha-stopar miha-stopar requested a review from a team as a code owner April 5, 2023 11:20
@miha-stopar
Copy link
Collaborator

Will merge it now - I haven't go through every detail of the refactoring yet, but it's probably better to merge it now and enable sync with master. I will finish the review in parallel with rewriting the specs.

@miha-stopar miha-stopar merged commit 3ac930f into privacy-scaling-explorations:mpt2 Apr 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
crate-gadgets Issues related to the gadgets workspace member crate-zkevm-circuits Issues related to the zkevm-circuits workspace member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants