Disclaimer: the code in this package is experimental. It can only be used in research and is not suitable for use in production. The trusted setup must be created in a secure environment. This is a responsibility of the user.
The repository contains an implementation of the so-called verkle tree as a 257-ary trie, a prefix tree. Here's an article explaining what are verkle trees. In short, these are significantly more efficient replacement for Merkle trees. Verkle trees uses polynomial KZG (aka Kate) commitments for vector commitments instead of usual hash function. The approach offers significant advantages.
The implementation uses trusted setup completely in Lagrange basis. Please find here all math and formulas as well as references to the articles it is based upon.
The implementation mostly follows the structure of the Merkle Patricia Tree.
Instead of being hexary, it commits to 257-long vector in each node:
256 possible values of the byte plus one for an optional commitment to the terminal value in each node.
So constant d = 257
in the trusted setup, hence the 257-ary trie.
Keys in the trie are of arbitrary length. They are prefixes of the keys from the state's key/values storage. So, the structure of the trie follows the hierarchical structure of the state. This allows commit to partitions of the state and results in shorter keys, more predictable and slow-changing structure of the trie. Any key in the trie can point to a terminal value and same time it can be a prefix in other keys. As it is seen from the implementation, the special 257th "character" does not introduce any significant overhead.
The repository contains:
kzg
package with the implementation of the KZG commitments and the trusted setup.kzg_setup
, the CLI program to create a trusted setup from a secret and store it into the file.trie
package contains implementation of the trie as well as corresponding tests and benchmarks.
The implementation of KZG commitments uses DEDIS Advanced Crypto Library for Go Kyber v3
and its bn256
bilinear pairing suite as cryptographic primitives.
The implementation follows formulas presented in this article.
The state is assumed to be an arbitrary collection of the key/value pairs.
Empty key (nil
or ""
) in the implementation is a valid key. The state assumes the empty key always contains
serialized binary value of the trusted setup upon which the commitments are calculated.
So, you can always check if the root commitment contains the commitment to the trusted setup itself.
Determinism of the state: the state is a set of key/value pairs, i.e. no matter the order of how those key/value pairs were added to the storage and trie, the state (and the commitment to it) is the same.
The key/value store is and implementation of trie.KVStore
interface.
The state is implemented as trie.State
. It contains partitions for key/values pairs of the state and for the trie itself.
It also contains the cache for keeping nodes being updated during bulky state update operations to make them atomic.
The trie is represented as a collection of key/value pairs in the trie
partition of the state.
Each key/value pair in the trie represents a node of the trie in serialized form.
type Node struct {
pathFragment []byte
children [256]kyber.Point
terminalValue kyber.Scalar
}
Each node can keep commitments to its up to 256 children and to the terminal value as one vector.
The ith child has a commitment to it in children[i]
or nil
if there's not commitment to it.
Commitment is represented by kyber.Point
interface which here is a point on the curve G1
.
The commitment to the terminal value, if exists, is not nil
and is equal to the blake2b
hash of the data itself,
adjusted to the underlying field of the curves. It represented by kyber.Scalar
interface.
Each node represents a vector V = (v0, v1, ...., v256)
of length 257. Value of vi
is 0 if value of the underlying
commitment is nil
. Otherwise, v256
corresponds to the terminal value and other vi
are blake2b
hashes of
commitments adjusted to the field.
Commitment to the node is the commitment to the vector V
.
The pathFragment
is a slice (can be empty) of bytes taken from the key of the key/value pair in the state.
Let's say the node N
is stored in the trie under the key K
. Concatenation P = K || N.pathFragment
means the following:
- if
N
contains commitment to the terminal valueV
, theP
is the key of that value in the state:P: V
. - for any not
nil
child with index0 <= i < 256
, thePi = P || {i} = K || N.pathFragment || {i}
is the key of the node with the vector of commitments of the child. Here{i}
is a slice of one byte with valuei
.
So, whenever we need a proof for the key/value pair K: V
in the state, we start from the empty key which corresponds to the
root node and then recursively follow the path by concatenating corresponding pathFragment
values
and picking corresponding byte of the next child in each node. The process is finished when we
reach our key and the corresponding node which contains commitment of the terminal value V
.
Let's say we have the following key/value pairs in the state:
"": <trusted setup>
"abra": "something"
"abrakadabra": "anything"
"abra+": "314"
"abra@": "217"
"abra-+" "42"
The resulting 257-ary verkle trie will look like this:
On Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz laptop.
- building a trie: 0.67 ms per 1 added key/value pair
- generating proof from the state in memory with 100000 keys: 168 ms per proof of 1 key/value. It is an expensive
operation because always requires
K x 257
operations on the curve, whereK
is number of nodes in the proof. - verifying proof (state with 100000 keys): 12.6 ms per verification
Trie size estimates:
With 100000 key/value pairs in the state generated uniformly randomly with max key size 70
:
- keys/nodes in the trie: 128648 (excess factor 28%)
- average length of the key in the trie: 2.6 bytes
- average number of children in the node: 1.78
- number of nodes with only terminal value (no children): 98685 (76.7%)
- 96% of nodes has 3 or fewer children
- distribution of length of proofs: 22% have length 3, 78% have length 4
With 100000 key/value pairs in the state generated with max key size 60 assuming realistic patterns of the state of the IOTA Smart Contract chain: first 4-6 bytes identified partition of the smart contract.
- keys/nodes in the trie: 107940 (excess factor 7%)
- average length of the key in the trie: 6.07 bytes
- average number of children in the node: 1.93
- number of nodes with only terminal value (no children): 98497 (91.2%)
- 96% of nodes has 1 or 2 children
- distribution of length of the proof: 38% have length 4, 49% have length 5, 13% have length 6
The following chart shows size of keys in the trie partition in bytes:
The keys are very short due to the big width of the tree.
- The function to remove a key/value pair from the state is not implemented yet.
- Optimize on the pattern when most (90%+) nodes are nodes which commits to terminals
- Constant-Size Commitments to Polynomials and Their Applications, the original paper
- KZG polynomial commitments by Dankrad Feist
- PCS multiproofs using random evaluation by Dankrad Feist
- Verkle trees by Vitalik Buterin
- Kate Commitments: A Primer
- DEDIS Advanced Crypto Library for Go Kyber v3
- Modified Merkle Patricia Trie Specification
- Experimental KZG implementation. Math of this implementation explained in detail
- EIP Ethereum Improvement Proposal on verkle trees
Special thanks to Dankrad Feist for pointing out my crypto-mathematical mistakes