Skip to content

Implementation of polynomial KZG proofs and 257-ary verkle trie

License

Notifications You must be signed in to change notification settings

nagatoism/verkle

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

257-ary verkle trie

Disclaimer: the code in this package is experimental. It can only be used in research and is not suitable for use in production. The trusted setup must be created in a secure environment. This is a responsibility of the user.

General

The repository contains an implementation of the so-called verkle tree as a 257-ary trie, a prefix tree. Here's an article explaining what are verkle trees. In short, these are significantly more efficient replacement for Merkle trees. Verkle trees uses polynomial KZG (aka Kate) commitments for vector commitments instead of usual hash function. The approach offers significant advantages.

The implementation uses trusted setup completely in Lagrange basis. Please find here all math and formulas as well as references to the articles it is based upon.

The implementation mostly follows the structure of the Merkle Patricia Tree. Instead of being hexary, it commits to 257-long vector in each node: 256 possible values of the byte plus one for an optional commitment to the terminal value in each node. So constant d = 257 in the trusted setup, hence the 257-ary trie.

Keys in the trie are of arbitrary length. They are prefixes of the keys from the state's key/values storage. So, the structure of the trie follows the hierarchical structure of the state. This allows commit to partitions of the state and results in shorter keys, more predictable and slow-changing structure of the trie. Any key in the trie can point to a terminal value and same time it can be a prefix in other keys. As it is seen from the implementation, the special 257th "character" does not introduce any significant overhead.

Repository and dependencies

The repository contains:

  • kzg package with the implementation of the KZG commitments and the trusted setup.
  • kzg_setup, the CLI program to create a trusted setup from a secret and store it into the file.
  • trie package contains implementation of the trie as well as corresponding tests and benchmarks.

The implementation of KZG commitments uses DEDIS Advanced Crypto Library for Go Kyber v3 and its bn256 bilinear pairing suite as cryptographic primitives. The implementation follows formulas presented in this article.

Implementation

The state

The state is assumed to be an arbitrary collection of the key/value pairs. Empty key (nil or "") in the implementation is a valid key. The state assumes the empty key always contains serialized binary value of the trusted setup upon which the commitments are calculated. So, you can always check if the root commitment contains the commitment to the trusted setup itself.

Determinism of the state: the state is a set of key/value pairs, i.e. no matter the order of how those key/value pairs were added to the storage and trie, the state (and the commitment to it) is the same.

The key/value store is and implementation of trie.KVStore interface.

The state is implemented as trie.State. It contains partitions for key/values pairs of the state and for the trie itself. It also contains the cache for keeping nodes being updated during bulky state update operations to make them atomic.

The trie

The trie is represented as a collection of key/value pairs in the trie partition of the state. Each key/value pair in the trie represents a node of the trie in serialized form.

type Node struct {
	pathFragment  []byte
	children      [256]kyber.Point
	terminalValue kyber.Scalar
}

Each node can keep commitments to its up to 256 children and to the terminal value as one vector.

The ith child has a commitment to it in children[i] or nil if there's not commitment to it. Commitment is represented by kyber.Point interface which here is a point on the curve G1.

The commitment to the terminal value, if exists, is not nil and is equal to the blake2b hash of the data itself, adjusted to the underlying field of the curves. It represented by kyber.Scalar interface.

Each node represents a vector V = (v0, v1, ...., v256) of length 257. Value of vi is 0 if value of the underlying commitment is nil. Otherwise, v256 corresponds to the terminal value and other vi are blake2b hashes of commitments adjusted to the field.

Commitment to the node is the commitment to the vector V.

The pathFragment is a slice (can be empty) of bytes taken from the key of the key/value pair in the state.

Let's say the node N is stored in the trie under the key K. Concatenation P = K || N.pathFragment means the following:

  • if N contains commitment to the terminal value V, the P is the key of that value in the state: P: V.
  • for any not nil child with index 0 <= i < 256, the Pi = P || {i} = K || N.pathFragment || {i} is the key of the node with the vector of commitments of the child. Here {i} is a slice of one byte with value i.

So, whenever we need a proof for the key/value pair K: V in the state, we start from the empty key which corresponds to the root node and then recursively follow the path by concatenating corresponding pathFragment values and picking corresponding byte of the next child in each node. The process is finished when we reach our key and the corresponding node which contains commitment of the terminal value V.

Example

Let's say we have the following key/value pairs in the state:

   "": <trusted setup>
   "abra": "something"
   "abrakadabra": "anything"
   "abra+": "314"
   "abra@": "217"
   "abra-+" "42"

The resulting 257-ary verkle trie will look like this:

Benchmarks

On Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz laptop.

  • building a trie: 0.67 ms per 1 added key/value pair
  • generating proof from the state in memory with 100000 keys: 168 ms per proof of 1 key/value. It is an expensive operation because always requires K x 257 operations on the curve, where K is number of nodes in the proof.
  • verifying proof (state with 100000 keys): 12.6 ms per verification

Trie size estimates:

With 100000 key/value pairs in the state generated uniformly randomly with max key size 70:

  • keys/nodes in the trie: 128648 (excess factor 28%)
  • average length of the key in the trie: 2.6 bytes
  • average number of children in the node: 1.78
  • number of nodes with only terminal value (no children): 98685 (76.7%)
  • 96% of nodes has 3 or fewer children
  • distribution of length of proofs: 22% have length 3, 78% have length 4

With 100000 key/value pairs in the state generated with max key size 60 assuming realistic patterns of the state of the IOTA Smart Contract chain: first 4-6 bytes identified partition of the smart contract.

  • keys/nodes in the trie: 107940 (excess factor 7%)
  • average length of the key in the trie: 6.07 bytes
  • average number of children in the node: 1.93
  • number of nodes with only terminal value (no children): 98497 (91.2%)
  • 96% of nodes has 1 or 2 children
  • distribution of length of the proof: 38% have length 4, 49% have length 5, 13% have length 6

The following chart shows size of keys in the trie partition in bytes:

The keys are very short due to the big width of the tree.

TODO

  • The function to remove a key/value pair from the state is not implemented yet.
  • Optimize on the pattern when most (90%+) nodes are nodes which commits to terminals

Links

Acknowledgements

Special thanks to Dankrad Feist for pointing out my crypto-mathematical mistakes

About

Implementation of polynomial KZG proofs and 257-ary verkle trie

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 100.0%