core, trie: rework trie database #26813
Conversation
Force-pushed from e10b46b to 4e3a431.
trie/nodeset.go (outdated diff):

```diff
-	node node                   // Cached collapsed trie node, or raw rlp data, nil for deleted nodes
 	hash common.Hash            // Node hash, computed by hashing rlp value, empty for deleted nodes
+	node []byte                 // Cached RLP-encoded trie node, nil for deleted nodes
+	children []common.Hash      // The list of children hashes
```
I think we could optimize the size of this struct by storing offsets into node
instead. For example, we could have a field definition like:
children *[16]uint16
The values stored in the array would be the byte offsets where a child hash is located in the encoded node. This definition would have memory overhead of 8 bytes (nil pointer) for nodes without any children, and 8 bytes (pointer) + 32 bytes (array) for nodes with children. Smaller than the current slice-based children list even when only a single child is present!
Compare this to the overhead of the current definition children []common.Hash, which is 16 bytes (slice) for nodes with no children, and 16 + n*32 bytes for nodes with n children.
Of course, the [16]uint16 is specific to MPT with its 16 children and small maximum node size. We can make it work for the verkle tree using generics, i.e.:

```go
type childrenArray interface {
	getChild(i int, node []byte) []byte
}

type mptChildrenArray [16]uint16

func (c *mptChildrenArray) getChild(i int, node []byte) []byte {
	offset := (*c)[i]
	return node[offset : offset+32]
}

type memoryNode[CA childrenArray] struct {
	hash     common.Hash // Node hash, computed by hashing rlp value, empty for deleted nodes
	node     []byte      // Cached RLP-encoded trie node, nil for deleted nodes
	children *CA         // The list of children hashes
}

func (n *memoryNode[CA]) child(i int) []byte {
	return n.children.getChild(i, n.node)
}

type mptMemoryNode = memoryNode[mptChildrenArray]
```
Sounds cool!
Indeed, as you said, the children list can get very large, so the byte-offset approach would be much better. I will give it a try.
One problem will be getting the actual offsets. I think this will require a rewrite/duplication of the node decoder functions.
Also, I do not advise implementing the generics-based thing right away, since that will further increase code complexity.
Yeah, it's non-trivial to get these offsets, or to load the child correctly by index.
For now I've picked the interface approach, which defines a forEach function to iterate over the children. It feels a bit simpler. Does this approach make sense to you?
I also need to think about it. Personally I think maintaining children []common.Hash is the cleanest and most general approach, but the metadata is too large. The interface approach still feels clean, but I am not sure if it's suitable.
Force-pushed from 6df66c6 to 09b0c89.
```go
flushNext common.Hash              // Next node in the flush-list
node      []byte                   // Encoded node blob
parents   uint32                   // Number of live nodes referencing this one
external  map[common.Hash]struct{} // The set of external children
```
Why the change from int to marker? Won't this lose a "count" of the reference?
I guess one example could be the meta-parent (common.Hash
) referencing the same root hash multiple times (Clique networks, post-merge empty-block state). In that case, we'll not add the second reference this way and the dereference of the first might actually delete data still referenced later?
I don't maintain the common.Hash -> root references anymore; it turned out to be unnecessary.
The parents counter of the root node is already enough.
In the case you describe, the parent count of the root node is bumped each time this state is explicitly referenced. All nodes reachable from this root are retained until the parent count drops to 0.
We can even replace it with a common.Hash pointer, because in our case a node can have at most one external child (the storage root).
Pls rebase
Force-pushed from 4247d14 to 6b929a7.
Force-pushed from d7f4c50 to 9156860.
```diff
 	nhash = common.BytesToHash(hash)
 	mnode = &memoryNode{
 		hash: nhash,
-		node: simplifyNode(n),
-		size: uint16(size),
+		node: nodeToBytes(n),
```
I've got an issue with this, because in verkle trees, serializing a node is very expensive and can (and should) be done in parallel in order to factor some computation. I would very much prefer to keep a pointer to a non-serialized node.
Come to think of it, I could skip node serialization and simply copy the commitment bytes. It would be faster, but a node's size would be around 16K.
I thought about it. After the change in this PR, there are some extra costs for verkle, mostly node serialization and deserialization:
Scenario description
A mainnet block touches a few accounts and a few storage slots. Let's use 500 accounts and 3.5K slots here for the calculation.
In verkle, the node width is 256 and the average tree depth is 4. If we need to touch 4K leaves in total, it means 4K leaf nodes and 585 (1+8+8^2+8^3) internal nodes will be accessed.
Deserialization
It happens when a node is accessed. The major cost is the conversion from node hash to node commitment (finding the elliptic curve point by its X-coordinate). I wrote a benchmark and it turns out to be indeed extremely expensive.
```go
// BenchmarkHashToCommitment
// BenchmarkHashToCommitment-8   45598   25769 ns/op   32 B/op   1 allocs/op
func BenchmarkHashToCommitment(b *testing.B) {
	p := &Generator
	p.Add(p, p)
	hash := p.Bytes()
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var elem Element
		elem.SetBytesTrusted(hash[:])
	}
}
```
If we need to do the conversion for 4.58K nodes, the overall cost will be around 112ms. Fortunately we have the trie prefetcher, which tries to load and decode nodes during EVM execution. Not sure how much of the cost it can hide though.
Serialization
Likewise, the major cost of serialization comes from the conversion of node commitment to node hash, and it's possible to speed it up by running the conversions in a batch.
So the total cost of this conversion is about 4.5ms, and it might be possible to reduce it a lot.
```go
// BenchmarkCommitmentToHash
// BenchmarkCommitmentToHash-8   740104   1625 ns/op   0 B/op   0 allocs/op
func BenchmarkCommitmentToHash(b *testing.B) {
	p := &Generator
	p.Add(p, p)
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		p.Bytes()
	}
}

// nodes = 16, 384.625 ns per element on average
// BenchmarkElementsToBytes
// BenchmarkElementsToBytes-8   169038   6154 ns/op   1552 B/op   4 allocs/op
func BenchmarkElementsToBytes16(b *testing.B) {
	benchmarkElementsToBytes(b, 16)
}

// nodes = 256, 250.41015625 ns per element on average
// BenchmarkElementsToBytes256
// BenchmarkElementsToBytes256-8   17974   64105 ns/op   24832 B/op   4 allocs/op
func BenchmarkElementsToBytes256(b *testing.B) {
	benchmarkElementsToBytes(b, 256)
}

func benchmarkElementsToBytes(b *testing.B, children int) {
	var (
		point    = &Generator
		elements []*Element
	)
	for i := 0; i < children; i++ {
		point = point.Add(point, point)
		elements = append(elements, point)
	}
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		ElementsToBytes(elements)
	}
}
```
Summary
The cost of node serialization and deserialization is not trivial. But if we keep the node, or a pointer to the node, in memory, the trie database becomes MPT- or verkle-specific. I hope we can make it general enough for both trees.
But as you said, if we change the serialization rules a bit, we might be able to get rid of the conversion cost. Currently, we use the X-coordinate as the node identifier, and this identifier is expected to be unique. The benefits of using the X-coordinate as the identifier are:
- It uniquely identifies the elliptic curve point which is the commitment of the node
- The identifier itself is unique
- The X-coordinate is small (32 bytes)
But what if we use a different method to calculate the node identifier? E.g. we can simply compute the sha256 hash of the node coordinates (identifier = sha256(x, y, z)), or the hash of the serialized node (identifier = sha256(serialized_node)), which can also uniquely represent a node.
For serialization, we can serialize the x, y, z coordinates as well. Each of them is 32 bytes, resulting in an extra 96 bytes; compared with the 256 children, that's still trivial.
In this scheme, we can get rid of the conversion entirely, and node serialization and deserialization should be way faster.
Force-pushed from 9156860 to c0f698e.
```diff
 // insert inserts a simplified trie node into the memory database.
 // All nodes inserted by this function will be reference tracked
 // and in theory should only used for **trie nodes** insertion.
-func (db *Database) insert(hash common.Hash, size int, node node) {
+func (db *Database) insert(hash common.Hash, node []byte) {
```
Any reason for using []byte and not rawNode?
SGTM, but really only time running will tell if everything's truly correct 🙃
* core, trie: rework trie database * trie: fix comment
This reverts commit 549cd67.
This PR reworks the trie database a bit to simplify things and to make it usable by the verkle tree.
The trie database uses a reference-counting mechanism to prune stale nodes in memory. It relies on the references between parent and child nodes to determine whether a node is outdated.
However, the trie database is very MPT-specific. For example, it maintains the MPT nodes directly and iterates over the children of nodes according to MPT rules. This brings some performance benefit by avoiding node encoding/decoding, but it's not abstract enough and is very unfriendly to new trie types, e.g. the verkle tree.
In this PR, the trie database only maintains encoded node blobs plus some additional metadata (e.g. the children hashes of each node). This way, the trie database is abstract enough to be used by different trie implementations. Another huge benefit is that it unties the trie from the trie database, so we can move the trie database into a separate package in the future.
The downside is that the trie database additionally needs to maintain the children hashes of nodes, which requires more memory. We need to tune a reasonable memory allowance for the dirty node set.