core, trie: rework trie database #26813
Conversation
Force-pushed from e10b46b to 4e3a431.
trie/nodeset.go (outdated diff):

```diff
-	node node                   // Cached collapsed trie node, or raw rlp data, nil for deleted nodes
 	hash common.Hash            // Node hash, computed by hashing rlp value, empty for deleted nodes
+	node []byte                 // Cached RLP-encoded trie node, nil for deleted nodes
+	children []common.Hash      // The list of children hashes
```
I think we could optimize the size of this struct by storing offsets into node
instead. For example, we could have a field definition like:
children *[16]uint16
The values stored in the array would be the byte offsets where a child hash is located in the encoded node. This definition would have memory overhead of 8 bytes (nil pointer) for nodes without any children, and 8 bytes (pointer) + 32 bytes (array) for nodes with children. Smaller than the current slice-based children list even when only a single child is present!
Compare this to the overhead of the current definition children []common.Hash, which is 16 bytes (slice) for nodes with no children, and 16 + n*32 bytes for nodes with n children.
Of course, the [16]uint16 is specific to MPT with its 16 children and small maximum node size. We can make it work for the verkle tree using generics, i.e.:

```go
type childrenArray interface {
	getChild(i int, node []byte) []byte
}

type mptChildrenArray [16]uint16

func (c *mptChildrenArray) getChild(i int, node []byte) []byte {
	offset := (*c)[i]
	return node[offset : offset+32]
}

type memoryNode[CA childrenArray] struct {
	hash     common.Hash // Node hash, computed by hashing rlp value, empty for deleted nodes
	node     []byte      // Cached RLP-encoded trie node, nil for deleted nodes
	children *CA         // The list of children hashes
}

func (n *memoryNode[CA]) child(i int) []byte {
	return n.children.getChild(i, n.node)
}

type mptMemoryNode = memoryNode[mptChildrenArray]
```
Sounds cool!
Indeed, as you said, the children list can get very large, so the byte-offset approach would be much better. I will give it a try.
One problem will be getting the actual offsets. I think this will require a rewrite/duplication of the node decoder functions.
Also, I do not advise implementing the generics-based thing right away, since that will further increase code complexity.
Yeah, it's non-trivial to get these offsets, or to load the child correctly by index.
For now I've picked the interface approach, which defines a forEach function to iterate over the children. It feels a bit simpler. Does this approach make sense to you?
I also need to think about it. Personally I think maintaining children []common.Hash is the cleanest and most general approach, but the metadata is too large. The interface approach still feels clean, but I am not sure if it's suitable.
Force-pushed from 6df66c6 to 09b0c89.
```go
flushNext common.Hash              // Next node in the flush-list
node      []byte                   // Encoded node blob
parents   uint32                   // Number of live nodes referencing this one
external  map[common.Hash]struct{} // The set of external children
```
Why the change from int to marker? Won't this lose a "count" of the reference?
I guess one example could be the meta-parent (common.Hash
) referencing the same root hash multiple times (Clique networks, post-merge empty-block state). In that case, we'll not add the second reference this way and the dereference of the first might actually delete data still referenced later?
I don't maintain the common.Hash -> root references anymore; it turned out to be unnecessary.
The parents counter of the root node is already enough.
In the case you describe, the parent count of the root node is bumped each time this state is explicitly referenced. All nodes reachable from this root are retained until the parent count drops to 0.
We can even replace it with a common.Hash pointer, because in our case a node can have at most one external child (the storage root).
Pls rebase
Force-pushed from 4247d14 to 6b929a7.
Force-pushed from d7f4c50 to 9156860.
```diff
 	nhash = common.BytesToHash(hash)
 	mnode = &memoryNode{
 		hash: nhash,
-		node: simplifyNode(n),
-		size: uint16(size),
+		node: nodeToBytes(n),
```
I've got an issue with this, because in verkle trees, serializing a node is very expensive and can (and should) be done in parallel in order to factor some computation. I would very much prefer to keep a pointer to a non-serialized node.
Come to think of it, I could skip node serialization and simply copy the commitment bytes. It would be faster, but a node's size would be around 16K.
I thought about it. After the change in this PR, there are some extra costs for verkle, mostly node serialization and deserialization:
Scenario description
A mainnet block touches a few accounts and a few storage slots. Let's use 500 accounts and 3.5K slots here for the calculation.
In verkle, the node width is 256 and the average tree depth is 4. If we need to touch 4K leaves in total, it means 4K leaf nodes and 585 (1+8+8^2+8^3) internal nodes will be accessed.
Deserialization
It happens when a node is accessed. The major cost is the conversion from node hash to node commitment (finding the elliptic curve point by its X-coordinate). I wrote a benchmark and it turns out to be indeed extremely expensive.
```go
// BenchmarkHashToCommitment
// BenchmarkHashToCommitment-8   45598   25769 ns/op   32 B/op   1 allocs/op
func BenchmarkHashToCommitment(b *testing.B) {
	p := &Generator
	p.Add(p, p)
	hash := p.Bytes()
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var elem Element
		elem.SetBytesTrusted(hash[:])
	}
}
```
If we need to do the conversion for 4.58K nodes, the overall cost will be around 112ms. Fortunately we have the trie prefetcher, which tries to load and decode nodes during EVM execution. Not sure how much of the cost it can hide though.
Serialization
Likewise, the major cost of serialization comes from the conversion of node commitment to node hash, and it's possible to speed it up by running the conversions in a batch.
So the total cost of this conversion is about 4.5ms, and it might be possible to reduce it a lot.
```go
// BenchmarkCommitmentToHash
// BenchmarkCommitmentToHash-8   740104   1625 ns/op   0 B/op   0 allocs/op
func BenchmarkCommitmentToHash(b *testing.B) {
	p := &Generator
	p.Add(p, p)
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		p.Bytes()
	}
}

// nodes = 16, 384.625 ns per element on average
// BenchmarkElementsToBytes
// BenchmarkElementsToBytes-8   169038   6154 ns/op   1552 B/op   4 allocs/op
func BenchmarkElementsToBytes16(b *testing.B) {
	benchmarkElementsToBytes(b, 16)
}

// nodes = 256, 250.41015625 ns per element on average
// BenchmarkElementsToBytes256
// BenchmarkElementsToBytes256-8   17974   64105 ns/op   24832 B/op   4 allocs/op
func BenchmarkElementsToBytes256(b *testing.B) {
	benchmarkElementsToBytes(b, 256)
}

func benchmarkElementsToBytes(b *testing.B, children int) {
	var (
		point    = &Generator
		elements []*Element
	)
	for i := 0; i < children; i++ {
		point = point.Add(point, point)
		elements = append(elements, point)
	}
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		ElementsToBytes(elements)
	}
}
```
Summary
The cost of node serialization and deserialization is not trivial. But if we keep the node, or a pointer to the node, in memory, the trie database becomes MPT- or verkle-specific. I hope we can make it general enough for both trees.
But as you said, if we change the serialization rules a bit, we might be able to get rid of the conversion cost. Currently, we use the X-coordinate as the node identifier, and this identifier is expected to be unique. The benefits of using the X-coordinate as the identifier are:
- It uniquely identifies the elliptic curve point which is the commitment of the node
- The identifier itself is unique
- The X-coordinate is small (32 bytes)
But what if we use a different method to calculate the node identifier? E.g. we can simply compute the sha256 hash of the node coordinates (identifier = sha256(x, y, z)), or the hash of the serialized node (identifier = sha256(serialized_node)), which can also uniquely represent a node.
For serialization, we can serialize the x, y, z coordinates as well. Each of them is 32 bytes, resulting in an extra 96 bytes; compared with the 256 children, that's still trivial.
In this scheme, we can get rid of the conversion entirely, and node serialization and deserialization should be way faster.
Force-pushed from 9156860 to c0f698e.
```diff
 // insert inserts a simplified trie node into the memory database.
 // All nodes inserted by this function will be reference tracked
 // and in theory should only used for **trie nodes** insertion.
-func (db *Database) insert(hash common.Hash, size int, node node) {
+func (db *Database) insert(hash common.Hash, node []byte) {
```
Any reason for using []byte and not rawNode?
SGTM, but really only time running will tell if everything's truly correct 🙃
* core, trie: rework trie database * trie: fix comment
This reverts commit 549cd67.
This PR reworks the trie database a bit to simplify things and to make it usable by the verkle tree.
The trie database uses a reference-counting mechanism to prune stale nodes in memory. It relies on the references between parent and child nodes to determine whether a node is outdated.
However, the trie database is very MPT-specific. For example, it maintains the MPT nodes directly and iterates over the children of nodes according to MPT rules. This brings some performance benefit by avoiding node encoding/decoding, but it's not abstract enough and is very unfriendly to new trie types, e.g. the verkle tree.
In this PR, the trie database only maintains encoded node blobs plus some additional metadata (e.g. the children hashes of each node). This way, the trie database is abstract enough to be used by different trie implementations. Another huge benefit is that it unties the trie from the trie database, so we can move the trie database into a separate package in the future.
The downside is that the trie database additionally needs to maintain the children hashes of nodes, which requires more memory. We need to tune a reasonable memory allowance for the dirty node set.