
Overlay tree migration explorations #343

Merged — 13 commits, merged into master on Apr 24, 2023
Conversation

@jsign (Collaborator) commented Apr 19, 2023

TL;DR: This PR contains a test/benchmark that simulates a tree touched with X key/values, plus a later stage that migrates Y key/values from a base tree. It includes some optimized methods for the migrated key/values.

The main intention of this PR is to show the exploration of:

  • Using the normal Insert(.., ..) API for the migrated key/values, which is somewhat naive compared to an optimized strategy, and comparing speedups.
  • Playing with the optimized strategy and the "merging logic".
  • Writing up findings/thoughts/next steps.

This PR isn't necessarily meant to be merged, so I'll keep it as a draft.
I'm not expecting a formal review; if you want to skim the code, I'd recommend looking at:

  • tree_test.go to see TestBatchMigratedKeyValues test; you'll get 90% of the gist of the story there.
  • conversion.go to see the new method InsertMigratedLeaves.

This code isn't final, so expect to see rough edges. I added PR comments to guide you through the important parts.


Overall explanation and details.

I created a test/benchmark that compares several scenarios between an “unbatched” insertion for the migrated key/values (i.e., Insert(key, value) in a loop) and a “batched” version that reuses some work I did for the full-tree migration, with some twists and extras (nothing wild, just reasonable ideas).

Here's the output of this new test/benchmark; I'll add some notes below on how to read it:

$ go test . -run=TestBatchMigratedKeyValues -v
=== RUN   TestBatchMigratedKeyValues
Assuming 0 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 44ms, batched 24ms, 1.83x
        If 2000 extra key-values are migrated: unbatched 79ms, batched 41ms, 1.93x
        If 5000 extra key-values are migrated: unbatched 200ms, batched 93ms, 2.14x
        If 8000 extra key-values are migrated: unbatched 317ms, batched 147ms, 2.16x
Assuming 500 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 49ms, batched 30ms, 1.61x
        If 2000 extra key-values are migrated: unbatched 86ms, batched 47ms, 1.80x
        If 5000 extra key-values are migrated: unbatched 205ms, batched 100ms, 2.03x
        If 8000 extra key-values are migrated: unbatched 320ms, batched 153ms, 2.09x
Assuming 1000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 54ms, batched 37ms, 1.46x
        If 2000 extra key-values are migrated: unbatched 93ms, batched 53ms, 1.73x
        If 5000 extra key-values are migrated: unbatched 212ms, batched 107ms, 1.98x
        If 8000 extra key-values are migrated: unbatched 330ms, batched 161ms, 2.04x
Assuming 2000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 69ms, batched 50ms, 1.37x
        If 2000 extra key-values are migrated: unbatched 106ms, batched 67ms, 1.59x
        If 5000 extra key-values are migrated: unbatched 226ms, batched 122ms, 1.85x
        If 8000 extra key-values are migrated: unbatched 348ms, batched 174ms, 2.00x
Assuming 5000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 106ms, batched 90ms, 1.17x
        If 2000 extra key-values are migrated: unbatched 146ms, batched 108ms, 1.36x
        If 5000 extra key-values are migrated: unbatched 265ms, batched 163ms, 1.63x
        If 8000 extra key-values are migrated: unbatched 407ms, batched 235ms, 1.73x

So, look at:

Assuming X key/values touched by block execution:
        If Y extra key-values are migrated: unbatched Ams, batched Bms, Cx

Then X, Y, A, B, and C mean:

  • X is the number of “active/touched” key/values in the tree in memory. X is simulating some touched key/values after a block execution.
  • Y is the number of key/values that we migrate from the MPT. They’re randomly generated.
  • A is the number of milliseconds it takes to insert the migrated key/values in the scenario for the “naive” strategy.
  • B is the number of milliseconds it takes to insert the migrated key/values in the scenario for the optimized strategy.
  • C is the speedup comparing the naive vs optimized case for the same scenario.

To understand what “milliseconds it takes to insert the migrated key/values in this scenario” means, the high-level flow of the scenario is:

  1. Generate a VKT with X random key/values.
  2. Start counting time.
  3. Insert the Y key/values.
  4. Commit the tree.
  5. Serialize the tree.
  6. Stop counting time.
    [This is run with both the naive (unbatched) API and the optimized (batched) API, so we get the A and B times, respectively.]

The time between steps 2 and 6 is what I report. It's a bit “unfair” in that committing and serializing the tree also involves doing the work for the X key/values, which isn't only work related to the Y key/values. But we can't separate them, since commit and serialize exploit batching across everything in the tree (which is actually a good thing). So the actual speedup of the optimized implementation is probably better. In any case, it's good to have a number.

What we can conclude from this:

  • The optimized version makes inserting the migrated key/values cheaper relative to “usual” insertions from a block execution, since we can exploit some structure to insert them more efficiently.
  • The “millisecond times” (the B value) can give a rough sense of pure CPU work for inserting+committing+serializing, which is most or all of the CPU work.
  • We can have reasonable confidence that the optimized version is correct, since in every scenario I compare the resulting roots of the unoptimized and optimized VKTs and check that they match. (That's why I call this a test/benchmark; it does both.)

What we can’t conclude from this:

  • As mentioned, this is mostly measuring CPU work.
  • In a “full scenario”, inserting key/values of migrated MPT data will involve resolving some HashedNodes on the path to the insertion point, which implies some extra disk lookups to load those nodes. Nothing weird, and quite normal, but keep it in mind. The first 2-3 layers probably won't need new disk accesses, so it shouldn't be that bad.
  • In a “full scenario” the base VKT would probably be a bit deeper; but that's handwavy, since it depends on how far into the full MPT migration you are.

What we could do about the “can't conclude” points is to embed this same scenario test in our replay benchmark with real data, which would be very interesting since:

  • The “touched” tree will be real, since it’s a real block execution. Or “as real” or “updated” as the data we use for importing the chain.
  • We can still generate random key/values to simulate MPT data; since that would be a worst-case anyway.
  • We can fully resolve HashedNodes for real, involving the mentioned disk lookups to load parts of the tree that aren’t in memory.
  • We can compare the “slowdown” or “extra work” for the replay benchmark with and without migration, tuning for different Y values (number of keys migrated per block).
  • This benchmark is quite pessimistic since, in theory, we'll walk keys in order, so many key/values will be packed into a few leaves. That means inserting migrated key/values can see a reasonable speedup compared to this benchmark, for "batching" reasons.

@jsign jsign force-pushed the jsign/batchedinsertordered branch from 4ec1412 to 0190cce on April 19, 2023 15:01
Comment on lines +17 to +19
// BatchNewLeafNode creates a new leaf node from the given data. It optimizes LeafNode creation
// by batching expensive cryptography operations. It returns the LeafNodes sorted by stem.
func BatchNewLeafNode(nodesValues []BatchNewLeafNodeData) []LeafNode {
@jsign (Collaborator, author) commented:

I'm using something I did for the full-tree conversion here, but added parallelization at this layer. This wasn't needed in the tree conversion since, in that case, we parallelized "at the client level" by working in subtrees.

But the idea is the same/similar.

panic("stems are equal")
}

func (n *InternalNode) InsertMigratedLeaves(leaves []LeafNode, resolver NodeResolverFn) error {
@jsign (Collaborator, author) commented Apr 19, 2023:

This is a sketch/first version of a method that receives prepared LeafNodes of base-tree key/values and attempts to "merge them" into a live VKT. It isn't final and may have rough edges.

Comment on lines +112 to +117
case Empty:
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
@jsign (Collaborator, author) commented:

Easy case, insert leaf and call it a day.

@jsign jsign force-pushed the jsign/batchedinsertordered branch from e92fc0c to eea2733 Compare April 19, 2023 15:22
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
case *LeafNode:
@jsign (Collaborator, author) commented:

OK, this case is more interesting. We have two subcases.

Comment on lines +117 to +132
if bytes.Equal(node.stem, ln.stem) {
// In `ln` we have migrated key/values which should be copied to the leaf
// only if there isn't a value there. If there's a value, we skip it since
// our migrated value is stale.
nonPresentValues := make([][]byte, NodeWidth)
for i := range ln.values {
if node.values[i] == nil {
nonPresentValues[i] = ln.values[i]
}
}

node.updateMultipleLeaves(nonPresentValues)
continue
}
@jsign (Collaborator, author) commented Apr 19, 2023:

If we already have the leaf for the stem, we have to do a sort of "merging" but not blindly.

Only copy values if the current value is nil, which means our (migrated) value isn't stale.
So the idea in L121-L126 is to filter out values that are stale. Then we exploit the method updateMultipleLeaves to update the existing Leaf.

Note that the original LeafNode we prepared with the migrated values wasted some effort computing C1 and C2, which we aren't using here. That's fine: that work is wasted only with the probability of hitting this case, which is very low. That is, the migrated value would have to coincidentally match a LeafNode that was touched in the block execution; so it's fine.

[Note for the future: we need some particular test to cover this case. Generating random key/values won't probably cover this. We'll have time for that. Early stages...]

A Member commented:

[Note for the future: we need some particular test to cover this case. Generating random key/values won't probably cover this. We'll have time for that. Early stages...]

you can simply pick a few random keys from leaves and insert them in the tree with a different value, to simulate this case, no?

@jsign (Collaborator, author) replied:

Yep.

conversion.go Outdated
ln := leaves[i]
parent := n

// Look for the appropriate parent for the leaf node.
@jsign (Collaborator, author) commented:

Walk the tree looking for the parent of the LeafNode to be inserted. While we walk, we have to resolve potential HashedNodes. Note that we carefully (L109) mark our walk as cow-ed, since we know we'll insert the leaf "down the road".

Comment on lines +104 to +107
nextParent, ok := parent.children[ln.stem[parent.depth]].(*InternalNode)
if !ok {
break
}
@jsign (Collaborator, author) commented:

If we find a LeafNode or Empty, we're done; we have the parent.

Comment on lines +154 to +155
default:
return fmt.Errorf("unexpected node type %T", node)
@jsign (Collaborator, author) commented:

Just to make sure nothing else can show up here; anything else would be a bug.

Comment on lines +136 to +139
// We do a sanity check to make sure that the fork point is not before the current depth.
if byte(idx) <= parent.depth {
return fmt.Errorf("unexpected fork point %d for nodes %x and %x", idx, node.stem, ln.stem)
}
@jsign (Collaborator, author) commented Apr 19, 2023:

Let's be paranoid and check that this invariant holds; if it doesn't, it's a bug.

Comment on lines +140 to +153
// Create the missing internal nodes.
for i := parent.depth + 1; i <= byte(idx); i++ {
nextParent := newInternalNode(parent.depth + 1).(*InternalNode)
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = nextParent
parent = nextParent
}
// Add old and new leaf node to the latest created parent.
parent.cowChild(node.stem[parent.depth])
parent.children[node.stem[parent.depth]] = node
node.setDepth(parent.depth + 1)
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
@jsign (Collaborator, author) commented:

Create the needed internal nodes depending on the "fork section" of the stem, then connect the final parent to both the existing LeafNode and the to-be-inserted LeafNode.

jsign referenced this pull request in jsign/go-ethereum Apr 20, 2023
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

parent := n

// Look for the appropriate parent for the leaf node.
for {
A Member commented:

I'd prefer a recursive version of this in the end, but as long as it works fine there's no issue for now.

jsign added 12 commits April 20, 2023 13:55
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign force-pushed the jsign/batchedinsertordered branch 3 times, most recently from 02bf367 to 8102b90 on April 20, 2023 19:18
if i >= NodeWidth-1 {
if i >= NodeWidth {
@jsign (Collaborator, author) commented:

This was a bug.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign force-pushed the jsign/batchedinsertordered branch from 8102b90 to fe91282 on April 20, 2023 19:21
@jsign jsign marked this pull request as ready for review April 24, 2023 12:25
@jsign jsign requested a review from gballet April 24, 2023 12:30
@gballet (Member) left a review:

LGTM

@gballet gballet merged commit de802a6 into master Apr 24, 2023