
Overlay tree migration explorations #343

Merged — 13 commits, merged into master on Apr 24, 2023
Conversation

@jsign (Collaborator) commented Apr 19, 2023

TL;DR: This PR contains a test/benchmark that simulates a tree touched with X key/values, plus a later stage that migrates Y key/values from a base tree. It includes some optimized methods for the migrated key/values.

The main intention of this PR is to show the exploration of:

  • Using the normal Insert(.., ..) API for the migrated key/values, which is somewhat naive compared to an optimized strategy, and comparing speedups.
  • Playing with the optimized strategy and the "merging logic".
  • Writing up findings/thoughts/next steps.

This PR isn't necessarily meant to be merged, so I'll keep it as a draft.
I'm not expecting a formal review; if you want to skim the code, I'd recommend looking at:

  • tree_test.go to see TestBatchMigratedKeyValues test; you'll get 90% of the gist of the story there.
  • conversion.go to see the new method InsertMigratedLeaves.

This code isn't final, so expect to see rough edges. I added PR comments to guide you through the important parts.


Overall explanation and details.

I created a test/benchmark that compares several scenarios between an “unbatched” insertion for the migrated key/values (i.e., Insert(key, value) in a loop) and a “batched” version that reuses some work I did for the full-tree migration, with some twists and extras (nothing wild, just reasonable ideas).

Here's the output of this new test/benchmark; I'll add some notes below on how to read it:

$ go test . -run=TestBatchMigratedKeyValues -v
=== RUN   TestBatchMigratedKeyValues
Assuming 0 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 44ms, batched 24ms, 1.83x
        If 2000 extra key-values are migrated: unbatched 79ms, batched 41ms, 1.93x
        If 5000 extra key-values are migrated: unbatched 200ms, batched 93ms, 2.14x
        If 8000 extra key-values are migrated: unbatched 317ms, batched 147ms, 2.16x
Assuming 500 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 49ms, batched 30ms, 1.61x
        If 2000 extra key-values are migrated: unbatched 86ms, batched 47ms, 1.80x
        If 5000 extra key-values are migrated: unbatched 205ms, batched 100ms, 2.03x
        If 8000 extra key-values are migrated: unbatched 320ms, batched 153ms, 2.09x
Assuming 1000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 54ms, batched 37ms, 1.46x
        If 2000 extra key-values are migrated: unbatched 93ms, batched 53ms, 1.73x
        If 5000 extra key-values are migrated: unbatched 212ms, batched 107ms, 1.98x
        If 8000 extra key-values are migrated: unbatched 330ms, batched 161ms, 2.04x
Assuming 2000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 69ms, batched 50ms, 1.37x
        If 2000 extra key-values are migrated: unbatched 106ms, batched 67ms, 1.59x
        If 5000 extra key-values are migrated: unbatched 226ms, batched 122ms, 1.85x
        If 8000 extra key-values are migrated: unbatched 348ms, batched 174ms, 2.00x
Assuming 5000 key/values touched by block execution:
        If 1000 extra key-values are migrated: unbatched 106ms, batched 90ms, 1.17x
        If 2000 extra key-values are migrated: unbatched 146ms, batched 108ms, 1.36x
        If 5000 extra key-values are migrated: unbatched 265ms, batched 163ms, 1.63x
        If 8000 extra key-values are migrated: unbatched 407ms, batched 235ms, 1.73x

So, look at:

Assuming X key/values touched by block execution:
        If Y extra key-values are migrated: unbatched Ams, batched Bms, Cx

Then X, Y, A, B, and C mean:

  • X is the number of “active/touched” key/values in the tree in memory. X is simulating some touched key/values after a block execution.
  • Y is the number of key/values that we migrate from the MPT. They’re randomly generated.
  • A is the number of milliseconds it takes to insert the migrated key/values in the scenario for the “naive” strategy.
  • B is the number of milliseconds it takes to insert the migrated key/values in the scenario for the optimized strategy.
  • C is the speedup comparing the naive vs optimized case for the same scenario.

To understand what “milliseconds it takes to insert the migrated key/values in this scenario” means, the high-level flow of the scenario is:

  1. Generate a VKT with X random key/values.
  2. Start counting time.
  3. Insert the Y key/values.
  4. Commit the tree.
  5. Serialize the tree.
  6. Stop counting time.
    [This is run with both the naive (unbatched) API and the optimized (batched) API, so we get the A and B times, respectively.]

The time between steps 2 and 6 is what I report. It's a bit “unfair” in that committing and serializing the tree also involves doing the work for the X key/values, which isn't only work related to the Y key/values. But we can't separate them, since commit and serialize exploit batching across everything in the tree (which is actually a good thing). So the actual speedup of the optimized implementation is probably better. In any case, it's good to have a number.

What we can conclude from this:

  • The optimized version makes inserting the migrated key/values cheaper relative to “usual” insertions from a block execution, since we can exploit some structure to insert them more efficiently.
  • The “millisecond times” (the B value) can give a rough sense of pure CPU work for inserting+committing+serializing, which is most or all of the CPU work.
  • We can have reasonable confidence that the optimized version is correct, since in every scenario I compare the resulting roots of the unoptimized and optimized VKTs and check that they match. (That's why I call this a test/benchmark; it does both.)

What we can’t conclude from this:

  • As mentioned, this is mostly measuring CPU work.
  • In a “full scenario”, inserting key/values of migrated MPT data will involve resolving some HashedNodes on the path to the insertion point, which implies some extra disk lookups to load those nodes. Nothing weird, and quite normal, but keep it in mind. The first 2-3 layers probably won't need new disk accesses, so it shouldn't be that bad.
  • In a “full scenario” the base VKT would probably be a bit deeper; but that's handwavy, since it depends on how far into the full MPT migration you are.

What we could do about the “can't conclude” points is to embed this same scenario test in our replay benchmark with real data, which would be very interesting since:

  • The “touched” tree will be real, since it’s a real block execution. Or “as real” or “updated” as the data we use for importing the chain.
  • We can still generate random key/values to simulate MPT data; since that would be a worst-case anyway.
  • We can fully resolve HashedNodes for real, involving the mentioned disk lookups to load parts of the tree that aren’t in memory.
  • We can compare the “slowdown” or “extra work” for the replay benchmark with and without migration, tuning for different Y values (number of keys migrated per block).
  • This benchmark is quite pessimistic since, in theory, we'll walk keys in order, so many key/values will be packed into a few leaves. That means inserting migrated key/values can see a reasonable speedup compared to this benchmark, for "batching" reasons.

@jsign jsign force-pushed the jsign/batchedinsertordered branch from 4ec1412 to 0190cce on April 19, 2023 15:01
Comment on lines +17 to +19
// BatchNewLeafNode creates a new leaf node from the given data. It optimizes LeafNode creation
// by batching expensive cryptography operations. It returns the LeafNodes sorted by stem.
func BatchNewLeafNode(nodesValues []BatchNewLeafNodeData) []LeafNode {
@jsign (Collaborator, author) commented:

I'm using something I did for the full-tree conversion here, but added parallelization at this layer. This wasn't needed in the tree conversion since, in that case, we parallelized "at the client level" by working in subtrees.

But the idea is the same/similar.

panic("stems are equal")
}

func (n *InternalNode) InsertMigratedLeaves(leaves []LeafNode, resolver NodeResolverFn) error {
@jsign (Collaborator, author) commented Apr 19, 2023:

This is a sketch/first version of a method that receives prepared LeafNodes of base-tree key/values and attempts to "merge them" into a live VKT. It isn't final and may have rough edges.

Comment on lines +112 to +117
case Empty:
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
@jsign (Collaborator, author) commented:

Easy case, insert leaf and call it a day.

@jsign jsign force-pushed the jsign/batchedinsertordered branch from e92fc0c to eea2733 Compare April 19, 2023 15:22
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
case *LeafNode:
@jsign (Collaborator, author) commented:

OK, this case is more interesting. We have two subcases.

Comment on lines +117 to +132
if bytes.Equal(node.stem, ln.stem) {
// In `ln` we have migrated key/values which should be copied to the leaf
// only if there isn't a value there. If there's a value, we skip it since
// our migrated value is stale.
nonPresentValues := make([][]byte, NodeWidth)
for i := range ln.values {
if node.values[i] == nil {
nonPresentValues[i] = ln.values[i]
}
}

node.updateMultipleLeaves(nonPresentValues)
continue
}
@jsign (Collaborator, author) commented Apr 19, 2023:

If we already have the leaf for the stem, we have to do a sort of "merging" but not blindly.

Only copy values if the current value is nil, which means our (migrated) value isn't stale.
So the idea in L121-L126 is to filter out values that are stale. Then we exploit the method updateMultipleLeaves to update the existing Leaf.

Note that the original LeafNode we prepared with the migrated values wasted some effort computing C1 and C2, which we aren't using here. That's fine: that work is wasted only with the probability of hitting this case, which is very low. That is, the migrated value would have to coincidentally match a LeafNode that was touched in the block execution; so it's fine.

[Note for the future: we need some particular test to cover this case. Generating random key/values won't probably cover this. We'll have time for that. Early stages...]

A Member commented:

[Note for the future: we need some particular test to cover this case. Generating random key/values won't probably cover this. We'll have time for that. Early stages...]

you can simply pick a few random keys from leaves and insert them in the tree with a different value, to simulate this case, no?

@jsign (Collaborator, author) replied:

Yep.

conversion.go Outdated
ln := leaves[i]
parent := n

// Look for the appropriate parent for the leaf node.
@jsign (Collaborator, author) commented:

Walk the tree looking for the parent of the LeafNode to be inserted. While we walk, we have to resolve potential HashedNodes. Note that we carefully (L109) mark our walk as cow-ed, since we know we'll insert the leaf "down the road".

Comment on lines +104 to +107
nextParent, ok := parent.children[ln.stem[parent.depth]].(*InternalNode)
if !ok {
break
}
@jsign (Collaborator, author) commented:

If we find a LeafNode or Empty, we're done; we have the parent.

Comment on lines +154 to +155
default:
return fmt.Errorf("unexpected node type %T", node)
@jsign (Collaborator, author) commented:

Just to make sure nothing else can show up here; anything else would be a bug.

Comment on lines +136 to +139
// We do a sanity check to make sure that the fork point is not before the current depth.
if byte(idx) <= parent.depth {
return fmt.Errorf("unexpected fork point %d for nodes %x and %x", idx, node.stem, ln.stem)
}
@jsign (Collaborator, author) commented Apr 19, 2023:

Let's be paranoid and check that this invariant holds; if it doesn't, it's a bug.

Comment on lines +140 to +153
// Create the missing internal nodes.
for i := parent.depth + 1; i <= byte(idx); i++ {
nextParent := newInternalNode(parent.depth + 1).(*InternalNode)
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = nextParent
parent = nextParent
}
// Add old and new leaf node to the latest created parent.
parent.cowChild(node.stem[parent.depth])
parent.children[node.stem[parent.depth]] = node
node.setDepth(parent.depth + 1)
parent.cowChild(ln.stem[parent.depth])
parent.children[ln.stem[parent.depth]] = &ln
ln.setDepth(parent.depth + 1)
@jsign (Collaborator, author) commented:

Create the needed internal nodes depending on the "fork section" of the stem, then connect the final parent to both the existing LeafNode and the to-be-inserted LeafNode.

jsign referenced this pull request in jsign/go-ethereum Apr 20, 2023
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

parent := n

// Look for the appropriate parent for the leaf node.
for {
A Member commented:

I'd prefer a recursive version of this in the end, but as long as it works fine there's no issue for now.

jsign added 12 commits April 20, 2023 13:55
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign force-pushed the jsign/batchedinsertordered branch 3 times, most recently from 02bf367 to 8102b90 on April 20, 2023 19:18
if i >= NodeWidth-1 {
if i >= NodeWidth {
@jsign (Collaborator, author) commented:

This was a bug.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign force-pushed the jsign/batchedinsertordered branch from 8102b90 to fe91282 on April 20, 2023 19:21
@jsign jsign marked this pull request as ready for review April 24, 2023 12:25
@jsign jsign requested a review from gballet April 24, 2023 12:30
@gballet (Member) left a review:

LGTM

@gballet gballet merged commit de802a6 into master Apr 24, 2023