
core/state/snapshot: reuse memory data instead of hitting disk when generating #22667

Merged: 4 commits, Apr 23, 2021

Conversation

@holiman (Contributor) commented Apr 14, 2021

This PR is a successor to rjl493456442#6 (some charts can be found there).

This PR attempts to minimize the number of disk reads during snapshot generation.

When we have a slice of snapshot values which do not match the trie data, we currently iterate the disk to retrieve the canonical data.

What this PR does is commit the (incorrect) snapshot data to a trie, which will be ~99% correct. When iterating the trie, we then use the snapshot-trie database for resolving hashes. By doing so, we can read ~99% of the leaves from the memory db instead of resolving them from disk.
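
A minimal sketch of that lookup order, under stated assumptions: resolveNode is an illustrative helper, not the actual PR code, and mem stands for the in-memory store holding the snapshot-built trie nodes.

import (
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethdb"
	"github.com/ethereum/go-ethereum/ethdb/memorydb"
)

// resolveNode tries the in-memory store built from the (mostly correct)
// snapshot data first, and only falls back to the persistent database on
// a miss. Hypothetical helper for illustration.
func resolveNode(hash common.Hash, mem *memorydb.Database, disk ethdb.KeyValueReader) ([]byte, error) {
	if blob, err := mem.Get(hash.Bytes()); err == nil && len(blob) > 0 {
		return blob, nil // ~99% of lookups should be served from memory
	}
	return disk.Get(hash.Bytes()) // rare miss: resolve from disk
}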

It can also be implemented differently (this was my third variation). Other variants that also "work":

  • Use a differenceIterator to walk only the nodes which are on disk and not in memory. The drawback with this (otherwise pretty clean) approach is that it needs two walks: one to find out what needs to be added from the trie to the snapshot, and another (in the reverse direction) to figure out what needs to be deleted. This could be solved by writing a new difference iterator which emits "addition" or "deletion", if we prefer that approach.
  • Use a relay database, which implements ethdb.KeyValueStore: shove all nodes into a memorydb, have the relay db use the memdb as primary and the diskdb as secondary, and plug it into a new trie.Database as backend (a rough sketch of the read path follows this list). It's pretty clunky, and does not get to use any of the clean cache in the existing trie.Database.
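
A rough sketch of that relay read path, assuming illustrative names (the relayDB type and its fields are not actual go-ethereum code; only the read methods of the interface are shown):

import "github.com/ethereum/go-ethereum/ethdb"

// relayDB serves reads from an in-memory primary and falls back to the
// persistent secondary. A full ethdb.KeyValueStore would also need the
// write and iteration methods.
type relayDB struct {
	primary   ethdb.KeyValueReader // memorydb holding the node soup
	secondary ethdb.KeyValueReader // persistent disk database
}

func (db *relayDB) Has(key []byte) (bool, error) {
	if ok, err := db.primary.Has(key); err == nil && ok {
		return true, nil
	}
	return db.secondary.Has(key)
}

func (db *relayDB) Get(key []byte) ([]byte, error) {
	if blob, err := db.primary.Get(key); err == nil {
		return blob, nil
	}
	return db.secondary.Get(key)
}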

The upside of this particular implementation is that most of the modifications are performed in the node iterator, and they do not touch the trie database.

This approach can probably be further polished/optimized if we retain the trie used by the prover instead of rebuilding it once again -- but this PR focuses on the disk access part, not so much on getting the CPU cycles down.

@karalabe (Member) commented

If I understand this PR correctly, you are adding a new cache layer into the iterator, so that while traversing the trie, if anything is already in memory (in your bastard-trie) we just read that, and otherwise treat it the same as if it were on disk?

@holiman (Contributor, author) commented Apr 19, 2021

That is correct. We just resolve from a primary source (which is fully in memory), and if that fails, resolve from the secondary source (the disk trie).

@karalabe (Member) left a comment

I've pushed the minor change on top which we've discussed, to use an ethdb.KeyValueStore instead of a trie.Database, since the former is easy to construct (potentially useful elsewhere too), whereas the latter requires proper key/value data and can't be built from just a random soup of trie nodes.

The PR is a bit more complicated than I'd like, but an essential detail is that all simpler variations placed the memcache below the live trie.Database, which is problematic because we really don't want potentially non-persisted data to end up in the trie.Database's clean caches.
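
For illustration only (the nodes map, assumed to be collected by the generator, is not in the original), constructing such a key/value store from a loose set of trie nodes is straightforward:

// Build a key/value store from a "random soup" of trie nodes by keying
// each node blob by its hash. mem then satisfies ethdb.KeyValueStore and
// can serve as the resolver's primary source.
mem := memorydb.New()
for hash, blob := range nodes { // nodes: map[common.Hash][]byte (assumed)
	mem.Put(hash.Bytes(), blob)
}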

@karalabe karalabe merged commit 49281ab into ethereum:master Apr 23, 2021
@jon-chuang commented Apr 28, 2021

Why isn't generate concurrent, even though we are doing a lot of async IO?

From what I understand, the purpose of generating by range is to avoid building the trie for the entire state, which would OOM. However, how does one decide which trie nodes to delete if the partially-built trie is used as a memcache? Is there a pruning mechanism that makes this easy?


Further, I am writing the following tool: Async Trie resolver.

The model is that one submits keys to fetch to the asynctrieresolver. The resolver fetches the data asynchronously and stores the resolved nodes in a resolvedNodeCache map. Once a key is available, a channel signal for that key is fired, and a new trie can be built sequentially with assistance from the resolvedNodeCache map.
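
A minimal sketch of that model, using the commenter's hypothetical names (asyncResolver, resolvedNodeCache and fetch are all illustrative, not go-ethereum code):

import "sync"

type asyncResolver struct {
	mu                sync.Mutex
	resolvedNodeCache map[string][]byte        // key -> resolved node blob
	ready             map[string]chan struct{} // closed once the key resolves
	fetch             func(key string) []byte  // the (slow) IO, run async
}

// Submit kicks off an asynchronous fetch for a key.
func (r *asyncResolver) Submit(key string) {
	ch := make(chan struct{})
	r.mu.Lock()
	r.ready[key] = ch
	r.mu.Unlock()
	go func() {
		blob := r.fetch(key) // IO happens off the caller's thread
		r.mu.Lock()
		r.resolvedNodeCache[key] = blob
		r.mu.Unlock()
		close(ch) // channel signal: key is now available
	}()
}

// Wait blocks until the key has resolved, then returns the cached node.
func (r *asyncResolver) Wait(key string) []byte {
	r.mu.Lock()
	ch := r.ready[key]
	r.mu.Unlock()
	<-ch
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.resolvedNodeCache[key]
}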

Notice that since async IO is always the bottleneck when it comes to trie operations, if one only requires eventual consistency, we can use our AsyncTrie: one just calls trie.sync(), and it blocks until all pending insert, delete and get operations have completed. The limitation is that calling sync frequently can lead to low performance.

Further, maintaining sequential consistency in the order in which the operations are called is also a difficulty (but this can be resolved with a sequential applyOperation queue).

The guarantees are: 1) thread safety on copy (regardless of whether copy calls trie.sync() first, or simply copies the trie with updates yet unapplied); 2) sequential consistency.

A programming example is as follows:

// Pseudocode fleshed out into valid Go. AsyncTrie, changeValue and the
// result type carried by the future are the proposed API, not existing
// code: Insert/Delete are queued and applied in the background, and Get
// returns a future (a channel) fulfilled once the key resolves.
func buildRoot(trie *AsyncTrie, keys, vals [][]byte, target []byte) common.Hash {
	for i, key := range keys {
		trie.Insert(key, vals[i]) // queued, returns immediately
		trie.Delete(key)          // queued, returns immediately
	}
	valueFuture := trie.Get(target) // non-blocking, resolved in the background

	for i, key := range keys {
		trie.Insert(key, vals[i])
		trie.Delete(key)
	}
	// Block only on the single pending Get, not on the queued mutations.
	res := <-valueFuture
	if res.Err == nil {
		trie.Insert(target, changeValue(res.Value))
	}
	trie.Insert(target, vals[0]) // further queued mutation (illustrative)

	trie.sync() // blocks until all pending insert/delete/get ops complete
	return trie.root
}

Whenever trie.sync() is called, the resolvedNodeCache can be cleared, since all resolved nodes will have been applied to the trie by the end of sync, once all node-resolver workers have terminated.

In the concurrent tryGet model (rather than the trie-building model), as is the case in snapshot generation, one can skip building the trie altogether (past a certain depth) and go straight to using a resolvedNodeCache. This has the advantage of not blocking on past tryGet operations, which are sequential.


Question: does this really work for Insert and Delete, since they update the trie itself and change how it is walked?

Yes, since walking for hash resolution is stable under mutation.

A cached resolved node is "live" for two purposes: updating the in-memory trie, and caching intermediate paths for async resolver workers walking the trie.

The sequential update thread only queries for a HashNode if it does not yet live in memory. If it does not live in memory, then it did not lie along any path of mutation. Thus, whenever this node is queried by the sequential updater, it is the correct, as-yet-unmodified version.

One can strictly enforce this by having only the updater thread mark an isRetrieved flag on the node's entry in the map. The AsyncTrie should panic if an attempt is made to retrieve a node more than once.
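
A small sketch of that single-retrieval invariant, using illustrative names (cachedNode, and resolvedNodeCache as a hash-keyed map, are assumptions rather than existing code):

// cachedNode pairs a resolved node blob with a single-use flag.
type cachedNode struct {
	blob        []byte
	isRetrieved bool
}

// retrieve hands a resolved node to the sequential updater exactly once;
// a second retrieval indicates a logic error and panics.
func (t *AsyncTrie) retrieve(hash common.Hash) []byte {
	n, ok := t.resolvedNodeCache[hash] // map[common.Hash]*cachedNode
	if !ok {
		panic("node not resolved yet")
	}
	if n.isRetrieved {
		panic("node retrieved more than once")
	}
	n.isRetrieved = true
	return n.blob
}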


In essence, one can replace the prefetcher entirely using this async approach. One just calls trie.Get(..) and then drops the future (via the GC, or manually if you want). The AsyncTrie will build in the background. Then, when one wants to compute the root, one just calls trie.sync() and we are done.

This works best if the order in which operations are applied does not matter (unfortunately for us, most of the time it does).

@jon-chuang mentioned this pull request May 3, 2021
atif-konasl pushed a commit to frozeman/pandora-execution-engine that referenced this pull request Oct 15, 2021:
core/state/snapshot: reuse memory data instead of hitting disk when generating (ethereum#22667)

* core/state/snapshot: reuse memory data instead of hitting disk when generating

* trie: minor nitpicks wrt the resolver optimization

* core/state/snapshot, trie: use key/value store for resolver

* trie: fix linter

Co-authored-by: Péter Szilágyi <peterke@gmail.com>