[PoC] Snapshot to bintrie #12

Closed
wants to merge 24 commits into from

Conversation

gballet

@gballet gballet commented Mar 23, 2020

Quick-and-dirty prototype that builds a binary trie from the snapshot, aimed at producing initial data for [1]. It does no caching and no parallelism, makes no attempt to save memory, and stores branches very inefficiently.

@holiman it's also based on a pre-rebase version of trie_gen; I'll rebase if it helps/makes sense.

Refs

  1. https://ethresear.ch/t/overlay-method-for-hex-bin-tree-conversion/7104/3

TODO

  • rebase
  • pruning
  • extensions
  • parallelize generation and db writes
  • rework storage format
  • benchmark tests

Running

This adds a bintrie subcommand to geth that takes the current snapshot and performs the conversion based on that, so conversion can be started with:

geth --snapshot bintrie

@gballet gballet requested a review from holiman as a code owner March 23, 2020 15:20
@holiman
Owner

holiman commented Mar 23, 2020 via email

Owner

@holiman holiman left a comment

A few comments (yeah I know it's work in progress and you haven't optimized yet, but I was eager to take a peek)

trie/binary.go Outdated
Comment on lines 144 to 149
data[0] = byte(len(payload[0]))
copy(data[1:], payload[0])
data[len(payload[0])+1] = byte(len(payload[1]))
copy(data[2+len(payload[0]):], payload[1])
data[len(payload[0])+len(payload[1])+2] = byte(len(payload[2]))
copy(data[2+len(payload[0])+len(payload[1]):], payload[2])
Owner

Where is this format defined?

Author

It's a pure flight of fancy for the moment; I wanted to see what it would take to convert the trie.

Author

I changed the format to be closer to what the hex prefix does.

trie/binary.go Outdated
return err
}

func (t *BinaryTrie) insert(depth int, key, value []byte) error {
Owner

Random thought: I wonder if an iterative insert would be faster than a recursive one. I wouldn't mind giving it a go, once we have some benchmark-tests for it

Author

sure 👍
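
To make the recursive-versus-iterative idea concrete, here is a minimal sketch of both insert styles on a toy binary trie. Everything in it is an assumption for illustration: the node type, the bit helper and the function names are hypothetical, and the trie has no extensions, hashing or persistence, unlike the PR's BinaryTrie. Which variant is faster would still have to be measured with benchmark tests.

```go
package main

import "fmt"

// node is a hypothetical, uncompressed binary trie node.
type node struct {
	children [2]*node
	value    []byte
}

// bit returns the i-th bit of key, most significant bit first.
func bit(key []byte, i int) byte {
	return (key[i/8] >> (7 - uint(i%8))) & 1
}

// insertRecursive descends one bit per call and returns the (possibly new) subtree root.
func insertRecursive(n *node, depth int, key, value []byte) *node {
	if n == nil {
		n = &node{}
	}
	if depth == len(key)*8 {
		n.value = value
		return n
	}
	b := bit(key, depth)
	n.children[b] = insertRecursive(n.children[b], depth+1, key, value)
	return n
}

// insertIterative performs the same walk in a loop, without growing the call stack.
func insertIterative(root *node, key, value []byte) {
	cur := root
	for i := 0; i < len(key)*8; i++ {
		b := bit(key, i)
		if cur.children[b] == nil {
			cur.children[b] = &node{}
		}
		cur = cur.children[b]
	}
	cur.value = value
}

// get returns the value stored under key, or nil if the key is absent.
func get(root *node, key []byte) []byte {
	cur := root
	for i := 0; cur != nil && i < len(key)*8; i++ {
		cur = cur.children[bit(key, i)]
	}
	if cur == nil {
		return nil
	}
	return cur.value
}

func main() {
	a := insertRecursive(nil, 0, []byte{0xa5}, []byte("x"))
	b := &node{}
	insertIterative(b, []byte{0xa5}, []byte("x"))
	fmt.Println(string(get(a, []byte{0xa5})), string(get(b, []byte{0xa5}))) // prints x x
}
```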

gballet and others added 2 commits April 1, 2020 13:23
Co-Authored-By: Martin Holst Swende <martin@swende.se>
Co-Authored-By: Martin Holst Swende <martin@swende.se>
@holiman
Owner

holiman commented Apr 1, 2020

Do you want me to merge this or want to tick those boxes first?

@gballet
Author

gballet commented Apr 1, 2020

It would be better to tick the boxes first. It shouldn't make it to mainnet until I've fixed the OOM that just happened on mon06 😁

@gballet
Author

gballet commented Apr 21, 2020

After using node extensions:

INFO [04-21|11:04:47.625] Loaded most recent local fast block      number=9689125 hash=d07200…199482 td=14582575985188855549455 age=1mo4d22h
INFO [04-21|11:04:47.626] Generating binary trie                   root=431f3f…6e8372
INFO [04-21|11:04:47.626] Allocated cache and file handles         database=/datadrive/geth/bintrie                cache=128.00MiB handles=1024
INFO [04-21|11:33:17.462] Inserted all leaves                      count=80438713
INFO [04-21|11:33:17.463] Done writing nodes to the DB             count=241316137
INFO [04-21|11:33:17.463] Calculated binary hash                   hash=0x28b39d8423de6c1b8bcf80a3fb66b2139b931c9a61ee465c024a5f70c496d286
INFO [04-21|11:33:17.463] Generation done                          root=431f3f…6e8372 binary root=28b39d…96d286

That's roughly 30 minutes for the account trie conversion, which is a nice improvement.

Owner

@holiman holiman left a comment

LGTM, but I haven't really dived into the binary trie implementation, to be honest.

return fmt.Errorf("Could not create iterator for root %x: %v", root, err)
}
log.Info("Generating binary trie", "root", root)
generatedRoot := snapshot.GenerateBinaryTree(ctx.GlobalString(utils.DataDirFlag.Name), it)
Owner

Does that resolve correctly? For example, if --goerli is specified, is the GlobalString (utils.DataDir... value really correct?

Owner

While you're at it, in the log.Info call above, maybe add the database path:

dbPath := ctx.GlobalString(utils.DataDirFlag.Name)
log.Info("Generating binary trie", "root", root, "database", dbPath)
generatedRoot := snapshot.GenerateBinaryTree(dbPath, it)

func GenerateBinaryTree(path string, it AccountIterator) common.Hash {
db, err := rawdb.NewLevelDBDatabase(path+"/bintrie", 128, 1024, "")
if err != nil {
panic(fmt.Sprintf("error opening bintrie db, err=%v", err))
Owner

Whoa, please don't panic here. You have a live iterator on the original trie db; please back out carefully and close it nicely.
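
One way to act on this comment is to return an error instead of panicking and to release the iterator on every exit path with defer. The sketch below only illustrates that pattern. The iterator type, openDB and generate are hypothetical stand-ins rather than geth APIs, and it assumes the generating function owns the iterator's cleanup, which may not match how the PR splits that responsibility.

```go
package main

import (
	"errors"
	"fmt"
)

// iterator is a hypothetical stand-in for the snapshot account iterator.
type iterator struct{ released bool }

// Release marks the iterator as closed.
func (it *iterator) Release() { it.released = true }

// openDB is a hypothetical stand-in for opening the bintrie database.
func openDB(path string) error {
	if path == "" {
		return errors.New("empty path")
	}
	return nil
}

// generate returns an error instead of panicking, and the deferred Release
// runs on both the failure path and the success path.
func generate(path string, it *iterator) error {
	defer it.Release()
	if err := openDB(path); err != nil {
		return fmt.Errorf("error opening bintrie db: %w", err)
	}
	// The trie conversion itself would run here.
	return nil
}

func main() {
	it := &iterator{}
	err := generate("", it)
	fmt.Println(err, it.released) // prints: error opening bintrie db: empty path true
}
```

The caller can then log the error and exit cleanly, rather than unwinding through a panic while the source database is still open.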

defer wg.Done()
for kv := range btrie.CommitCh {
nodeCount++
db.Put(kv.Key, kv.Value)
Owner

Might go a bit faster if you use db batches. Probably a whole lot faster, actually, since the data you put in there is pretty tiny.

Owner

Unless you depend on the commit being done synchronously. If you don't require it to be synchronous, you could do two things:

  1. Do batching here,
  2. Add a buffer to the CommitCh. Sometimes it will lag behind, but other times the db insert may be fast (when it only puts it into batch), so a reasonably sized buffer may be good here.
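
The two suggestions above can be sketched together as follows. This is a minimal illustration under stated assumptions: memStore stands in for a real database batch, and the kv type, drain, the batch size and the channel capacity are all hypothetical, not values or types taken from the PR. It also assumes the commit does not need to be synchronous.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is a hypothetical key/value pair as it might travel over the commit channel.
type kv struct{ Key, Value []byte }

// memStore is a hypothetical in-memory sink standing in for a database batch.
type memStore struct {
	data    map[string][]byte
	flushes int
}

// writeBatch stores every entry of the batch and counts the flush.
func (s *memStore) writeBatch(batch []kv) {
	for _, e := range batch {
		s.data[string(e.Key)] = e.Value
	}
	s.flushes++
}

// drain reads entries until ch is closed, flushing every batchSize entries
// and once more at the end for any partial batch. It returns the entry count.
func drain(ch <-chan kv, s *memStore, batchSize int) int {
	batch := make([]kv, 0, batchSize)
	count := 0
	for e := range ch {
		batch = append(batch, e)
		count++
		if len(batch) == batchSize {
			s.writeBatch(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		s.writeBatch(batch)
	}
	return count
}

func main() {
	ch := make(chan kv, 64) // buffered, so the producer can run ahead of the writer
	store := &memStore{data: make(map[string][]byte)}
	var wg sync.WaitGroup
	var n int
	wg.Add(1)
	go func() {
		defer wg.Done()
		n = drain(ch, store, 100)
	}()
	for i := 0; i < 250; i++ {
		ch <- kv{Key: []byte(fmt.Sprintf("k%03d", i)), Value: []byte{byte(i)}}
	}
	close(ch)
	wg.Wait()
	fmt.Println(n, store.flushes, len(store.data)) // prints 250 3 250
}
```

The final flush after the loop matters: without it, up to batchSize-1 entries would be lost when the channel closes.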

@gballet
Author

gballet commented Oct 11, 2021

Verkle trees present a promising approach. Closing.

@gballet gballet closed this Oct 11, 2021
holiman pushed a commit that referenced this pull request Nov 26, 2022
Configures the genesis and network parameters of the protodanksharding devnet.
holiman pushed a commit that referenced this pull request Jul 7, 2023
holiman pushed a commit that referenced this pull request Mar 18, 2024