Skip to content
This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Eth2.0 Implementers Call 4 Agenda #8

Closed
djrtwo opened this issue Sep 18, 2018 · 12 comments
Closed

Eth2.0 Implementers Call 4 Agenda #8

djrtwo opened this issue Sep 18, 2018 · 12 comments

Comments

@djrtwo
Copy link
Contributor

djrtwo commented Sep 18, 2018

Eth2.0 Implementers Call 4 Agenda

Meeting Date/Time: Thursday 2018/9/27 at 14:00 GMT

Meeting Duration 1.5 hours

YouTube Live Stream Link

Agenda

  1. Client Updates
  2. Research Updates
  3. Block processing timing results
  4. libp2p daemon
  5. Testing
  6. Proposal to use separate serialization format for wire vs hashing
  7. Alternative tree storage structures
  8. v2.1 Discussion
  9. Open Discussion/Closing Remarks
@Mikerah
Copy link

Mikerah commented Sep 20, 2018

Can we discuss tests for ssz? Since, ssz is our serialization format (for now), we should have a set of tests that implementers can use to make sure that their implementation of ssz is correct.

@mratsim
Copy link

mratsim commented Sep 21, 2018

@Mikerah, I'm interested in this as well and started an issue with some ideas: ethereum/beacon_chain#115

@raulk
Copy link

raulk commented Sep 24, 2018

@djrtwo This is Raúl from the libp2p team at Protocol Labs. I'm happy to jump on the call to discuss the libp2p daemon we're working on, in case that can be of help to the community.

@djrtwo
Copy link
Contributor Author

djrtwo commented Sep 24, 2018

Thanks @raulk! We'd love to have you. Adding an item to the agenda

@AlexeyAkhunov
Copy link

I am suggesting two topics:

  1. serialisation formats - I suggest to not use SimpleSerialise and instead use two separate formats: one for wire packets, and another for the input to hash functions. they have somewhat conflicting requirements and the mistake in the past was to use RLP for both. I would still recommend RLP for wire packet encoding. But for inputs to the hash functions, I would take something that could be streamed, i.e. no length prefixes, so that hashing (especially hashing of large trees) could be done very efficiently.

  2. I have been researching various tree structures as an alternative to Patricia Merkle tree in Ethereum. Currently the idea is to use Sparse Merkle Tree. They are good for most things. There is only one wrinkle - in adversarial settings, in order to keep the length of the proofs short, one needs to randomise (i.e. hash) keys before inserting them into the tree. This is the approach that has been taken in Ethereum so far. And it has some unfortunate side effects - performance hit on hashing the keys on all tree operations, necessity to keep around a database of preimages, inability to efficiently iterate in the natural order of the keys, only their hashes. So currently I am revisiting IAVL trees used in Terndemint. They are based on AVL+ tree and self-balancing. For example, for the tree of 3 million nodes, the range of possible heights is from 22 to 30. I thought that it would not be possible to combine IAVL trees with the straightforward database layout that I use in Turbo-Geth, but now I think I found a way to do it, and will be starting a Proof Of Concept shortly. Perhaps IAVL trees could be used instead of Sparse Merkle trees if that works.

@djrtwo
Copy link
Contributor Author

djrtwo commented Sep 26, 2018

Thanks @AlexeyAkhunov. Adding both

@vbuterin
Copy link

So currently I am revisiting IAVL trees used in Terndemint. They are based on AVL+ tree and self-balancing. For example, for the tree of 3 million nodes, the range of possible heights is from 22 to 30. I thought that it would not be possible to combine IAVL trees with the straightforward database layout that I use in Turbo-Geth, but now I think I found a way to do it, and will be starting a Proof Of Concept shortly. Perhaps IAVL trees could be used instead of Sparse Merkle trees if that works.

The reason AVL-style trees were not directly used in ethereum originally, aside from the fact that they are even more complex than Patricia trees and were beyond my understanding at the time I was making the spec, is that they do not have the property that every combination of keys and values has a single root hash, which makes all sorts of caching optimizations much more risky, as you have to remember what order to do updates in. Even things like warp sync formats would need to include the extra information.

That said, one thing I think we could look into is using AVL trees on top of sparse Merkle trees. The key insight here is that we can use the sparse Merkle tree as the hashing format, as it's by far the simplest and most natural option there is, but then we can use all sorts of constructions for batching sparse Merkle tree nodes together in the DB, bounded-depth trees being one option. The nice thing about this approach is that we don't need to specify it at consensus layer; different clients can batch nodes in different ways.

@nisdas
Copy link

nisdas commented Sep 27, 2018

@AlexeyAkhunov For wire packets wouldn't it be better to stick with Simple Serialize ? Using RLP would lead to a pretty big performance hit compared to other serialization solutions

@AlexeyAkhunov
Copy link

AlexeyAkhunov commented Sep 27, 2018

@nisdas The main drawback of Simple Serialise is that it requires schema information for deserialising. that makes de-serialisation less efficient, and precludes a lot of generic tooling, like packet inspectors and rooters

@AlexeyAkhunov
Copy link

they do not have the property that every combination of keys and values has a single root hash, which makes all sorts of caching optimizations much more risky, as you have to remember what order to do updates in. Even things like warp sync formats would need to include the extra information.

Yes, this is exactly what I am working on. I quantified that extra information, and it turns out to be 1.5 extra bit per key for "regular" AVL+ tree, and around 0.8 bit per key for what I call "left leaning" (or right leaning if you like) AVL+ tree. I want to do Proof Of Concept on how to store and update those structures efficiently

@cdetrio
Copy link
Member

cdetrio commented Sep 27, 2018

I'd like to share a 3d visualization of beacon chain and shard blocks: https://beta.observablehq.com/@cdetrio/shasper-viz-0-4

@zah
Copy link
Contributor

zah commented Oct 5, 2018

The Merkle trie discussion got me thinking and I have a potentially silly question now.

What prevents us from storing the state in a regular hash table governed by some a deterministic algorithm? This would make all look-ups fast, but the obvious problem is how to provide the merkle proofs. One solution would be build a binary Merkle tree over the slots of the hash table (i.e. you hash slot 0 and 1 together; then you combine the result of slots 0-1 and slots 2-3; then this result with the hash of slots 4-8 and so on). Under this scheme, given a current hash table size and a particular key, you'll be able to efficiently verify a compact Merkle proof.

Some potential problems and benefits:

  • problem: Resizing the hash table will require a linear pass over the entire state. When this happens can be predicted only approximately - certain rare blocks will trigger a long state reorganization. There are some designs that try to deal with this issue, although it may be complicated to combine them with the Merkle scheme:

    https://en.wikipedia.org/wiki/Hash_table#Alternatives_to_all-at-once_rehashing

  • benefit: All state modifications are fast and they only mark certain slots as dirty. At the end of the computation, you can traverse only the dirty branches of the tree to compute the final root hash once. This can spare a lot of intermediate recalculations of the root hash, although one must acknowledge that the intermediate values are useful/required for schemes such as TrueBit.

  • benefit: State sync becomes a simple streaming of the contents of the hash table. This can be highly efficient.

  • problem: Only the latest state is available. The current Merkle tries are a persistent data structure that allows you to access multiple alternative state positions seamlessly. The hash table isn't one.

  • problem: To avoid the issue mentioned by Vitalik, the hash table should probably use a design with sorted buckets. We can make it harder for an attacker to overload a certain bucket by passing all keys through a cryptographic hash function, but this may still be insufficient. We can use a binary Patricia trie as the contents of each bucket, but this will make the Merkle proofs more awkward.

This is not a small list of problems, but are there any others?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants