Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash tree memoization #39

Open
jannikluhn opened this issue Feb 9, 2019 · 2 comments
Open

Hash tree memoization #39

jannikluhn opened this issue Feb 9, 2019 · 2 comments

Comments

@jannikluhn
Copy link
Contributor

Problem

The tree hashing algorithm describes a hash function for SSZ data structures. For lists and containers, it is defined recursively: First, hash each element and, second, hash the results together. Thus, updating only a single element of a list or a field in a container does not require a full recalculation of the top level tree hash. We should take advantage of this to save unnecessary computation that can take quite long for big values (in particular, the beacon chain state). At the same time, we'd like our data structures to stay immutable.

Vitalik came up with a (mutable) example implementation here: https://github.com/ethereum/research/blob/2d3ed6e42087d5b14cdf107c897e8d3e5db3ee7a/ssz_hashable_list/hashable_list.py

Proposed Implementation

Container

ssz.Serializable should memoize its tree hash. ssz.copy should copy the tree hash as well.

List

We need a new class (maybe named, as Vitalik proposed it, HashableList) that similar to ssz.Serializable memoizes its tree hash, but in addition to that the full hash tree. It should be an immutable sequence (i.e. define __getitem__, __len__, __reversed__, __contains__, and __iter__), and have a copy method with the following interface:

def copy(
    replace: Dict[int: TElement] = None,
    insert: Dict[int: TElement] = None,
    remove: Dict[int: TElement] = None,
    prepend: Sequence[TElement] = None,
    append: Sequence[TElement] = None,
) -> HashableList[TElement]:
    ...

The new list will copy the elements of the old one, but replace/insert/remove/prepend/append some elements depending on the given parameters. The old hash tree should be copied as well, updating the necessary branches only. For safety, we should probably either only allow a single argument or enforce a certain order of the arguments so that its clear to what the indices refer to.

Usage example

state.copy(
    x=4,
    ys=state.ys.copy(
        replace(4, b"foo"),
    ),
)
@hwwhww
Copy link
Contributor

hwwhww commented Feb 11, 2019

Thanks for the write-up! I think it's the right way to rescue us from the immutable copy for huge data.
/cc @pipermerriam, basically we will have trouble when serialize and tree hash the BeaconState with 4M validators. The benefit of tree hashing has been discussed here and here, and we have to find a way to build the tree structure inside SSZ.

@pipermerriam
Copy link
Member

I'm aware of this problem space and I think @jannikluhn is correct that we likely need a formal API to be able to take advantage of this. I don't have opinions about specific implementation but will be happy to review one once it's ready or to weigh in on concepts for the API.

@jannikluhn jannikluhn mentioned this issue Feb 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants