
Merkle Mountain Range for efficient Grandpa ancestry proofs #263

Closed
tomusdrw opened this issue Aug 4, 2020 · 8 comments
Labels: A-feat (New feature or request), P-Runtime

tomusdrw commented Aug 4, 2020

Imagine a chain like this:

...   X           // Best finalized chain known by the (on-chain) light client
... 4 5 6 7 8 9 A // Block numbers 4-10 (0xA = 10)
... - F - - - - F // F means the header is finalized (we import with justification)

Why?

Light client implementations have a hard time deciding whether headers 6-9 should be imported when header 10 (0xA) has not yet been seen.

Our current approach (in the Solidity PoA bridge, and most likely in the upcoming Substrate light client) is to simply import headers as we see them, verifying only their parent hash to make sure they extend the right (finalized) fork, and to mark them finalized at the very end, when we see the justification for block number 10.

This approach has some drawbacks:

  • tracking multiple forks is expensive, but necessary (since we don't even verify block authorship signatures)
  • some pruning is required (or marking individual blocks as finalized)
  • every block has to be imported, and imported sequentially at that, which is expensive
  • super-fast chains are going to be hard to bridge due to the amount of required header-chain data

For many chains, where transaction costs (both computation and storage) are high, this approach may be suboptimal.

How?

The idea is to be able to import header number 10 directly, without requiring 6-9 to be imported first (or at all). While we could in theory simply accept header 10 if it's correctly signed by the current validator set, we might run into two issues:

  1. For some applications data from headers 6-9 (or at least their header hashes) might be required (for instance, to prove transaction existence).
  2. Block 10 might be built on a different fork than block 5, so we must verify block 10's ancestry.

In FRAME-based Substrate runtimes the frame_system pallet already stores the hashes of recent blocks, so it might be possible to use this data to prove ancestry (you simply present a storage proof at block 10). However, this data is pruned (MaxBlockHashes), and while we could in theory extend that period if finality data were on-chain as well, there are more efficient ways of doing it, namely:
paritytech/substrate#2053

Or, even better, Merkle Mountain Ranges: paritytech/substrate#3722

Example 1: Merkle Mountain Ranges (MMR)
#2053 , https://github.com/mimblewimble/grin/blob/master/doc/mmr.md
For many kinds of auxiliary blockchain protocols, it's important to be able to prove that some ancient block header is an ancestor of the finalized chain head. MMRs provide a good way of doing that.
We want to write a runtime module to keep track of the peaks (roots) of a bunch of different merkle tries - there will be log2(N) of these for N blocks (and N trie nodes in total). You can add to the MMR with only the peaks, and prove ancestry if you have all the nodes.
We'd want full nodes to keep track of all of the MMR nodes by keeping them in offchain storage. However, if even one block in the chain is not executed, it is possible to end up in a situation where ancestry can no longer be proven.

Details

We should extend frame_system to store the MMR peaks and use the Indexing API to write all the nodes to the offchain database.
Every node with indexing enabled will then be able to construct MMR proofs of ancestry that can be efficiently verified against the on-chain data.
MMR offchain data can also be re-constructed from the header chain (perhaps within an Offchain Worker), but reconstructing the structure on demand is not feasible (it's an O(n) process).
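To make the peak bookkeeping concrete, here is a minimal, self-contained sketch (illustrative only, not the frame_system/pallet code) of an append-only MMR that keeps nothing but its peaks, with the hash function left generic:

// Illustrative sketch only: an append-only MMR that stores nothing but its peaks.
// The hash function is abstracted so Keccak256, Blake2, etc. could be plugged in.

/// A peak: the root of a perfect binary tree covering 2^height leaves.
struct Peak {
    height: u32,
    hash: [u8; 32],
}

struct Mmr<H: Fn(&[u8], &[u8]) -> [u8; 32]> {
    peaks: Vec<Peak>, // at most ~log2(leaf_count) entries
    leaf_count: u64,
    hash: H,
}

impl<H: Fn(&[u8], &[u8]) -> [u8; 32]> Mmr<H> {
    fn new(hash: H) -> Self {
        Self { peaks: Vec::new(), leaf_count: 0, hash }
    }

    /// Append a new leaf (e.g. a block hash) using only the stored peaks.
    fn push(&mut self, leaf_hash: [u8; 32]) {
        let mut new_peak = Peak { height: 0, hash: leaf_hash };
        // Whenever the two right-most peaks have equal height, merge them.
        while let Some(last) = self.peaks.last() {
            if last.height != new_peak.height {
                break;
            }
            let last = self.peaks.pop().expect("peak checked above");
            new_peak = Peak {
                height: last.height + 1,
                hash: (self.hash)(&last.hash, &new_peak.hash),
            };
        }
        self.peaks.push(new_peak);
        self.leaf_count += 1;
    }

    /// "Bag" the peaks right-to-left into a single root commitment.
    fn root(&self) -> Option<[u8; 32]> {
        self.peaks
            .iter()
            .rev()
            .map(|peak| peak.hash)
            .reduce(|bagged_right, left| (self.hash)(&left, &bagged_right))
    }
}

In this design only the peaks would live in on-chain storage, while full nodes would persist every node to the offchain database via the Indexing API, so that they can serve ancestry proofs.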

So to circle back to our example, the light client could import header 10 directly, but it would require extra proof data:

  1. Storage proof of MMR peaks
  2. MMR proof of 5 being an ancestor of 10.

Also, any application built on top of the header chain could verify that, say, header 7 is part of the chain by providing exactly the same kind of proofs.
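As a hedged sketch of what that import flow might look like on the light-client side (every type and helper below is a hypothetical placeholder, not an existing crate API):

// Hypothetical sketch of importing header 10 directly; none of these types or
// helpers come from an existing crate, they only illustrate the flow above.
struct Header { number: u64, state_root: [u8; 32] }
struct Justification;       // GRANDPA justification for `header`
struct StorageProof;        // storage proof of the MMR peaks under `header.state_root`
struct MmrAncestryProof;    // proof that a given block hash is a leaf under those peaks

fn import_finalized_header(
    best_finalized_hash: [u8; 32],  // e.g. the hash of block 5
    header: Header,                 // e.g. block 10
    justification: Justification,
    peaks_proof: StorageProof,
    ancestry_proof: MmrAncestryProof,
) -> Result<(), &'static str> {
    // 1. The header must be finalized by the current authority set.
    verify_justification(&header, &justification)?;
    // 2. Extract the MMR peaks from the header's state via the storage proof.
    let peaks = verify_storage_proof(&header.state_root, &peaks_proof)?;
    // 3. The best finalized block we already know must be an ancestor.
    verify_mmr_ancestry_proof(&peaks, best_finalized_hash, &ancestry_proof)?;
    // Accept `header` as the new best finalized header.
    Ok(())
}

// Placeholder verification helpers (real ones would live in the bridge pallet).
fn verify_justification(_h: &Header, _j: &Justification) -> Result<(), &'static str> { todo!() }
fn verify_storage_proof(_root: &[u8; 32], _p: &StorageProof) -> Result<Vec<[u8; 32]>, &'static str> { todo!() }
fn verify_mmr_ancestry_proof(_peaks: &[[u8; 32]], _leaf: [u8; 32], _p: &MmrAncestryProof) -> Result<(), &'static str> { todo!() }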

This needs to be implemented in the Substrate repo.

tomusdrw added this to the Millau Release milestone Aug 4, 2020
@AlistairStewart

This will intersect heavily with the changes we are planning for the Ethereum bridge. We are thinking of having validators do BLS signatures outside of GRANDPA on finalised blocks. This would avoid the need for simplifying GRANDPA justifications, but it does not cover the case of working backwards from one thing signed by the validators.

So maybe the validators should be signing the MMR roots with BLS.

For Polkadot and Kusama, we also want a cheap way of directly proving that a parachain block happened, which might mean putting more than just relay chain block hashes in something like this. That would really help bridges on parachains.


tomusdrw commented Aug 6, 2020

Extensions on top of this that we would like to see:

  1. MMR built not only from recent block_hashes, but also from additional arbitrary data; basically in the leaves we want (BlockHash, ExtraDataForThatBlock), where this additional data is configurable by the runtime developer. In case of the Polkadot Relay Chain this would most likely be (BlockHash, Vec<(ParaId, ParaHeadData)>), so that parachains are able to push some data there as well (as a means of communicating with bridged chains outside of Polkadot).

Rough pseudo-code:

trait MMR {
  /// Data stored in the MMR leaf for the current block (alongside the block hash).
  type LeafData: Encode;
  const IS_ENABLED: bool;
  fn mmr_leaf_data_for_current_block() -> Self::LeafData;
}

struct ParachainMmr;
impl MMR for ParachainMmr {
  type LeafData = (ParachainId, ParachainHeadData);
  const IS_ENABLED: bool = true;
  fn mmr_leaf_data_for_current_block() -> Self::LeafData { todo!() }
}

impl frame_system::Trait for PolkadotRuntime {
  type MMR = ParachainMmr;
}

Ideally the hash function should be configurable, to make MMR proofs easy to verify on chains with limited hashing support (think: Keccak256 or Blake2).

  2. The MMR peaks of the current block should be easily queryable (Runtime API), so that they can be retrieved by GRANDPA validators, who would collect signatures for the last finalized MMR root hash (see the proposed algorithm for choosing a block below). As a result we would get:
    (MmrRootHashOfSomeFinalizedBlock; Signature(s)FromCurrentValidatorSet)

The signature crypto is not decided yet; ideally it would be something that is cheap to verify on Ethereum and supports signature aggregation, perhaps BLS over some Ethereum-supported curve.

An open question remains: how frequently such commitments should be prepared, and how validators should agree on a single block to collect the signatures on. One option would be to pick the block (see the sketch after this list):

block_to_sign_on = last_block_with_signed_mmr_root + NextPowerOfTwo((last_finalized_block - last_block_with_signed_mmr_root) / 2)

  3. These signed commitments from (2) can then be used to:
  • Feed them back into the Substrate chain (easily queryable finality proofs available on-chain)
  • Create an efficient GRANDPA light client to be used for bridges.
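As an illustration of the selection rule above, a minimal sketch, assuming NextPowerOfTwo maps to Rust's u64::next_power_of_two (the function itself is illustrative, not an existing API):

// Illustrative sketch of the proposed rule; not an existing API.
fn block_to_sign_on(last_block_with_signed_mmr_root: u64, last_finalized_block: u64) -> u64 {
    let gap = last_finalized_block.saturating_sub(last_block_with_signed_mmr_root);
    // Note: `0u64.next_power_of_two()` is 1, so the result always advances by at least one block.
    last_block_with_signed_mmr_root + (gap / 2).next_power_of_two()
}

// Example: with the last signed root at block 100 and finality at block 164,
// validators would next sign the MMR root of block 100 + 32 = 132.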

Potentially useful:
https://github.com/darwinia-network/merkle-mountain-range
https://github.com/darwinia-network/darwinia-common/blob/master/frame/header-mmr/src/lib.rs


tomusdrw commented Dec 9, 2020

Still to do in the Substrate repo:

  • Runtime API and Custom RPC to generate leaf proofs.
  • Pruning of non-peak nodes from on-chain storage
  • Customization of MMR growth cadence (currently we do it on every block, for some cases it might be enough to grow once per session).

@boundless-forest

Hi @tomusdrw, I have noticed that the Merkle Mountain Range pallet (paritytech/substrate#7312) has been accepted into Substrate. Based on this, is it possible to integrate MMR into the sub-sub bridge in the parity-bridges-common design, to avoid relaying every block from the source chain to the target chain?

@tomusdrw

@AsceticBear hey! Yes, we have a plan to create a different Sub<>Sub bridge pallet that actually uses MMR + BEEFY instead of GRANDPA.
It's also possible to do some optimizations with MMR + GRANDPA: for instance, you can import a justification together with an MMR proof showing that the justified block has the pallet's latest known finalized block as an ancestor (i.e. you skip importing the intermediate headers completely).
That said, neither of these is being actively worked on yet; we hope to start on this kind of optimization in Q1 2021.

@boundless-forest

@tomusdrw We have a plan to do this MMR optimization on top of the current sub<>sub bridge in this repo, as you described above. However, a problem we have encountered is how to know which block carries the signal_hash flag without relaying every header from the source chain to the target chain. In other words, in the target Substrate client we maintain the source chain's GRANDPA authority set to verify justifications. When the relayer transfers blocks continuously, one by one, the target chain client can pick up authority set changes in a timely manner. But how do we achieve this if the relayer skips importing headers from the source chain to the target chain one by one?


tomusdrw commented Jan 4, 2021

@AsceticBear sorry for the late answer. Please take a look at the BEEFY protocol:
https://github.com/paritytech/grandpa-bridge-gadget
It is designed specifically to work with MMR to make it easy to implement such a low-bandwidth bridge.

Currently it's using secp256k1 signatures for Ethereum compatibility, but the plan is to have BEEFY authorities create an aggregated signature (e.g. BLS) for even faster verification.

If I understood your question correctly, with GRANDPA + MMR your client might need to have the session length hardcoded, so that with every set transition it requires the header with signal_hash plus a proof that it is part of the current MMR. Obviously, the block you import should be signed by the old authority set to enact the transition.
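For illustration only, a heavily hedged sketch of that hand-over; every constant, type and helper below is a placeholder invented for this example, not an existing pallet API:

// Placeholder sketch of a GRANDPA + MMR authority-set hand-over; the session
// length is only assumed to be fixed, as discussed above.
const SESSION_LENGTH: u64 = 2400;

struct AuthoritySet;
struct SignalHeader { number: u64 } // header carrying the set-change signal
struct OldSetJustification;         // GRANDPA justification produced by the *old* set
struct MmrMembershipProof;          // proof the header is part of the known MMR

fn handle_set_transition(
    old_set: &AuthoritySet,
    header: SignalHeader,
    justification: OldSetJustification,
    mmr_proof: MmrMembershipProof,
) -> Result<AuthoritySet, &'static str> {
    // With a hardcoded session length the client knows where transitions happen.
    if header.number % SESSION_LENGTH != 0 {
        return Err("not a session boundary");
    }
    // 1. The signalling header must be finalized by the old authority set.
    verify_with_old_set(old_set, &header, &justification)?;
    // 2. It must also be provably part of the already-known finalized chain.
    verify_mmr_membership(&header, &mmr_proof)?;
    // 3. Enact the authority set signalled in the header.
    extract_new_set(&header)
}

// Placeholder helpers.
fn verify_with_old_set(_s: &AuthoritySet, _h: &SignalHeader, _j: &OldSetJustification) -> Result<(), &'static str> { todo!() }
fn verify_mmr_membership(_h: &SignalHeader, _p: &MmrMembershipProof) -> Result<(), &'static str> { todo!() }
fn extract_new_set(_h: &SignalHeader) -> Result<AuthoritySet, &'static str> { todo!() }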

@tomusdrw

Closing, since:

  1. The MMR implementation has landed in Substrate & Polkadot
  2. BEEFY is being worked on in https://github.com/paritytech/grandpa-bridge-gadget
  3. We now have pallet-bridge-grandpa (Finality Verifier Pallet #628) and we don't really verify ancestry, because it does not add any value in our case.
