
Merkle Mountain Range for efficient Grandpa ancestry proofs #263

Closed
tomusdrw opened this issue Aug 4, 2020 · 8 comments
Labels: A-feat (New feature or request), P-Runtime

tomusdrw commented Aug 4, 2020

Imagine a chain like this:

...   X           // Best finalized chain known by the (on-chain) light client
... 4 5 6 7 8 9 A // Block numbers 4-10 (0xA = 10)
... - F - - - - F // F means the header is finalized (we import with justification)

Why?

Light client implementations have a hard time deciding whether headers 6-9 should be imported when header 10 (0xA) has not yet been seen.

Our current approach (in the Solidity PoA bridge, and most likely in the upcoming Substrate light client) is to simply import headers as we see them, verifying only their parent hash to make sure they extend the right (finalized) fork, and to mark them finalized at the very end, when we see the justification for block number 10.

This approach has some drawbacks:

  • tracking multiple forks is expensive, but necessary (since we don't even verify block authorship signatures)
  • some pruning is required (or marking individual blocks as finalized)
  • every block has to be imported, and imported sequentially at that, which is expensive
  • super-fast chains are going to be hard to bridge due to the amount of required header-chain data

For many chains, where transaction costs (both computation and storage) are high, this approach may be suboptimal.

How?

The idea is to be able to import header number 10 directly, without requiring 6-9 to be imported first (or at all). While we could in theory simply accept header 10 if it's correctly signed by the current validator set, we might run into two issues:

  1. For some applications data from headers 6-9 (or at least their header hashes) might be required (for instance, to prove transaction existence).
  2. Block 10 might be built on a different fork than block 5, so we must verify block 10's ancestry.

In FRAME-based Substrate runtimes the frame_system pallet already stores the hashes of recent blocks, so it might be possible to use this data to prove ancestry (you simply present a storage proof at block 10). However, this data is pruned (MaxBlockHashes), and while we could in theory extend that period if finality data were on-chain as well, there are more efficient ways of doing it, namely:
paritytech/substrate#2053

Or, even better, Merkle Mountain Ranges: paritytech/substrate#3722

Example 1: Merkle Mountain Ranges (MMR)
#2053 , https://github.com/mimblewimble/grin/blob/master/doc/mmr.md
For many kinds of auxiliary blockchain protocols, it's important to be able to prove that some ancient block header is an ancestor of the finalized chain head. MMRs provide a good way of doing that.
We want to write a runtime module to keep track of the peaks (roots) of a bunch of different merkle tries - there will be log2(N) of these for N blocks (and N trie nodes in total). You can add to the MMR with only the peaks, and prove ancestry if you have all the nodes.
We'd want full nodes to keep track of all of the MMR nodes by keeping them in offchain storage. However, if even one block in the chain is not executed, it is possible to end up in a situation where ancestry can no longer be proven.

Details

We should extend frame_system to store the MMR peaks and use the Indexing API to write all the nodes to the offchain database.
Every node with indexing enabled will then be able to construct MMR proofs of ancestry that can be efficiently verified against the on-chain data.
MMR offchain data can also be re-constructed from the header chain (perhaps within an Offchain Worker), but reconstructing the structure on demand is not feasible (it's an O(n) process).
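To make the peak bookkeeping concrete, here is a minimal, self-contained sketch (illustrative only, not the frame_system/pallet code) of an append-only MMR that keeps nothing but its peaks, with the hash function left generic:

// Illustrative sketch only: an append-only MMR that stores nothing but its peaks.
// The hash function is abstracted so Keccak256, Blake2, etc. could be plugged in.

/// A peak: the root of a perfect binary tree covering 2^height leaves.
struct Peak {
    height: u32,
    hash: [u8; 32],
}

struct Mmr<H: Fn(&[u8], &[u8]) -> [u8; 32]> {
    peaks: Vec<Peak>, // at most ~log2(leaf_count) entries
    leaf_count: u64,
    hash: H,
}

impl<H: Fn(&[u8], &[u8]) -> [u8; 32]> Mmr<H> {
    fn new(hash: H) -> Self {
        Self { peaks: Vec::new(), leaf_count: 0, hash }
    }

    /// Append a new leaf (e.g. a block hash) using only the stored peaks.
    fn push(&mut self, leaf_hash: [u8; 32]) {
        let mut new_peak = Peak { height: 0, hash: leaf_hash };
        // Whenever the two right-most peaks have equal height, merge them.
        while let Some(last) = self.peaks.last() {
            if last.height != new_peak.height {
                break;
            }
            let last = self.peaks.pop().expect("peak checked above");
            new_peak = Peak {
                height: last.height + 1,
                hash: (self.hash)(&last.hash, &new_peak.hash),
            };
        }
        self.peaks.push(new_peak);
        self.leaf_count += 1;
    }

    /// "Bag" the peaks right-to-left into a single root commitment.
    fn root(&self) -> Option<[u8; 32]> {
        self.peaks
            .iter()
            .rev()
            .map(|peak| peak.hash)
            .reduce(|bagged_right, left| (self.hash)(&left, &bagged_right))
    }
}

In this design only the peaks would live in on-chain storage, while full nodes would persist every node to the offchain database via the Indexing API, so that they can serve ancestry proofs.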

So to circle back to our example, the light client could import header 10 directly, but it would require extra proof data:

  1. Storage proof of MMR peaks
  2. MMR proof of 5 being an ancestor of 10.

Also, any application built on top of the header chain could verify that, say, header 7 is part of the chain by providing exactly the same kind of proofs.
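As a hedged sketch of what that import flow might look like on the light-client side (every type and helper below is a hypothetical placeholder, not an existing crate API):

// Hypothetical sketch of importing header 10 directly; none of these types or
// helpers come from an existing crate, they only illustrate the flow above.
struct Header { number: u64, state_root: [u8; 32] }
struct Justification;       // GRANDPA justification for `header`
struct StorageProof;        // storage proof of the MMR peaks under `header.state_root`
struct MmrAncestryProof;    // proof that a given block hash is a leaf under those peaks

fn import_finalized_header(
    best_finalized_hash: [u8; 32],  // e.g. the hash of block 5
    header: Header,                 // e.g. block 10
    justification: Justification,
    peaks_proof: StorageProof,
    ancestry_proof: MmrAncestryProof,
) -> Result<(), &'static str> {
    // 1. The header must be finalized by the current authority set.
    verify_justification(&header, &justification)?;
    // 2. Extract the MMR peaks from the header's state via the storage proof.
    let peaks = verify_storage_proof(&header.state_root, &peaks_proof)?;
    // 3. The best finalized block we already know must be an ancestor.
    verify_mmr_ancestry_proof(&peaks, best_finalized_hash, &ancestry_proof)?;
    // Accept `header` as the new best finalized header.
    Ok(())
}

// Placeholder verification helpers (real ones would live in the bridge pallet).
fn verify_justification(_h: &Header, _j: &Justification) -> Result<(), &'static str> { todo!() }
fn verify_storage_proof(_root: &[u8; 32], _p: &StorageProof) -> Result<Vec<[u8; 32]>, &'static str> { todo!() }
fn verify_mmr_ancestry_proof(_peaks: &[[u8; 32]], _leaf: [u8; 32], _p: &MmrAncestryProof) -> Result<(), &'static str> { todo!() }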

This needs to be implemented in the Substrate repo.

tomusdrw added this to the Millau Release milestone Aug 4, 2020
@AlistairStewart

This will intersect heavily with the changes we are planning for the Ethereum bridge. We are thinking of having validators do BLS signatures outside of GRANDPA on finalised blocks. This would avoid the need for simplifying GRANDPA justifications, but it does not cover the case of working backwards from one thing signed by the validators.

So maybe the validators should be signing the MMR roots with BLS.

For Polkadot and Kusama, we also want a cheap way of directly proving that a parachain block happened, which might mean putting more than just relay chain block hashes in something like this. That would really help bridges on parachains.


tomusdrw commented Aug 6, 2020

Extensions on top of this that we would like to see:

  1. MMR built not only from recent block_hashes, but also from additional arbitrary data; basically in the leaves we want (BlockHash, ExtraDataForThatBlock), where this additional data is configurable by the runtime developer. In case of the Polkadot Relay Chain this would most likely be (BlockHash, Vec<(ParaId, ParaHeadData)>), so that parachains are able to push some data there as well (as a means of communicating with bridged chains outside of Polkadot).

Rough pseudo-code:

trait MMR {
  /// Data stored in the MMR leaf for the current block (alongside the block hash).
  type LeafData: Encode;
  const IS_ENABLED: bool;
  fn mmr_leaf_data_for_current_block() -> Self::LeafData;
}

struct ParachainMmr;
impl MMR for ParachainMmr {
  type LeafData = (ParachainId, ParachainHeadData);
  const IS_ENABLED: bool = true;
  fn mmr_leaf_data_for_current_block() -> Self::LeafData { todo!() }
}

impl frame_system::Trait for PolkadotRuntime {
  type MMR = ParachainMmr;
}

Ideally the hash function should be configurable, to make MMR proofs easy to verify on chains with limited hashing support (think: Keccak256 or Blake2).

  2. The MMR peaks of the current block should be easily queryable (Runtime API), so that they can be retrieved by GRANDPA validators, who would collect signatures for the last finalized MMR root hash (see the proposed algorithm for choosing a block below). As a result we would get:
    (MmrRootHashOfSomeFinalizedBlock; Signature(s)FromCurrentValidatorSet)

The signature crypto is not decided yet; ideally it would be something that is cheap to verify on Ethereum and supports signature aggregation, perhaps BLS over some Ethereum-supported curve.

An open question remains: how frequently such commitments should be prepared, and how validators should agree on a single block to collect the signatures on. One option would be to pick the block (see the sketch after this list):

block_to_sign_on = last_block_with_signed_mmr_root + NextPowerOfTwo((last_finalized_block - last_block_with_signed_mmr_root) / 2)

  3. These signed commitments from (2) can then be used to:
  • Feed them back into the Substrate chain (easily queryable finality proofs available on-chain)
  • Create an efficient GRANDPA light client to be used for bridges.
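As an illustration of the selection rule above, a minimal sketch, assuming NextPowerOfTwo maps to Rust's u64::next_power_of_two (the function itself is illustrative, not an existing API):

// Illustrative sketch of the proposed rule; not an existing API.
fn block_to_sign_on(last_block_with_signed_mmr_root: u64, last_finalized_block: u64) -> u64 {
    let gap = last_finalized_block.saturating_sub(last_block_with_signed_mmr_root);
    // Note: `0u64.next_power_of_two()` is 1, so the result always advances by at least one block.
    last_block_with_signed_mmr_root + (gap / 2).next_power_of_two()
}

// Example: with the last signed root at block 100 and finality at block 164,
// validators would next sign the MMR root of block 100 + 32 = 132.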

Potentially useful:
https://github.com/darwinia-network/merkle-mountain-range
https://github.com/darwinia-network/darwinia-common/blob/master/frame/header-mmr/src/lib.rs


tomusdrw commented Dec 9, 2020

Still to do in the Substrate repo:

  • Runtime API and Custom RPC to generate leaf proofs.
  • Pruning of non-peak nodes from on-chain storage
  • Customization of MMR growth cadence (currently we do it on every block, for some cases it might be enough to grow once per session).

@boundless-forest

Hi @tomusdrw, I have noticed that the Merkle Mountain Range pallet (paritytech/substrate#7312) has been accepted into Substrate. Based on this, is it possible to integrate MMR into the sub-sub bridge in the parity-bridges-common design, to avoid relaying every block from the source chain to the target chain?

@tomusdrw

@AsceticBear hey! Yes, we have a plan to create a different Sub<>Sub bridge pallet that actually uses MMR + BEEFY instead of GRANDPA.
It's also possible to do some optimizations with MMR + GRANDPA: for instance, you can import a justification together with an MMR proof showing that the justified block has the pallet's latest known finalized block as an ancestor (i.e. you skip importing the intermediate headers completely).
That said, neither of these is being actively worked on yet; we hope to start on this kind of optimization in Q1 2021.

@boundless-forest

@tomusdrw We have a plan to do this MMR optimization on top of the current sub<>sub bridge in this repo, as you described above. However, a problem we have encountered is how to know which block carries the signal_hash flag without relaying every header from the source chain to the target chain. In other words, in the target Substrate client we maintain the source chain's GRANDPA authority set to verify justifications. When the relayer transfers blocks continuously, one by one, the target chain client can pick up authority set changes in a timely manner. But how do we achieve this if the relayer skips importing headers from the source chain to the target chain one by one?


tomusdrw commented Jan 4, 2021

@AsceticBear sorry for the late answer. Please take a look at the BEEFY protocol:
https://github.com/paritytech/grandpa-bridge-gadget
It is designed specifically to work with MMR to make it easy to implement such a low-bandwidth bridge.

Currently it's using secp256k1 signatures for Ethereum compatibility, but the plan is to have BEEFY authorities create an aggregated signature (e.g. BLS) for even faster verification.

If I understood your question correctly, with GRANDPA + MMR your client might need to have the session length hardcoded, so that with every set transition it requires the header with signal_hash plus a proof that it is part of the current MMR. Obviously, the block you import should be signed by the old authority set to enact the transition.
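For illustration only, a heavily hedged sketch of that hand-over; every constant, type and helper below is a placeholder invented for this example, not an existing pallet API:

// Placeholder sketch of a GRANDPA + MMR authority-set hand-over; the session
// length is only assumed to be fixed, as discussed above.
const SESSION_LENGTH: u64 = 2400;

struct AuthoritySet;
struct SignalHeader { number: u64 } // header carrying the set-change signal
struct OldSetJustification;         // GRANDPA justification produced by the *old* set
struct MmrMembershipProof;          // proof the header is part of the known MMR

fn handle_set_transition(
    old_set: &AuthoritySet,
    header: SignalHeader,
    justification: OldSetJustification,
    mmr_proof: MmrMembershipProof,
) -> Result<AuthoritySet, &'static str> {
    // With a hardcoded session length the client knows where transitions happen.
    if header.number % SESSION_LENGTH != 0 {
        return Err("not a session boundary");
    }
    // 1. The signalling header must be finalized by the old authority set.
    verify_with_old_set(old_set, &header, &justification)?;
    // 2. It must also be provably part of the already-known finalized chain.
    verify_mmr_membership(&header, &mmr_proof)?;
    // 3. Enact the authority set signalled in the header.
    extract_new_set(&header)
}

// Placeholder helpers.
fn verify_with_old_set(_s: &AuthoritySet, _h: &SignalHeader, _j: &OldSetJustification) -> Result<(), &'static str> { todo!() }
fn verify_mmr_membership(_h: &SignalHeader, _p: &MmrMembershipProof) -> Result<(), &'static str> { todo!() }
fn extract_new_set(_h: &SignalHeader) -> Result<AuthoritySet, &'static str> { todo!() }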

@tomusdrw

Closing, since:

  1. The MMR implementation has landed in Substrate & Polkadot
  2. BEEFY is being worked on in https://github.com/paritytech/grandpa-bridge-gadget
  3. We now have pallet-bridge-grandpa (Finality Verifier Pallet #628) and we don't really verify ancestry, because it does not add any value in our case.
