Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce submit_receipt to fill up the unbounded gap between HeadDomainNumber and HeadReceiptNumber #2933

Merged
merged 10 commits into from
Jul 24, 2024

Conversation

NingLin-P
Copy link
Member

@NingLin-P NingLin-P commented Jul 19, 2024

close #1673

The workflow of domain block production in main:

  1. The operator collects tx from tx pool and verifies the tx against its last block, and submits the tx with the ER at HeadReceiptNumber + 1 (in its local view)
  2. The consensus runtime verifies the bundle and only accepts it if its ER is HeadReceiptNumber + 1
  3. All the domain nodes validate the bundle included by the consensus block and mark the bundle as invalid if any tx fails to pass the verification against the last domain block

In step 1, the operator performs the check against its local view of the last block and the ER at HeadReceiptNumber + 1, if the operator is lagging its last block is an older block and it may include invalid tx (at a global view) into its bundle, this is inevitable in a distributed system. But as long as HeadDomainNumber - HeadReceiptNumber == 1, the consensus runtime only accepts the bundle if it carries an ER at HeadDomainNumber (i.e. the last domain block), and the stale bundle produced by the lagging operator will be rejected safely. This safeguard is broken if there is a receipt gap and HeadReceiptNumber + 1 < HeadDomainNumber, so the consensus runtime will accept the stale bundle even if it carries an ER derived from an older block.

The receipt gap can be brought by:

  • FP which revert the HeadReceiptNumber
  • Domain runtime upgrade which increase HeadDomainNumber
  • An edge case about the genesis receipt which is fixed in 0de8728

This PR fixes the issue by:

  • Reject any bundle if there is a receipt gap where HeadDomainNumber - HeadReceiptNumber >= 1
  • Introduce submit_receipt that is used to submit receipts to fill up the gap
    • submit_receipt is similar to submit_bundle that also requires proof_of_election and performs the similar check
    • submit_receipt doesn't contain any tx and thus doesn't derive any new domain block
    • submit_receipt is only allowed when there is indeed a receipt gap
  • Explicitly check the bundle tx against ER::domain_block_hash in step 1 and check the bundle carries the ER of the parent domain block in step 3
  • (Non-related to the isse) Use the head domain number to trigger epoch transition instead of the confirmed domain block

The security model:

  1. The operator when producing the bundle, commits the validity of all the tx to the ER that is submitted together
  2. The consensus chain only accepts the bundle if it can ensure the ER is derived from the parent domain block
  3. The domain node re-validates all tx in the bundle against the parent domain block and builds the next block

Code contributor checklist:

This is for better code reusability in the upcoming commit

Signed-off-by: linning <linningde25@gmail.com>
It is used to submit receipt to fill up the gap between HeadDomainNumber and
HeadReceiptNumber, it consist of the receipt, proof_of_election, and the signature.

Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
…s receipt gap

Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
@NingLin-P NingLin-P added the need to audit This change needs to be audited label Jul 19, 2024
@NingLin-P NingLin-P requested a review from dariolina July 19, 2024 15:24
@dariolina
Copy link
Member

Use the head domain number to trigger epoch transition instead of the confirmed domain block

IIRC there was some specific reason we had epoch transition triggered by confirmed block instead of latest? I feel like we have already changed this behavior. Anyway let's not introduce the canges unrelated to the bug in this PR.

@NingLin-P
Copy link
Member Author

IIRC there was some specific reason we had epoch transition triggered by confirmed block instead of latest? I feel like we have already changed this behavior. Anyway let's not introduce the canges unrelated to the bug in this PR.

Okay, will pick the change to a separated PR.

Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Copy link
Contributor

@vedhavyas vedhavyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall make sense.
Question on migration if required or not

///
/// Temporary storage only exist during block execution
#[pallet::storage]
pub(super) type HeadReceiptExtended<T: Config> =
StorageMap<_, Identity, DomainId, bool, ValueQuery>;
pub(super) type NewAddedHeadReceipt<T: Config> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this change require a migration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary because it is a temporary storage that only exists during block execution and is cleared at block finalization.

@NingLin-P NingLin-P added this pull request to the merge queue Jul 24, 2024
Merged via the queue into main with commit 87a5ff3 Jul 24, 2024
9 checks passed
@NingLin-P NingLin-P deleted the fix-er-gap branch July 24, 2024 15:40
@dariolina
Copy link
Member

@NingLin-P can you please add this new behavior to the specs ? Here in bundle verification https://github.com/autonomys/protocol-specs/blob/7a1f861dedf6cf458632394285975270a2cdbee2/docs/decex/workflow.md?plain=1#L152
and submit_receipt call here https://github.com/autonomys/protocol-specs/blob/7a1f861dedf6cf458632394285975270a2cdbee2/docs/decex/interfaces.md#runtime-calls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need to audit This change needs to be audited
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The unbounded gap between head receipt and the best domain block
3 participants