Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency of domain block body and extrinsics root #2181

Closed
NingLin-P opened this issue Oct 30, 2023 · 6 comments · Fixed by #2343
Closed

Inconsistency of domain block body and extrinsics root #2181

NingLin-P opened this issue Oct 30, 2023 · 6 comments · Fixed by #2343
Assignees
Labels
bug Something isn't working execution Subspace execution gemini-3h

Comments

@NingLin-P
Copy link
Member

When constructing the domain block, we are using a customized block builder, so that we can build the domain block with the exact extrinsic that derive from the domain bundle. While most of the behaviors of the customized block builder are the same as the upstream one, there is a slight deviation:

In the upstream block builder, only extrinsic that executed successfully will be stored in the block body:
https://github.com/paritytech/polkadot-sdk/blob/2d9426f1cc144a0624ea0329ddc7e567bb47d6b2/substrate/client/block-builder/src/lib.rs#L214-L217

In our customized block builder, all the extrinsic that derive from the domain bundle will be stored in the block body:
https://github.com/subspace/subspace/blob/cf22036f78dd6418f3c71184ffd516dd08e26d32/domains/client/block-builder/src/lib.rs#L288

On both the upstream and our block builder, if an extrinsic return error while executing it, all of the intermediate state generated during the execution will be rollbacked, which includes:

  • State generated while executing the extrinsic itself
  • The pallet-system related state, including ExtrinsicData which is used to compute the extrinsics_root in the block header
    • So the failed-to-executed extrinsic won't be included in the block header's extrinsics_root
  • The execution trace that used to construct the ER
    • So the failed-to-executed extrinsic's trace won't be present in the ER::execution_trace

This causes an inconsistency of the domain block body and extrinsic root, if we use the domain block body to compute the extrinsic root again, we will get a different extrinsic root compared to the one in the block header. Furthermore, this also causes trouble in the fraud proof:

  • The invalid state transition fraud proof, we are using the first mismatch_index of the local_er::execution_trace and external_er::execution_trace to get the extrinsic present at mismatch_index in the domain block body as the call data, and used it to verify the execution proof, since the ER::execution_trace only store trace for successfully executed extrinsic while the domain block body store all the extrinsic, we could getting the wrong extrinsic as the call data
  • The invalid domain extrinsic root fraud proof, during the verification of this proof, it uses all the domain extrinsic derived from the domain bundles to compute the expected extrinsic root, which is wrong because the valid ER's extrinsic root only contains successfully executed extrinsic

If we follow the upstream behavior to only store the successfully executed extrinsic in the block body, the invalid state transition fraud proof issue can be fixed easily. However, the invalid domain extrinsic root fraud proof is difficult to fix, because it needs to use the successfully executed extrinsic to compute the extrinsic root, and it is hard to know whether an extrinsic can be successfully executed without actually executing it (and all other extrinsic in front of it).

Another potential solution is we still keep all the domain extrinsic in the block body, and when an extrinsic returns an error while executing it in our block builder, we only rollback the "state generated while executing the extrinsic itself" and keep the pallet-system related state and the execution trace state, this will require further customization for our block builder and will bring more complexity.

cc @vedhavyas @ParthDesai @nazar-pc

@NingLin-P NingLin-P added bug Something isn't working execution Subspace execution labels Oct 30, 2023
@nazar-pc
Copy link
Member

My first thought is to combine proposed solutions:

  • extrinsic root works as upstream: only includes successfully executed extrinsics to not break any invariants there
  • execution trace includes all extrinsics, but for those that failed "before" and "after" state will be the same since they essentially did nothing (the rollback mentioned above), which you can also trivially prove

@NingLin-P
Copy link
Member Author

NingLin-P commented Oct 30, 2023

extrinsic root works as upstream: only includes successfully executed extrinsics to not break any invariants there

In this way, the invalid domain extrinsic root fraud proof issue still exists:

because it needs to use the successfully executed extrinsic to compute the extrinsic root, and it is hard to know whether an extrinsic can be successfully executed without actually executing it (and all other extrinsic in front of it)

@NingLin-P
Copy link
Member Author

As @vedhavyas previously mentioned failed extrinsic will still increase the signer's nonce (thus not fully rollback of the state), I took a closer look and found something new.

Essentially, the extrinsic can fail to be executed due to:

  • The extrinsic itself failed during execution
  • The pre_dispatch/post_dispatch hook failed, which is mostly from SignedExtension (i.e. nonce, tx fee check), see here

https://github.com/subspace/subspace/blob/547e8837a3924941624c056623247347d7ecb2d6/domains/client/block-builder/src/lib.rs#L200-L208

In the above code, api.apply_extrinsic is returning Result<Result<DispatchResult, TransactionValidityError>, ApiError>. It will enter the first branch and commit the state change as long as the pre_dispatch/post_dispatch check is passed no matter what DispatchResult is, but if the extrinsic itself failed during execution, it should not the partial state change of the call. And it will enter the second branch and rollback all the state changes if the pre_dispatch/post_dispatch hook fails.

Thus if an extrinsic itself fails during execution it will still be included in the extrinsic root, but if it fails at pre_dispatch/post_dispatch it will not.

There are 2 general solutions for this issue:

  • Eliminate all possible extrinsic that will fail at pre_dispatch/post_dispatch from the domain block
  • Allowing extrinsic that will fail at pre_dispatch/post_dispatch exist in the domain block, and keep the pallet-system related state and the execution trace state for them.

@vedhavyas
Copy link
Contributor

Ideally, the extrinsics that fail the basic checks should not be included and bundle carrying such should be marked invalid. Shouldn't this be solved once we implement extrinsinc check of extrinsincs in a bundle using single execution context ?

@NingLin-P
Copy link
Member Author

Ideally, the extrinsics that fail the basic checks should not be included and bundle carrying such should be marked invalid. Shouldn't this be solved once we implement extrinsinc check of extrinsincs in a bundle using single execution context ?

Not solved completely, because the pre_dispatch/post_dispatch is performed against the extrinsic of the whole domain block that comes from multiple bundles:

  • Consider there are 2 bundles both containing an extrinsic signed by the same account with the same nonce but different content, while the account can only afford one of the extrinsic. Both bundles can pass the check and both extrinsic can be included in the domain block (deduplication doesn't work since they have different content) while one of them will fail to pass the pre_dispatch/post_dispatch due to the nonce.
  • There are checks like CheckWeight that can also fail, when there are too many extrinsic ends up in the same domain block

Overall, I think the "Eliminate all possible extrinsic that will fail at pre_dispatch/post_dispatch from the domain block" approach is more strict and hard to hold all the time since the abovementioned case may not be exhaustive.

@ParthDesai
Copy link
Contributor

ParthDesai commented Nov 10, 2023

Hmm. AFAIK, post_dispatch should not return an error unless it is an inherent.

I am leaning towards this solution:

Allowing extrinsic that will fail at pre_dispatch/post_dispatch exist in the domain block, and keep the pallet-system related state and the execution trace state for them.

As this mean we do have enough data to re-calculate the extrinsic root correctly as well verification of fraud proof.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working execution Subspace execution gemini-3h
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants