Cross-msgs: Replace implicit execution with multisig protocol #453
Comments
To rephrase this, what we are saying is that the validators in the child subnet […] If a validator doesn't recognise […]

The signatures will have to be periodically re-published until the agent sees that the message has been included in the child subnet. That's because with Gossipsub the agent can never be sure who received their signature, and at least in theory it is possible that other agents weren't connected at the time, that a transaction doesn't gather enough signatures at the right time, or that previous signatures are forgotten due to restarts. Also, because of the previous point, an agent might simply not recognise a CID at the time it gets it. A more transparent and persistent solution would be to send the individual signatures supporting a top-down message to the ledger. That way an agent could always see that theirs is missing, and add it.
I would suggest that the top-down messages are also bundled, like the checkpoints, something like: […]
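A purely hypothetical sketch of such a checkpoint-like bundle (the names and fields are illustrative assumptions, not actual IPC types):

```rust
// Hypothetical sketch only; names and fields are illustrative, not actual IPC types.
pub struct StorableMsg { /* fields elided */ }

pub struct TopDownBundle {
    pub subnet_id: String,       // destination child subnet
    pub epoch: u64,              // parent epoch at which the bundle was cut
    pub msgs: Vec<StorableMsg>,  // top-down messages observed as final at the parent
}
```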
Regarding bundling messages, and as a solution to the multisig protocol, perhaps the top-down messages that are locally seen as valid by the IPC agent at the parent could be sent directly to Trantor's mempool abstraction at the child, which receives requests from Lotus but also from the IPC agent (not sure if such an abstraction already exists in Spacenet). The availability module of Trantor already tries to get a multisig for the block anyway, so it should be easy to implement with the mempool abstraction.
@ranchalp that sounds similar to how I want to use Tendermint's voting mechanism to agree on when a top-down message can be included in a block, but better because you don't have to put it on the critical path. Can you modify the availability voting to only cast a vote when the message is observed as final, not just that it's available?
@aakoshh That's what I meant by the mempool abstraction. The mempool should provide those messages only when seen as final at the parent (or even the IPC agent should not provide them to the mempool until they're final), and for verification it should be easy to abstract a Trantor event to call the mempool (or IPC directly from the availability module). Happy to get on this, but would need a bit of onboarding (or just a pointer to some tutorial/PR/relevant code) to understand the current functioning of top-down msgs in the code.
This would be great; the problem I personally have with this is deciding the logic to bundle the messages together. For bottom-up messages it is clear because the checkpoint does the cutoff, but for top-down messages there is no clear cutoff.
@ranchalp it sounds like the mempool in this case doesn't do any gossiping, unlike the Lotus mempool for example, right? It's okay to provide the transaction to the mempool when it's final, or for the mempool not to recommend it for inclusion in a block until it's final; we just have to make sure that an adversarial validator cannot do anything with it either until it's final. If the availability voting can be made to work that way, great! @adlrocha I would have thought bundling can happen on a 1-bundle-per-block basis. Probably not going to achieve much reduction in the number of messages, but still.
No, the current Trantor simplemempool does not do any gossiping, which IIUC makes it ideal for this. And yes, the existing availability module with verification based on a local check of finality should suffice.
This is the beauty of using Trantor's availability module: batching will happen out of the box.
Update for the execution of top-down messages in M2
To avoid the need for implicit execution or ad-hoc consensus checks, the idea is for the IPC agents of validators to periodically submit to the child gateway a "top-down checkpoint" (which we can call e.g. `CronCheckpoint`).

Implementation

The implementation has two parts:

Actors
```rust
pub struct CronCheckpoint {
    pub epoch: ChainEpoch,
    pub membership: MembershipSet,
    pub top_down_msgs: Vec<StorableMsg>,
}
```

Cross-message orchestrator

The orchestrator is the process running in the IPC agent that tracks the state of the parent and submits cron checkpoints.
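In effect this is on-chain vote counting by the child validators (as discussed below): each validator submits the cron checkpoint it observed for an epoch, and once enough of them have submitted the same checkpoint its top-down messages can be executed. A minimal sketch of that tallying, assuming hypothetical names and a 2/3 quorum (the actual actor state and quorum rule may differ):

```rust
// Hypothetical sketch of the gateway-side tally; names, state layout and the
// exact quorum rule are assumptions, not the actual gateway actor code.
use std::collections::HashMap;

type ChainEpoch = i64;

/// Votes received for a given epoch: checkpoint hash -> submitting validators.
#[derive(Default)]
struct EpochVotes {
    votes: HashMap<Vec<u8>, Vec<String>>,
}

struct CronCheckpointTally {
    validators: Vec<String>,               // current membership of the child subnet
    epochs: HashMap<ChainEpoch, EpochVotes>,
}

impl CronCheckpointTally {
    /// Record a validator's submission for `epoch`; `checkpoint_hash` stands in
    /// for a content identifier of the submitted CronCheckpoint. Returns true
    /// once more than 2/3 of the validators submitted the same checkpoint, at
    /// which point its top_down_msgs can be executed in nonce order.
    fn submit(&mut self, epoch: ChainEpoch, validator: String, checkpoint_hash: Vec<u8>) -> bool {
        if !self.validators.contains(&validator) {
            return false; // only current validators may vote
        }
        let entry = self.epochs.entry(epoch).or_default();
        let voters = entry.votes.entry(checkpoint_hash).or_default();
        if !voters.contains(&validator) {
            voters.push(validator);
        }
        let quorum = (self.validators.len() * 2) / 3 + 1;
        voters.len() >= quorum
    }
}
```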
If I understand it correctly, it's basically on-chain voting of the child validators for the last observed parent state and top-down transactions to execute. This is exactly what we currently use for reconfiguration. I think it should work, just make sure you catch all the corner cases when the votes arrive at the child in an arbitrary order (especially "older" cron checkpoints after a "newer" one). Also note that this can probably be implemented in the integration code (i.e. "between" Eudico and Mir, like the reconfiguration), in case that happens to be easier to do. You can look at @dnkolegov's implementation of child reconfiguration if you choose that path.
This shouldn't be a problem: all votes are cached and are not garbage collected until they are executed, and top-down messages need to be executed in order, which means that the order of the cron checkpoints is also kept.
Unfortunately, this is not the case, as for this to work we would require the use of implicit execution (back to the original problem). |
Background
In the MVP of IPC over Eudico, the execution of cross-net messages was performed through implicit execution. The reason for requiring the implicit execution of cross-net messages is that, in order to authenticate a cross-net message being proposed in a block, validators (and full nodes) had to verify that the message was final in the source subnet. Implicit execution was a way to introduce an explicit check for specific messages before they are accepted in a block, conveniently executed through the gateway. This is what we used to verify the validity of cross-net messages before they are included in a block and executed.
The role of this step in the execution of messages is for all peers to agree on the finality and correctness of the messages to be executed. By including these checks as part of the consensus, we remove the need for a succinct proof from validators that can be verified on-chain by actors.
Unfortunately, this design is consensus-breaking, so this change would require an upgrade to support cross-net message execution in Filecoin (and other networks). This approach also introduces several issues with the new integration of Lotus with Mir, as even after a cross-net message has been ordered by Mir, it may end up not being valid after the consensus checks (performed after ordering).
Initially, we were planning to ship M2 with implicit execution and then implement an alternative protocol for the execution of cross-net messages, but this may not be possible anymore.
Proposal
To remove the need for implicit execution, we will rely on a multisig protocol among the validators of the destination subnet to verify the validity of cross-net messages.
Top-down messages
All IPC agents from validators in a subnet will be subscribed to a gossipsub topic `/ipc/cross-net/<subnet-id>`. This can be seen as a new mempool for top-down cross-net messages for each subnet. This mempool is used to propagate information about new unverified top-down cross-net messages in the subnet.

For every unverified message seen with destination `subnet-id`, the IPC agent of a validator will publish a message in the aforementioned gossipsub topic with its signature over the CID of the message. Validators only perform and broadcast this signature if they consider the message valid for execution in the subnet (i.e. it is valid and final in its originating subnet).

Validators in a subnet listen to these updates from other validators in the destination subnet, and collect the signatures of others. When a validator has collected the minimum number of signatures required for the execution of the cross-net message (initially 2/3, although we can make it subnet-configurable), it sends a transaction to `apply_message` of the gateway actor including the multisig and the message to be executed.

The gateway will check the multisig from the validators to authenticate the validity of the message, removing the need for an off-chain authentication in the consensus stage through the implicit execution of messages.
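A rough sketch of the agent-side signature collection under these rules (all names and types are hypothetical, not the actual ipc-agent API):

```rust
// Hypothetical sketch only: collects gossiped signatures until a 2/3 quorum
// is reached; names and types are not the actual ipc-agent API.
use std::collections::HashMap;

/// What a validator's agent publishes on /ipc/cross-net/<subnet-id>.
pub struct SignedVote {
    pub msg_cid: Vec<u8>,   // CID bytes of the top-down message
    pub validator: String,  // address of the signing validator
    pub signature: Vec<u8>, // signature over msg_cid
}

/// Collects votes per message CID and reports when a quorum is reached.
pub struct SignatureCollector {
    validators: Vec<String>,                           // current membership
    votes: HashMap<Vec<u8>, HashMap<String, Vec<u8>>>, // cid -> validator -> sig
}

impl SignatureCollector {
    pub fn new(validators: Vec<String>) -> Self {
        Self { validators, votes: HashMap::new() }
    }

    /// Record a gossiped vote. Returns the collected (validator, signature)
    /// pairs once more than 2/3 of the validators have signed, so the caller
    /// can submit them together with the message to apply_message.
    pub fn add_vote(&mut self, vote: SignedVote) -> Option<Vec<(String, Vec<u8>)>> {
        if !self.validators.contains(&vote.validator) {
            return None; // ignore signatures from non-validators
        }
        let sigs = self.votes.entry(vote.msg_cid).or_default();
        sigs.insert(vote.validator, vote.signature);

        let quorum = (self.validators.len() * 2) / 3 + 1;
        if sigs.len() >= quorum {
            Some(sigs.iter().map(|(v, s)| (v.clone(), s.clone())).collect())
        } else {
            None
        }
    }
}
```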
The operation of `apply_message` stays the same, i.e. it expects cross-net messages to be executed sequentially by increasing nonce without gaps. The only changes introduced in the actor are this preliminary check and the fact that any validator can post a message for execution.

Edit with @aakoshh's useful clarification:
Bottom-up messages
Validators send a transaction to `apply_message` with the whole batch of messages from the checkpoint. The gateway will check that these messages actually correspond to the CID propagated in the checkpoint, in which case they can be immediately executed.

Thus, validators see a new checkpoint with `cross_msgs: cid(Vec<Msgs>)`. They resolve the messages through the IPLD resolver, and send a transaction to `apply_message` with `Vec<Msgs>` as argument. The gateway computes `CID(Vec<Msgs>)` and compares it against the one in the checkpoint to see if they are the same. If they are, all messages in the vector are correspondingly executed.
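A minimal sketch of that gateway-side check, with a hypothetical entry point and the CID computation left to a caller-supplied function (none of these names are the real gateway actor API):

```rust
// Hypothetical sketch only; the real gateway actor would use its own message
// type and the same encoding that produced the checkpoint's cross_msgs CID.
pub struct StorableMsg { /* fields elided */ }

/// Execute a checkpointed batch of bottom-up messages if it matches the CID
/// committed in the checkpoint.
pub fn apply_checkpoint_batch(
    checkpoint_cross_msgs_cid: &[u8],
    msgs: Vec<StorableMsg>,
    cid_of: impl Fn(&[StorableMsg]) -> Vec<u8>, // same encoding as the checkpoint
) -> Result<(), String> {
    // 1. Recompute CID(Vec<Msgs>) for the submitted batch.
    let computed = cid_of(&msgs);

    // 2. It must equal the cross_msgs CID propagated in the checkpoint.
    if computed.as_slice() != checkpoint_cross_msgs_cid {
        return Err("batch does not match the checkpoint's cross_msgs CID".to_string());
    }

    // 3. Execute every message in order (sequentially, by increasing nonce).
    for _msg in msgs {
        // execute_cross_msg(_msg)?;
    }
    Ok(())
}
```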
Implementation

The implementation of the protocol involves the following:

For the multisig protocol we can use a list of signatures for simplicity initially, but in Filecoin we already have support for BLS signatures, so we can leverage them to submit more succinct proofs where the multisig is an aggregation of the BLS signatures of the validators.
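For the initial list-of-signatures variant, the actor-side verification could look roughly like the sketch below; the per-signature check is left as a caller-supplied function, and with BLS the list would collapse into a single aggregated signature verified against the validators' public keys (all names here are hypothetical):

```rust
// Hypothetical sketch of verifying a plain list of signatures over a message
// CID against the subnet's validator set; not the actual actor code. With BLS
// aggregation, `sigs` would be replaced by one aggregated signature.
pub fn verify_multisig(
    msg_cid: &[u8],
    sigs: &[(String, Vec<u8>)],                       // (validator address, signature)
    validators: &[String],                            // current membership
    verify_sig: impl Fn(&str, &[u8], &[u8]) -> bool,  // (validator, cid, sig) -> valid?
) -> bool {
    let mut valid = 0usize;
    let mut seen: Vec<&String> = Vec::new();

    for (validator, sig) in sigs {
        // Count each validator at most once, and only if it is in the set.
        if seen.contains(&validator) || !validators.contains(validator) {
            continue;
        }
        if verify_sig(validator.as_str(), msg_cid, sig.as_slice()) {
            seen.push(validator);
            valid += 1;
        }
    }

    // Require more than 2/3 of the current validators to have signed.
    valid >= (validators.len() * 2) / 3 + 1
}
```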
Related
Related issue in the IPC agent: https://github.com/consensus-shipyard/ipc-agent/issues/39. We can either re-use the same topic to broadcast signed proposals, or use an independent topic.
While exploring this solution I came across ERC-4337. Our problem seems a bit narrower than the one tackled by the ERC, but we end up with a similar solution.