Skip to content
This repository has been archived by the owner on Jan 11, 2024. It is now read-only.

FM-189: IPC BottomUp message #187

Merged
merged 4 commits into from
Aug 23, 2023
Merged

FM-189: IPC BottomUp message #187

merged 4 commits into from
Aug 23, 2023

Conversation

aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Aug 14, 2023

Closes consensus-shipyard/ipc#315

Creates just the message definition for relayed bottom-up multisig checkpoints:

  • ChainMessage::Ipc for IPC related messages
  • IpcMessage::BottomUpResolve for relayed checkpoints. The checkpoint has a quorum certificate (multi-sig) from the subnet, and a signature from the relayer.
  • IpcMessage::BottomUpExec for resolved checkpoints ready for execution. This is where ABCI++ will come in; the checkpoint is stripped from the relayer and this transaction is proposed by the parent validators directly.

@aakoshh aakoshh marked this pull request as ready for review August 15, 2023 10:41
@aakoshh aakoshh mentioned this pull request Aug 15, 2023
@aakoshh aakoshh changed the title FM-173: IPC BottomUp message FM-189: IPC BottomUp message Aug 15, 2023
Copy link
Contributor

@adlrocha adlrocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but I feel we may need a special case for the bottom-up interaction from L2 -> root. Filecoin mainnet won't be running Fendermint, but Lotus, so the only way to submit information on-chain is through an eth_sendTransaction. We could delegate some of the logic for message relaying and resolution to the IPC agent, but we may be over-complicating it when the subnet validators could make the multisig for the checkpoint available for relayers to submit as a transaction in the parent.

Let me know if this makes sense, we can discuss sync if it is easier.

fendermint/testing/src/arb/subnetid.rs Outdated Show resolved Hide resolved
ChainMessage::ForExecution(_) | ChainMessage::ForResolution(_) => {
// Users cannot send these messages, only validators can propose them in blocks.
ChainMessage::Ipc(IpcMessage::BottomUpResolve(_msg)) => {
// TODO: Check the relayer signature and nonce. Don't have to check the quorum certificate, if it's invalid, make the relayer pay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO confused me a bit. Are we expecting to have the figure of a relayer in our initial implementation?

How are BottomUpResolve messages arriving to our peer?

Copy link
Contributor

@adlrocha adlrocha Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, gotcha. We are calling a relayer the entity submitting the bottom-up checkpoint in the parent. It is clear from the comments in the IPC message definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I thought it's just a modified version of the IPC Agent: where it currently has to sign a checkpoint and then send the transaction to the parent, now it has to listen to an event emitted from a smart contract on the child, accumulating the signatures without the help of the agent, and then take that checkpoint, wrap it in a special relay transaction (for CID visibility) and send it to the parent.

Ostensibly we could send it as an FVM transaction and inspect the to, and parse it, instead of having a special IPC transaction 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take that checkpoint, wrap it in a special relay transaction (for CID visibility) and send it to the parent.

Let's discuss this in today's sync, I am having a hard time understanding the end-to-end flow you are imagining of how the checkpoint signatures are collected, and the actual submission in the parent.

/// The address (public key) of the relayer in the current subnet.
pub relayer: Address,
/// The nonce of the relayer in the current subnet.
pub sequence: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sequence is the increasing nonce for the address in the network, right? Meaning, if the user has sent some raw subnet transaction with nonce n, and then they want to send a RelayedMessage, the sequence of the relayed message should be n+1,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just the sequence of the actor doing the relaying. I am not happy that I had to copy all these fields (and the gas fields in the next PR), so if you think this should be a regular FVM message that we know how to parse to get the CID out, I am open for that too.

/// a block height (for sequencing), any potential handover to the next validator set, and a pointer
/// to the messages that need to be resolved and executed by the parent validators.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct BottomUpCheckpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a few more fields as part of the BottomUpCheckpoint: https://github.com/consensus-shipyard/ipc-solidity-actors/blob/cc109a57d62755cf849028a366258c38c04e49c8/src/structs/Checkpoint.sol#L8. Mainly:

  • fee: Fees to be distributed in the parent (this may change for Fendermint).
  • prevHash: The hash of the previous checkpoint to build a chain of checkpoints that can be persisted and discoverable externally (IPFS/FIL sector).
  • proof --> snapshot: We should rename it to snapshot, but whatever state snapshot or proof of the state a subnet wants to propagate to its parent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for commenting on this, it is something I needed your input on.

I see that what you are pointing at is in Solidity, so I was wondering if we can put that behind the CID, and keep only the fields here that we need in order to validate in the SubnetActor that this checkpoint meta-data is correct, and it can be resolved into the full thing.

For example the fields you listed:

  • fee: I assume each transaction pays its own fee in our chosen default execution model, where everyone pays for their own stuff. Is this fee a general fee to pay for the checkpoint itself? See if we can leave it behind the CID. I thought that we can "execute" this meta-data message by sending it to the SubnetActor for i) validation and ii) taking payment from the relayer; if fees has to be sent from the common pot to some parent validators, indeed this would be a good time to do it, and for that it should be visible in this record. Could it not live in the SubnetActor instead, so that it automatically does it, instead of the subnet validators sending a different value each time?
  • prevHash: Do we really need this to be in the meta-data for validation? We have a height and if it's signed by the right validator set at the right height, should we not accept that this is what the committee is saying the next checkpoint is? I'm not saying we don't need it, but can it be behind the CID?
  • snapshot: This sounds like it can be behind the CID, that it's just like cross-messages, but different. One ramification of lifting it into the meta-data is that I built the resolver in FM-191: Execute BottomUpResolve #201 by assuming a single CID to be resolved, not a list - I assumed we can always create a struct to hold multiple CIDs, which makes it easier to track its completion. We can still know about it, and treat it in a special way during the execution of the checkpoint, when this is already available, otherwise we would just stuff it in the blockstore without even charging for it.

I am also wary about the fact that the checkpoint is in Solidity but we use the IPLD resolver to grab it, which means if was part of the EVM data structures then it's only possible to interpret it by the EVM actor, so it would have to be shoveled over and stored in the BlockStore as IPLD, not as parts of the EVM KVstore, to make it accessible to the IPLD resolver. This would be trivial if we used built-in actors, but with EVM actors it's not.


/// Relayed messages are signed by the relayer, so we can rightfully charge them message inclusion costs.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct SignedRelayedMessage<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we expect this SignedRelayedMessage to be triggered by users? I think we may have an issue here, as any bottom-up protocol we come up with would have to be compatible with Filecoin mainnet, i.e. Lotus.

How I was imagining this to work, is that users directly send the CertifiedMessage to the parent inside an eth_sendTransaction, and the verification is performed on-chain in the subnet actor, which means that all the information required for the verification should be already included in the message. This means that the child subnets validators would have to tailor the CertifiedMessage in their subnet before it can be propagated up (maybe this is what you are thinking of doing and I am missing it from the type definitions).

For L2+ we can use this approach of relay+execution, but in L2->L1 bottom up interactions we may just be able to send a raw transaction against the ETH RPC to submit the checkpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we expect this SignedRelayedMessage to be triggered by users?

It would be sent by the IPC Actor iff it knows the parent is a Fendermint node as well, because these messages don't work for Lotus at all. But with Lotus you also can't use the IPLD Resolver, so it's completely different.

This is unfortunate, and I agree this whole thing with IPLD Resolution and CIDs is a bit forced in order to make use of IPLD in the first place. It's possible that if message size is a problem we should just have gone with a Merkle root in the checkpoint and Merkle proof in relayed messages, simples, at the cost of multiple relayers including the same transaction payload multiple times.

So, we have the well-known ID of the gateway actor where these messages are going, so in theory Fendermint could inspect all FVM messages going to that actor and try to deserialize them as the EVM calldata payload that should contain an ethers generated DTO. It would have to be a different one from the BottomUpCheckpoint in Checkpoint.sol you pointed at, because that one already has fully messages inside; if we want to use CIDs, that is. So it could be the equivalent of a CertifiedMessage, ready to be validated, with an opaque CID field to be resolved by the time of execution, at which point it can be turned into the Solidity BottomUpCheckpoint.

That way we can drop this SignedRelayedMessage. It's existence is to make it clear that we are separating the act of relaying from the act of execution, which forces us to deal with the problem of who should pay for which part - the relayer pays for the validation, the inner message is paid for by its own sender(s). It should be possible to turn it around and lift the FVM message handling into Fendermint by parsing as well.

In any case, Lotus would have a different message format there which includes payloads.

/// The parent subnet already expects the last validator set to sign this one.
pub next_validator_set_id: u64,
/// Pointer at all the bottom-up messages included in this checkpoint.
pub bottom_up_messages: Cid, // TODO: Use TCid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to include the list of messages here and not just the cid for L2->root interactions as described above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data structure doesn't have to be the be-all, end-all data structure. It is for bottom-up messages arriving into Fendermint, and that is all we care about. How it is managed with Lotus is the concern of those relayers which send this to Lotus. For example they can do the equivalent of this:

pub struct RelayedCheckpoint {
  pub checkpoint: CertifiedMessage<BottomUpCheckpoint>,
  /// The CID of these messages have to match the CID in the checkpoint.
  pub bottom_up_messages: Vec<CrossMsg>
}

And then part of validation is to check that.

I know it sucks that you have a single set of Solidity actors and two different ideas for relaying the stuff 😞

#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct SignedRelayedMessage<T> {
/// The relayed message with the relayer identity.
pub message: RelayedMessage<T>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, I hate that Filecoin uses message for transaction 🤦, it is so confusing sometimes. Here we are sending an arbitrary message, but then in the bottom_up_messages of the checkpoint we have the Cid of the bottom-up transactions being propagated, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, this is any arbitrary message that can be relayed, but in the point of inclusion in IpcMessage we know what to expect actually, so it's generic in order to be able to potentially reuse it and to express the nature of relaying stuff. The bottom_up_messages are pointing at a CID that we have to retrieve and execute in a different context, not paid for by the relayer. We can change it point at a CID of a different thing which has a snapshot in it as well.

The goal there was to be able to process the message in stages:

  1. Relayer sends something: we should be able to validate the thing before resolving it, so as not to start pulling unlimited data, which we do by at least validating the checkpoint meta-data against the subnet committee. The relayer pays for this validation work, and will be rewarded either now or later, when we execute.
  2. We resolve the content in the background.
  3. We execute the meta-data and the content together (each paid by someone else); we can reward the relayers (all of them) now.

This would not be possible if we pass all messages as a single FVM message to Solidity.

@aakoshh
Copy link
Contributor Author

aakoshh commented Aug 22, 2023

but we may be over-complicating it when the subnet validators could make the multisig for the checkpoint available for relayers to submit as a transaction in the parent.

@adlrocha I don't fully understand this comment. In the data type I am proposing in this PR, this is exactly how it's working. I understand that with Lotus the actual ChainMessage type would not exist, but the signed checkpoint (the CertifiedMessage<BottomUpCheckpoint>) could still be used and relayed, except it has to be wrapped in a format that Lotus can consume, which could involve attaching the full payload, and checking it against the CID.

To be honest the interaction with the Solidity actors is what I am most uncomfortable with, and wanted to discuss with you what it unfolds into.

Co-authored-by: adlrocha <6717133+adlrocha@users.noreply.github.com>
Copy link
Contributor

@adlrocha adlrocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing sync, we agree on what needs to be improved and what our constraints are. Merge time :)

@aakoshh
Copy link
Contributor Author

aakoshh commented Aug 23, 2023

For the record, what we discussed about bottom-up checkpointing is that we basically have two approaches:

  • Lotus: Doesn't support IPLD resolution so we cannot use CIDs, we have to send fully fledged messages. Because we are having a single outbox per subnet, rather than the channel-based solution in IBC, we send the full batch to Lotus and hope that we'll not run into size limits. Because we have no special support for Lotus, the entire checkpoint execution, including all the messages, will have to be covered by the gas of the relayer who sends the checkpoint. They can get reimbursed from the governance account. If we had access to the gas_used SDK method from Solidity then we could even reimburse the relayer during message execution from the original sender of the cross message, based on gas deltas, but apparently we can't use this method so the Gateway actor doing the execution cannot figure out how much is owed after each message.
  • Fendermint: Supports IPLD resolution so we can use CIDs and reduce the checkpoint to just meta-data, delegating cross-message execution to the Fendermint host,where we can charge the message sender instead of the relayer. We only talk to the Gateway actor for checkpoint validation, not execution.

Alfonso is okay with adding 2 different methods to the Gateway for the above cases, and we'll have a switch in Fendermint to tell which kind of checkpoint to construct: a fat one with messages included, or a thin one with just a CID, depending on what kind of parent it has.

@aakoshh aakoshh merged commit 285e644 into main Aug 23, 2023
6 checks passed
@aakoshh aakoshh deleted the fm-173-bottom-up-ckpt-msg branch August 23, 2023 13:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define bottom-up checkpoint message type
2 participants