Skip to content
This repository has been archived by the owner on Jan 11, 2024. It is now read-only.

FM-184: Proposal interpreter #185

Merged
merged 7 commits into from
Aug 22, 2023
Merged

FM-184: Proposal interpreter #185

merged 7 commits into from
Aug 22, 2023

Conversation

aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Aug 11, 2023

Closes consensus-shipyard/ipc#337

The PR introduces a ProposalInterpreter similar to the ones which already exist, so that we can keep the app.rs free of much of our application logic, and farm it out into the layers where the different types and representations of messages are handled.

Currently it does nothing new, accepts everything. The goal was to introduce this necessary boilerplate and agree on the abstractions.

My motivation was to help #181 focus on the top-down message without having to decode messages in the application root.

@aakoshh
Copy link
Contributor Author

aakoshh commented Aug 11, 2023

Notes

https://github.com/tendermint/tendermint/blob/v0.37.0-rc2/spec/abci/abci++_app_requirements.md say that ProcessProposal needs to be deterministic and depend only on the state, which seem at odds with what we are trying to do with it:

  • only vote on block if the parent state is final according to the view of individual validators
  • only vote on block if the checkpoint CID is available to the validator casting the vote

The formal requirements state that the vote should only depend on the block and the previous state, which is obviously not the case for us. Obviously we are aware of this, which is why we discussed that timing is critical, and we should consider:

  • using the vote gossiping feature of the IPLD Resolver to only propose top-down checkpoints after a quorum of votes have been collected, at which point we can say that we don't even need the ProcessProposal mechanism, but at least we should be able to use it because the honest validators will vote for the block if they have voted for the CID (although we can consider that if +2/3 voted for the CID, at least +1/3 should vote for the block, but 1/3 might be Byzantine and vote against it, while the rest of the 1/3 have not seen it as final)
  • always use a for resolution transaction transaction first to make sure that every validator is aware of what they should try to resolve - technically it is in the state then, even though it's availability isn't. We could add their votes to the state too, but the data would still not be in the ledger, only pledged to be available. Again we'd only know that +1/3 of the nodes should be willing to serve the data. To preserve liveness we can include the gossiped Quorum Certificate.

So while it's not deterministic, we have good mitigation strategies to minimise the chance of stalling consensus.


let txs = self
.interpreter
.prepare((), txs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass txs as a parameter to prepare? I had the impression that we are creating extra transactions but not modifying?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not be modifying them now, but the interface itself doesn't need to know that, and prepare is about doing whatever you like with the list of transactions in the mempool, so passing it along seems to make sense.

I have not implemented any of the following but passing the transactions could be useful for:

  • Select the most profitable transactions to include if we are over some block gas limit
  • Make sure there is a partial ordering of transactions by account and sequence number, although it's likely that the CometBFT mempool took care to maintain the order and this is a non-issue
  • Replace transactions with CIDs (could be interesting if the same transactions were proposed over and over)

So you are right, at the moment all we do is add more, but not passing along would break this abstraction IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, gas limit as well, Tendermint has no idea about it.

https://github.com/consensus-shipyard/fendermint/issues/208

fendermint/vm/interpreter/src/bytes.rs Show resolved Hide resolved
// This would indicate a Byzantine validator which includes rubbish in their proposal.
// We could reject the proposal here, or we can accept it and punish the validator during
// block execution, so that their power is reduced.
tracing::debug!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we reject the proposal here is we cannot deserialize the txn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising that issue. I like said in the comment above, it depends on how we want to handle the situation:

  • We could say that maybe ours or their implementation is buggy and we would rather not vote - if the block still gets elected that means the majority could deserialize it, unlike us.
  • We could also say that we can penalize the validator during delivery for including transactions we cannot handle - in that case if it's only us who couldn't deal with it due to a bug, we'll fall out of consensus by computing a different application hash.

Preventing votes by including malformed transactions would be a needlessly complicated way to attack because a malicious proposer can simply not send any proposal and induce a timeout.

I'll add a flag to give the option to reject these, and then we can decide later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the flag in a36603e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more problem with rejection is that we have to be careful not to reject something we accidentally admitted into our own mempool, otherwise it will never be removed from it.

}

/// Perform finality checks on top-down transactions and availability checks on bottom-up transactions.
async fn process(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can call it validate or something else along the line as process I feel it's consuming or processing the message, but we are actually inspecting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I know what you mean, I based all names in all the interfaces on what they are called in ABCI but without their prefix they are a bit confusing, like begin and end, or this process - they make sense in the context of their trait, but when the interpreter implements all interfaces, they are not very good.

Should we just use the full names, like begin_block and process_proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate would be similar to process in that it would be clashing with check, which lives in the CheckInterpreter.

Copy link
Contributor Author

@aakoshh aakoshh Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created https://github.com/consensus-shipyard/fendermint/issues/207

I think it would be good to tackle the naming separately after all the PRs are merged to reduce the rebasing churn, and to treat all of them in a uniform way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2-cents arriving late to the party: I think interpreters are a really powerful abstraction, and having them well documented can take us a long way. I agree, let's come back to naming once everything is in place so we can figure out one that is clear and general enough if this was to be ported to some other code base.

Copy link
Contributor

@adlrocha adlrocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gold. Thank you for this @aakoshh, it also helped me understand the role of interpreters better 🙏

size += tx.len();
txs.push(tx);
}
let txs = take_until_max_size(request.txs, request.max_tx_bytes.try_into().unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only check here if we reach the size limit of the Tendermint block, right? Should we add also here a TODO to remember not to exhaust the gas limit of a block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gas aspect is definitely ignored at the moment, thanks for the reminder. I created an issue to track it: https://github.com/consensus-shipyard/fendermint/issues/208

The check would not be here because at this size we don't know what messages we are dealing with, but once they are parsed into ChainMessage we can look at the gas limits.

.context("failed to prepare proposal")?;

let txs = txs.into_iter().map(bytes::Bytes::from).collect();
let txs = take_until_max_size(txs, request.max_tx_bytes.try_into().unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for the gas limit check or should we only check as part of process_proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that it must be part of both prepare and process otherwise you risk freaking out CometBFT by rejecting your own proposal, which is a big no-no.

Again it would not be here as this is purely based on sizes and cannot access gas. This logic is here because it is mandatory, not something application specific like the gas, which must be delegated to interpreters.

fendermint/vm/interpreter/src/bytes.rs Outdated Show resolved Hide resolved
@@ -27,7 +29,8 @@ async fn run(settings: Settings) -> anyhow::Result<()> {
);
let interpreter = SignedMessageInterpreter::new(interpreter);
let interpreter = ChainMessageInterpreter::new(interpreter);
let interpreter = BytesMessageInterpreter::new(interpreter);
let interpreter =
BytesMessageInterpreter::new(interpreter, ProposalPrepareMode::AppendOnly, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for not rejecting a malformed proposal here (if any, as maybe is just a placeholder value for now)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the considerations in the code as comments where this flag is used.

The reason I thought we might not want to reject them for now is because

  1. They should not happen as our nodes are honest.
  2. If they happen, we have a bug with a submitted transaction, and this way the error will show up in the transaction results which the RPC client will get and print, so the test will fail with an error, rather than a timeout. Whereas if we reject the proposal, CometBFT itself will most likely die as a result because our honest nodes will have proposed something that they subsequently reject themselves. And with AppendOnly we are not going to inspect the bytes before adding them to the proposal, as it was pointed out as wasted effort by @cryptoAtwill . That will have to change if we want to limit gas, though.
  3. If we admitted a malformed transaction to our mempool, inspected it (by using PassThrough instead of AppendOnly), and not included it in our proposal, it will still stay in the mempool, resulting in a memory leak in CometBFT if we keep piling up such transactions. This would be another bug that is potentially hard to figure out, apart from flooding the logs with warnings by trying to propose the same transaction every 2 seconds.

If all our nodes are not honest and some are proposing faulty transactions then this would be a stupid way to attack instead of just not proposing any block to cause a timeout. If it happened, our only chance to weed out such actors is to admit the transaction and penalize the miner. If that happened due to a bug, we'd risk consensus failure, but at least we'd see what's happening, and by coding the penalty mechanism we'd discourage anyone from intentionally doing this, and to implement these checks in check_tx.

So, overall, I thought that a malformed TX here is most likely a bug that is easiest to detect by letting it through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense! Assuming our implementation honest saving ourselves from nasty bugs is the way to go, thanks!

}

/// Perform finality checks on top-down transactions and availability checks on bottom-up transactions.
async fn process(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2-cents arriving late to the party: I think interpreters are a really powerful abstraction, and having them well documented can take us a long way. I agree, let's come back to naming once everything is in place so we can figure out one that is clear and general enough if this was to be ported to some other code base.

Co-authored-by: adlrocha <6717133+adlrocha@users.noreply.github.com>
@aakoshh aakoshh merged commit aa1a86c into main Aug 22, 2023
5 checks passed
@aakoshh aakoshh deleted the fm-184-proposal-interpreter branch August 22, 2023 09:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal interpreters
3 participants