Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Consider immediate execution #7898

Closed
4 tasks
liamsi opened this issue Jun 17, 2020 · 8 comments
Closed
4 tasks

Proposal: Consider immediate execution #7898

liamsi opened this issue Jun 17, 2020 · 8 comments
Labels
C:abci Component: Application Blockchain Interface C:consensus Component: Consensus S:proposal Status: Proposal stale for use by stalebot

Comments

@liamsi
Copy link
Contributor

liamsi commented Jun 17, 2020

This is a draft (will update shortly):

Summary

Consider enabling immediate execution (either by default, or, somehow make it possible to chose or overwrite the current behaviour for tendermint users).

Problem Definition

Currently, tendermint executes transactions one "block height off". Meaning that In the current execution model, blocks are executed against the app only after they are committed.
Full block verification (incl. state) always needs access to transactions of the previous block:

state(1) = InitialState
state(h+1) <- Execute(state(h), ABCIApp, block(h))

While that seem to be fine for most use-cases there are a few draw backs here:

  1. First of all it's confusing why transactions do not simply get executed in the same block (this is mostly a documentation concern). There is no clear documentation why this decision was made in tendermint. There are a few issues where this was discussed but these discussions are difficult to find and they don't explain the decision well enough.
  2. Dealing with one-offs is a classical source of bugs. That is a concern for app or rather SDK module developers. E.g., a dev who was instrumental in designing and implementing the PoS module in the SDK confirmed that "it actually being quite annoying to deal with the +1 offset". This is mostly about developer usability. It also (unnecessarily?) complicates the app centric point of view (ref: App centric interpretation of concepts #2483).
  3. In the context of IBC, for some zones it might be annoying to essentially "wait" an extra block for the state to actually be updated. Not sure if this is a real issue actually. But I can imagine for some projects that waiting a few extra seconds is at least non-optimal.
  4. For certain fee models, deferred execution is a burden, or makes them hacky/impossible to implement: Further investigate execution celestiaorg/celestia-core#3 (comment)
    and Further investigate execution celestiaorg/celestia-core#3 (comment)

Proposal

First, the reasoning behind the current execution model needs to be documented (my understanding is that it is an optimization to reduce latency; s.t. validators can reach consensus on tx ordering quickly and then do the state transitions leisurely while timeout_commit didn't kick in yet). This should be done independent of the proposal to execute earlier.

TODO

Related:


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@liamsi
Copy link
Contributor Author

liamsi commented Oct 20, 2020

Note that this would likely make solving this issue easier, too: #3322

@tessr
Copy link
Contributor

tessr commented Oct 29, 2020

Evaluating this, and deciding whether we want to switch execution models, is now on the roadmap for the 0.35 milestone. (If we do decide to do this, the implementation may happen as part of the 1.0 milestone, but we'd like to make a decision on it as part of our 0.35 work.)

@tac0turtle tac0turtle transferred this issue from tendermint/tendermint Nov 3, 2020
@tac0turtle tac0turtle added C:abci Component: Application Blockchain Interface C:consensus Component: Consensus S:proposal Status: Proposal labels Nov 3, 2020
@tac0turtle
Copy link
Contributor

Talking with @liamsi today we have come up with a potential design that could make this optional to applications.

With the proposed preprocess change a proposer could execute the txs and provide an app hash. In the next phase (precommit) there is an ABCI call to the application (checkBlock or preCheck) that provides it with the header and data fields of the block. Validators could execute the txs to get the apphash that was added by the proposer. By keeping the Commit phase ABCI calls the two new ABCI calls become optional. If the application would like to continue with delayed execution they can make these ABCI calls noops.

Diagram:
Untitled Diagram

There are implementation details to be considered if this were to be accepted.

@ebuchman
Copy link
Contributor

ebuchman commented Nov 3, 2020

I'm not sure optionality is going to be an option here, without a massive refactor, given how pervasive assumptions about the execution model are, down to the data structures. So I'd think it would probably be all or nothing on this change.

It's hard to imagine that the performance concerns that initially justified delayed execution really hold a candle to the drastic UX issues this has caused, or to the speedups to come later from aggregate sigs. Not to mention possible economic problems. So figuring out how to move forward with immediate execution seems like the right idea.

That said, this problem has a pretty huge surface area, so it will be a lot to work through.

In the diagram above, it looks like Prevote and Precommit are in the wrong order? Also re the complexity of optionality, I suspect if we do this there will be just one block execution, and there's no more excuse for blocks with invalid txs (ie. txs which don't at least "pay gas"). This means after we discover an executed block is invalid, we have to roll it back (note apps already have to support something similar to this, but not at all identical, for checktx and out-of-gas style roll backs)

Presumably the options for when the execution happens are:

  1. After receiving the proposal, before prevoting. This is simplest but seems to add the most latency
  2. After receiving the proposal, before precommiting. This way execution can happen concurrently while we're prevoting and waiting for prevotes from everyone. But this means we might end up with +2/3 prevotes ("polka") for a block that is actually invalid, which may have liveness implications we need to be careful about.

Implementation strategy might support starting with (1) which is easier and then transitioning to (2) at some point once things are further worked out. But everyone would have to bear the latency hit of (1) in the meantime and it could be non-trivial. Note the difference in implementation complexity between these two options should be tiny compared to the difference between delayed and immediate execution.

We have another degree of freedom around whether proposers execute before proposing or at the same time as everyone else - ie. they could propose a block without an app hash, and then everyone fills in the same app hash before precommiting. This doesn't matter much for (1), but for (2) it complicates the relationship between prevotes and precommits (since prevotes wouldnt have the app hash and precommits would), so in the case of (2) probably proposers should execute before proposing.

There's also a question about the off-by-1 of commits, which I believe is independent of execution. Right now canonical commits are in the next block. Maybe this isn't as big a deal since a valid commit is always available once a block is committed, it just may not be canonical.

Of course there's lots of possible downstream performance improvements coming from pipelining and more interplay between mempools, block propogation, and speculative execution. But this can all get pretty complex quick :) . I also wonder sometimes whether a 3rd phase of voting ala Hotstuff might simplify some of this :P

@tac0turtle
Copy link
Contributor

It's hard to imagine that the performance concerns that initially justified delayed execution really hold a candle to the drastic UX issues this has caused, or to the speedups to come later from aggregate sigs. Not to mention possible economic problems. So figuring out how to move forward with immediate execution seems like the right idea.

👍 👍 👍 👍 👍

In the diagram above, it looks like Prevote and Precommit are in the wrong order?

I accidentally swapped them in the diagram.

But everyone would have to bear the latency hit of (1) in the meantime and it could be non-trivial.

This came up in our conversation and we came to a similar conclusion. With various changes Tendermint this latency could be reduced. On top of changes in Tendermint an application could make changes to transaction execution to increase speed which would help as well.

@cmwaters
Copy link
Contributor

cmwaters commented Nov 5, 2020

there is an ABCI call to the application (checkBlock or preCheck) that provides it with the header and data fields of the block.

Just to confirm my understanding here, as well as checking that the app hash is the same, if any of the txs were invalid with respect to the app then the app would also return an error and then the node would prevote/precommit nil? If this is the case what happens if 2/3+ vote for this invalid block? does the node panic?

So I'd think it would probably be all or nothing on this change.

I don't know the details too well but I would tend to agree with this. I feel like we would be too stretched trying to offer both immediate and delayed execution and it also might come across as a bit confusing for developers

A few other things I would like to comment on. If we do execution before consensus is reached and we reach timeout or don't get votes because it is invalid in any way than applications will need to be able to revert all the txs that they just applied. I don't think we have asked apps to do that before right? Usually if a tx is invalid it is just dropped during deliverTx. Also in the case of multiple rounds (which I know very seldom occurs) we will have to do this operation multiple times.

@alexanderbez
Copy link
Contributor

If we do execution before consensus is reached and we reach timeout or don't get votes because it is invalid in any way than applications will need to be able to revert all the txs that they just applied.

Application state is "cached" so to say, it's not persisted or finalized until ABCI#Commit, which I presume we would not call before there is consensus?

Note, invalid txs can and do make it into blocks atm.

@liamsi
Copy link
Contributor Author

liamsi commented Nov 5, 2020

Just to confirm my understanding here, as well as checking that the app hash is the same, if any of the txs were invalid with respect to the app then the app would also return an error, and then the node would prevote/precommit nil? If this is the case what happens if 2/3+ vote for this invalid block? does the node panic?

I guess they would vote nil (as the proposed block wasn't valid in that sense).

If this is the case what happens if 2/3+ vote for this invalid block?

Similarly, that isn't related to the execution model and would require the same measures as currently if 2/3+ votes were collected for an otherwise invalid block. Ideally, a block proposer proposing an invalid block would be slashed imo. Fraud proofs as evidence to be included in the next block could help to enforce slashing a proposer that included invalid state transitions.

As Bez said, the app could execute the Tx and cache the result and on commit, it gets actually applied.

The abci method that would be called (e.g. on propose by the proposer and on before prevoting by the other nodes) should filter out all invalid Tx and return the valid Tx, the app state (apphash), and the pending abci updates. On commit both the app state gets actually updated as well as the abci updates get applied to actually update the tendermint state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:abci Component: Application Blockchain Interface C:consensus Component: Consensus S:proposal Status: Proposal stale for use by stalebot
Projects
None yet
Development

No branches or pull requests

6 participants