IBC upgrade plan summary #445

Closed
cwgoes opened this issue Jul 5, 2020 · 13 comments
Labels: brainstorming (Open-ended brainstorming), tao (Transport, authentication, & ordering layer)

Comments

@cwgoes
Contributor

cwgoes commented Jul 5, 2020

This is a high-level summary of my current thoughts. If everyone concurs on executing this for 1.0, I'll update the SDK-side ADR with specific data-structure information, etc.

This document addresses the procedures for maintaining IBC functionality with minimal disruption to applications when upgrading a ledger which utilises IBC to send & receive messages from other ledgers. Our primary design goal is to minimise or eliminate the disruptions to, and manual interventions required by, applications running on top of the ledger which use IBC to communicate with applications on other ledgers. For now, we address only the case of a single ledger being upgraded. More complex multi-ledger upgrades should be reducible to sequenced instances of this process.

Basic assumptions:

  • We deal with a single ledger, which wants to upgrade, and a set of connected ledgers which have a light client to the upgrading ledger and possibly open connections/channels.
  • All state is preserved through an upgrade.
    • If this does not happen, this is an irregular upgrade and the IBC protocol cannot reason about what may occur.
  • We operate only at the light client layer. Application-layer protocols can implement their own upgrade procedures for channels or assets tied to channels; this is up to the applications and out of scope of this document.

Upgrades can be classified into three categories:

  1. Pre-planned, non-light-client breaking
    • This kind of upgrade is pre-planned, so the validator set of the ledger can signal in advance, and it does not require upgrading the light client algorithm on the connected chains.
    • For example, this category includes halt-restart upgrades which may change the chain identifier and possibly reset the height.
    • This upgrade path will be supported completely in-protocol by IBC 1.0.
    • The procedure for enacting such an upgrade without disruption is as follows:
      1. The upgrading ledger, prior to executing the upgrade, will set a key-value path in the store to indicate the new chain ID, new height, and upgrade height
      2. Connected ledgers will verify a proof of that key-value path at the last height before the upgrade and update their light clients with the new metadata
      3. Connections & channels do not require any changes, and timeout continuity can be provided with epoch numbers (see #439, "Add epoch number to Tendermint light client, alter block height representation")
  2. Pre-planned, light-client breaking
    • For example, this category includes changing Tendermint header structure or encoding in a way such that new headers would not be supported by the old light client algorithm.
    • This upgrade path will be supported partially in-protocol by IBC 1.0.
    • The procedure for enacting such an upgrade without disruption is as follows:
      1. The connected ledgers upgrade themselves to add a new light client algorithm which supports the new version (e.g. new header encoding)
      2. The upgrading ledger, prior to executing the upgrade, will set a key-value path in the store to indicate the client type & complete client state initialisation information for the new light client
      3. Connected ledgers will verify a proof of that key-value path at the last height before the upgrade, and replace in-place the current light client with a new one with the specified initialisation parameters
      4. Connections & channels do not require any changes, and timeout continuity can be provided with epoch numbers
  3. Not pre-planned, possibly light-client breaking
    • This category includes upgrades where the upgrading ledger cannot coordinate with connected ledgers prior to executing the upgrade and does not set any upgrade signalling values in state.
    • For example, this could be an emergency upgrade where the upgrading ledger halts due to a bug and state must be manually exported and the ledger restarted.
    • This upgrade path will not be supported in-protocol by IBC 1.0, because it is not possible to support automatically in a useful way.
    • The upgrading ledger & connected ledgers must execute irregular state changes (e.g. via governance) in order to maintain connection & channel continuity.
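A minimal Go sketch of how a connected ledger might consume the upgrade signal in categories 1 and 2. Everything here is hypothetical: `UpgradeInfo`, `LightClient`, `ApplyUpgrade`, and the `proofVerified` flag stand in for the real store path, client state, and proof check, and category 2's "complete client state initialisation information" is reduced to a client-type string for brevity.

```go
package main

import "fmt"

// UpgradeInfo is a hypothetical record the upgrading ledger writes under a
// known key before a pre-planned upgrade.
type UpgradeInfo struct {
	NewChainID    string
	NewHeight     uint64 // first height of the upgraded chain (may be reset)
	UpgradeHeight uint64 // last height before the upgrade; the proof must be at this height
	NewClientType string // empty for category 1; set for category 2 (light-client breaking)
}

// LightClient is a hypothetical, minimal view of the counterparty light client.
type LightClient struct {
	ClientType   string
	ChainID      string
	LatestHeight uint64
}

// ApplyUpgrade returns the updated client once a proof of info has been
// checked at the upgrade height. Real proof verification is out of scope and
// is represented by proofVerified.
func ApplyUpgrade(c LightClient, info UpgradeInfo, proofHeight uint64, proofVerified bool) (LightClient, error) {
	if !proofVerified {
		return c, fmt.Errorf("upgrade proof did not verify")
	}
	if proofHeight != info.UpgradeHeight {
		return c, fmt.Errorf("proof must be at height %d, got %d", info.UpgradeHeight, proofHeight)
	}
	next := c
	next.ChainID = info.NewChainID
	next.LatestHeight = info.NewHeight
	if info.NewClientType != "" {
		// Category 2: replace the light client algorithm in place.
		next.ClientType = info.NewClientType
	}
	// Connections and channels only reference the client, so they are untouched.
	return next, nil
}

func main() {
	old := LightClient{ClientType: "tendermint", ChainID: "chain-1", LatestHeight: 1000}
	info := UpgradeInfo{NewChainID: "chain-2", NewHeight: 1, UpgradeHeight: 1000}
	upgraded, err := ApplyUpgrade(old, info, 1000, true)
	fmt.Println(upgraded, err)
}
```

Keeping the client identifier stable and mutating only its metadata is what lets connections and channels survive both categories unchanged.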
@cwgoes cwgoes added tao Transport, authentication, & ordering layer. brainstorming Open-ended brainstorming. labels Jul 5, 2020
@AdityaSripal
Member

I like the categorization of the different types of upgrades.

A quick sketch of an idea for solving 2 and 3 in IBC 1.0. (Note: this is just a rough idea; it could be unsafe or incorrect.)

Switching the light client algorithm in place would mean that unprocessed packets that relied on the old algorithm can no longer be processed, unless the client implements some switching logic to internally swap light client algorithms based on height (this seems ugly).

Proposal Sketch:

Client changes:
Instead, a client may be fixed to a particular light client algorithm, but we introduce a StartHeight and an OutdatedHeight to denote the height range for which this particular client is valid.

Connection changes:
A ConnectionEnd may have []Client instead of the current single client. We enforce that the clients strung together encompass a continuous, non-overlapping range of heights. Thus for any given height, there is a single corresponding client.
Any proof verification the connection must do will be sent to the appropriate client based on the height of the proof.
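A sketch of the height-range bookkeeping, assuming heights form a single monotonically increasing sequence (e.g. epoch-aware heights as in #439) and that each client covers the half-open range [StartHeight, OutdatedHeight), with OutdatedHeight == 0 meaning the client is still the active one. `ClientRange`, `ValidateRanges`, and `ClientForHeight` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ClientRange is a hypothetical client valid for heights in
// [StartHeight, OutdatedHeight); OutdatedHeight == 0 means "still active".
type ClientRange struct {
	ID             string
	StartHeight    uint64
	OutdatedHeight uint64
}

// ValidateRanges checks that the clients form a continuous, non-overlapping
// sequence: each one starts exactly where the previous one became outdated,
// and only the last one may be open-ended.
func ValidateRanges(clients []ClientRange) error {
	if len(clients) == 0 {
		return errors.New("connection has no clients")
	}
	for i, c := range clients {
		last := i == len(clients)-1
		if !last && c.OutdatedHeight <= c.StartHeight {
			return fmt.Errorf("client %s: empty or open-ended range before the last client", c.ID)
		}
		if last && c.OutdatedHeight != 0 && c.OutdatedHeight <= c.StartHeight {
			return fmt.Errorf("client %s: empty range", c.ID)
		}
		if i > 0 && c.StartHeight != clients[i-1].OutdatedHeight {
			return fmt.Errorf("gap or overlap between %s and %s", clients[i-1].ID, c.ID)
		}
	}
	return nil
}

// ClientForHeight returns the single client that should verify a proof at h.
func ClientForHeight(clients []ClientRange, h uint64) (ClientRange, error) {
	for _, c := range clients {
		if h >= c.StartHeight && (c.OutdatedHeight == 0 || h < c.OutdatedHeight) {
			return c, nil
		}
	}
	return ClientRange{}, fmt.Errorf("no client covers height %d", h)
}

func main() {
	clients := []ClientRange{
		{ID: "client-a", StartHeight: 1, OutdatedHeight: 1001},
		{ID: "client-b", StartHeight: 1001},
	}
	if err := ValidateRanges(clients); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	c, err := ClientForHeight(clients, 1500)
	fmt.Println(c.ID, err)
}
```

Because the ranges are validated as contiguous and non-overlapping, the lookup can never return two candidate clients for the same height.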

The Connection package can then introduce message type(s) for appending a client to this list.
In the pre-planned case, this would be a proof of the upgrade height being set and of the next client type. The handler would set the OutdatedHeight of the latest client, and allow the next client to be created and appended to the list if it meets the expected specification.
Note: we also enforce that the next client is state-preserving.
In the unplanned case, we could allow for a special message type signed by the last validator-set of the latest client to set the outdated height along with the next client's parameters.
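A sketch of the pre-planned append handler, again with hypothetical names (`ClientRange`, `UpgradeSignal`, `AppendClient`). It assumes the proof of the upgrade signal has already been verified, reduces "meets the expected specification" to matching the signalled client type and start height, and does not model the state-preservation check.

```go
package main

import (
	"errors"
	"fmt"
)

// ClientRange is a hypothetical client valid for [StartHeight, OutdatedHeight).
type ClientRange struct {
	ID             string
	ClientType     string
	StartHeight    uint64
	OutdatedHeight uint64 // 0 while this is the active client
}

// UpgradeSignal is what the upgrading chain committed to in state (assumed
// already proven to the handler).
type UpgradeSignal struct {
	UpgradeHeight  uint64
	NextClientType string
}

// AppendClient marks the latest client outdated at the signalled upgrade
// height and appends the next client, which must match the signal and start
// exactly at that height so the ranges stay contiguous.
func AppendClient(clients []ClientRange, sig UpgradeSignal, next ClientRange) ([]ClientRange, error) {
	if len(clients) == 0 {
		return nil, errors.New("no existing client")
	}
	latest := clients[len(clients)-1]
	if latest.OutdatedHeight != 0 {
		return nil, errors.New("latest client is already outdated")
	}
	if sig.UpgradeHeight <= latest.StartHeight {
		return nil, fmt.Errorf("upgrade height %d not after start height %d", sig.UpgradeHeight, latest.StartHeight)
	}
	if next.ClientType != sig.NextClientType || next.StartHeight != sig.UpgradeHeight || next.OutdatedHeight != 0 {
		return nil, errors.New("next client does not match the signalled upgrade")
	}
	out := append([]ClientRange(nil), clients...) // copy so the caller's slice is not mutated
	out[len(out)-1].OutdatedHeight = sig.UpgradeHeight
	return append(out, next), nil
}

func main() {
	clients := []ClientRange{{ID: "client-a", ClientType: "tm-v1", StartHeight: 1}}
	sig := UpgradeSignal{UpgradeHeight: 1001, NextClientType: "tm-v2"}
	out, err := AppendClient(clients, sig, ClientRange{ID: "client-b", ClientType: "tm-v2", StartHeight: 1001})
	fmt.Println(out, err)
}
```

In the unplanned case the same function could be reused with a signal authenticated by the last validator set instead of a state proof, though that authentication step is not sketched here.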

There probably needs to be a way to show and handle misbehaviour if the validator sets of a previous client's height range sign different headers for a height range covered by the latest client (i.e. we may need forking to still freeze clients even if the misbehaviour is only detectable across the different light clients).
Semi-related issue: cosmos/cosmos-sdk#6531

There are no channel changes necessary.


This seems like it should work for both 2 and 3 (and 1 if a cleaner implementation does not exist). It also allows previous clients to coexist with later clients.

@cwgoes
Contributor Author

cwgoes commented Jul 7, 2020

Another major question (ref cosmos/cosmos-sdk#6531) is upgrade security: will Tendermint accept evidence from the old chain within the unbonding period after an upgrade? If not, this potentially opens a large attack vector, since validators could double-sign to fool IBC light clients without punishment, and we'd probably need to freeze light clients an unbonding period's worth of time before an upgrade.

@cwgoes
Contributor Author

cwgoes commented Jul 7, 2020

> Switching the light client algorithm in place would mean that unprocessed packets that relied on the old algorithm can no longer be processed, unless the client implements some switching logic to internally swap light client algorithms based on height (this seems ugly).

Why would this be the case? New proofs might have to be created, but the packets should still be in the state (we can assume that the application state is persisted through the upgrade).

@AdityaSripal
Member

> Why would this be the case? New proofs might have to be created, but the packets should still be in the state (we can assume that the application state is persisted through the upgrade).

Ah, right, this is correct. Still, the above proposal might make sense to implement for upgrade security. From what I understand, there are two types of misbehaviour we might want to avoid:

  1. Fork before UpgradeHeight

As mentioned here:

> will Tendermint accept evidence from the old chain within the unbonding period after an upgrade? If not, this potentially opens a large attack vector, since validators could double-sign to fool IBC light clients without punishment, and we'd probably need to freeze light clients an unbonding period's worth of time before an upgrade.

There may be a case where the chain forks right before the upgrade, but in a way that is detectable only after the upgrade occurs. If so, updating the light client in place may make detecting and punishing this misbehaviour impossible (especially if the update is light-client breaking), since evidence detection and handling depend on things like the light client algorithm and header format.

However, consider the proposal above where the old client still exists, but is simply supplanted by a newer client in the list of []Client. Here, the evidence can still be submitted to the older client which can process it just fine (regardless of the newer light-client algorithm). If the evidence passes, the client freezes at the evidence height and then all subsequent Clients in the list can be frozen at their first height.
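A sketch of that freeze rule over an ordered client list. The names (`Client`, `FreezeFrom`, `CanVerify`) are hypothetical, and the evidence is assumed to have already passed the old client's own light client algorithm before `FreezeFrom` is called.

```go
package main

import "fmt"

// Client is a hypothetical entry in a connection's ordered []Client list.
type Client struct {
	ID           string
	StartHeight  uint64
	Frozen       bool
	FrozenHeight uint64 // once Frozen, proofs at or above this height are rejected
}

// FreezeFrom freezes clients[i] at the evidence height and every later client
// at its first height, modifying the slice in place.
func FreezeFrom(clients []Client, i int, evidenceHeight uint64) error {
	if i < 0 || i >= len(clients) {
		return fmt.Errorf("client index %d out of range", i)
	}
	if evidenceHeight < clients[i].StartHeight {
		return fmt.Errorf("evidence height %d precedes client %s", evidenceHeight, clients[i].ID)
	}
	clients[i].Frozen, clients[i].FrozenHeight = true, evidenceHeight
	for j := i + 1; j < len(clients); j++ {
		clients[j].Frozen, clients[j].FrozenHeight = true, clients[j].StartHeight
	}
	return nil
}

// CanVerify reports whether c may still verify a proof at height h.
func CanVerify(c Client, h uint64) bool {
	return !c.Frozen || h < c.FrozenHeight
}

func main() {
	clients := []Client{{ID: "client-a", StartHeight: 1}, {ID: "client-b", StartHeight: 1001}}
	if err := FreezeFrom(clients, 0, 990); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(CanVerify(clients[0], 989), CanVerify(clients[1], 1500))
}
```

Heights before the fork stay verifiable on the old client, while nothing at or after it can be proven through any later client.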

This effectively allows evidence to be processed by the IBC client and all subsequent heights to be frozen and unprocessable, even across light-client breaking upgrades, without any updates to Tendermint's evidence handling logic to handle this case. Note that we also do not need to freeze the light client for three weeks waiting for the old light client's unbonding period to expire, which avoids a potential major UX pain on more popular connections.

Of course, if all full nodes upgrade to the new chain before the evidence is caught, then the evidence will never get submitted. So a safer way to upgrade might be to leave some full nodes on the old chain for the duration of the unbonding period. These nodes will not receive any new blocks, but their evidence reactors should still be capable of receiving and gossiping evidence. From my understanding, a relayer can then pick up this evidence and relay it to connected chains.

  2. Fork on UpgradeHeight

Not sure if this is a situation we are concerned about. Suppose there is a fork at the upgrade height, such that the validator set runs the new chain and also continues operating the old chain past the upgrade height.

It's unclear to me if any light-client is fooled in this situation since the light-clients should know the upgrade height and new parameters ahead of time (since supported upgrades are pre-planned). However, it may still be a misbehaviour we want to punish.

In this case, a full node tracking the old chain post-upgrade (a strategy repeated from the previous case) can still submit a header for the old chain to the older client. Since the old client has an OutdatedHeight set, any header that passes verification but is above OutdatedHeight will cause all later clients to freeze.


The proposal described in my earlier comment on #445 should handle and punish these forks correctly.

The only necessary addition to that proposal is the following pair of fields on ClientState:

```go
type ClientState struct {
	// old fields
	...
	PrevClient string // empty string if this is the first client of the chain
	NextClient string // empty string if this is the latest client of the chain
}
```

This effectively creates a doubly-linked list of clients in the order of when they are active. This is useful to establish the ordering even at the client level (rather than just at the connection), and it helps when the next client needs to be frozen based on a misbehaviour caught on previous client.
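A sketch of how the linked fields could drive freezing, covering both the forward walk and the "header above OutdatedHeight on the old client" trigger from the fork-on-upgrade case. The client store is modelled as an in-memory map, and `OutdatedHeight`, `Frozen`, `FreezeForward`, and `OnOldChainHeader` are illustrative additions, not fields or functions from the SDK.

```go
package main

import "fmt"

// ClientState keeps only what this sketch needs. PrevClient/NextClient follow
// the proposal; OutdatedHeight and Frozen are illustrative.
type ClientState struct {
	PrevClient     string // empty string if this is the first client of the chain
	NextClient     string // empty string if this is the latest client of the chain
	OutdatedHeight uint64 // 0 while this is the latest client
	Frozen         bool
}

// FreezeForward freezes client id and every client reachable through
// NextClient, returning the IDs it froze in order.
func FreezeForward(store map[string]*ClientState, id string) ([]string, error) {
	var frozen []string
	seen := make(map[string]bool)
	for id != "" {
		if seen[id] {
			return frozen, fmt.Errorf("cycle detected at client %s", id)
		}
		seen[id] = true
		cs, ok := store[id]
		if !ok {
			return frozen, fmt.Errorf("unknown client %s", id)
		}
		cs.Frozen = true
		frozen = append(frozen, id)
		id = cs.NextClient
	}
	return frozen, nil
}

// OnOldChainHeader handles a header that already passed verification against
// client id. A height above OutdatedHeight shows the old chain kept running
// past the upgrade, so every later client is frozen.
func OnOldChainHeader(store map[string]*ClientState, id string, height uint64) (bool, error) {
	cs, ok := store[id]
	if !ok {
		return false, fmt.Errorf("unknown client %s", id)
	}
	if cs.OutdatedHeight == 0 || height <= cs.OutdatedHeight {
		return false, nil
	}
	ids, err := FreezeForward(store, cs.NextClient)
	return len(ids) > 0, err
}

func main() {
	store := map[string]*ClientState{
		"client-a": {NextClient: "client-b", OutdatedHeight: 1000},
		"client-b": {PrevClient: "client-a"},
	}
	froze, err := OnOldChainHeader(store, "client-a", 1001)
	fmt.Println(froze, err, store["client-b"].Frozen)
}
```

The `seen` set is a defensive guard: a correctly maintained list never cycles, but a corrupted NextClient pointer should not hang the handler.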

@cwgoes
Contributor Author

cwgoes commented Jul 8, 2020

> This effectively allows evidence to be processed by the IBC client and all subsequent heights to be frozen and unprocessable, even across light-client breaking upgrades, without any updates to Tendermint's evidence handling logic to handle this case. Note that we also do not need to freeze the light client for three weeks waiting for the old light client's unbonding period to expire, which avoids a potential major UX pain on more popular connections.
>
> Of course, if all full nodes upgrade to the new chain before the evidence is caught, then the evidence will never get submitted. So a safer way to upgrade might be to leave some full nodes on the old chain for the duration of the unbonding period. These nodes will not receive any new blocks, but their evidence reactors should still be capable of receiving and gossiping evidence. From my understanding, a relayer can then pick up this evidence and relay it to connected chains.

I think what you're proposing makes sense for IBC light clients, but we still need Tendermint to be able to process evidence within the unbonding period in order to slash validators, which IBC light clients cannot do. If this doesn't happen, validators have nothing discouraging them from signing forks.

> Not sure if this is a situation we are concerned about. Suppose there is a fork at the upgrade height, such that the validator set runs the new chain and also continues operating the old chain past the upgrade height.

I think whether or not this needs to count as misbehaviour to retain the same security assumptions depends on how far in advance the upgrade is known about - if it is known about at least an unbonding period in advance, the light client can definitely reject any heights in the original epoch beyond the upgrade height, but if it is not known at least an unbonding period in advance, such signatures would need to be treated as misbehaviour and slashed for.
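The rule above can be written as a small classifier. This is a sketch under stated assumptions: "notice" is how long before the upgrade the light client learned of it, `ExampleUnbonding` is a placeholder (the real value is a chain parameter), and `Handling`, `Normal`, `Reject`, `Misbehaviour`, and `ClassifyOldEpochHeader` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Handling is how a light client treats a header from the original epoch.
type Handling int

const (
	Normal       Handling = iota // at or below the upgrade height: verify as usual
	Reject                       // beyond the upgrade height, with at least one unbonding period of notice
	Misbehaviour                 // beyond the upgrade height, with shorter notice: slashable evidence
)

// ExampleUnbonding is a placeholder; the real value is a chain parameter.
var ExampleUnbonding = 21 * 24 * time.Hour

// ClassifyOldEpochHeader applies the notice-period rule to a header that was
// signed for the original (pre-upgrade) epoch.
func ClassifyOldEpochHeader(headerHeight, upgradeHeight uint64, notice, unbonding time.Duration) Handling {
	switch {
	case headerHeight <= upgradeHeight:
		return Normal
	case notice >= unbonding:
		return Reject
	default:
		return Misbehaviour
	}
}

func main() {
	h := ClassifyOldEpochHeader(1001, 1000, 7*24*time.Hour, ExampleUnbonding)
	fmt.Println(h == Misbehaviour)
}
```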

@AdityaSripal
Member

> I think what you're proposing makes sense for IBC light clients, but we still need Tendermint to be able to process evidence within the unbonding period in order to slash validators, which IBC light clients cannot do. If this doesn't happen, validators have nothing discouraging them from signing forks.

Correct, what I'm proposing is only a way to freeze light clients in the case of this behaviour, which would be necessary for IBC. It leaves slashing individual validators as an open problem that is more Tendermint's concern.

@cwgoes
Contributor Author

cwgoes commented Jul 9, 2020

> I think what you're proposing makes sense for IBC light clients, but we still need Tendermint to be able to process evidence within the unbonding period in order to slash validators, which IBC light clients cannot do. If this doesn't happen, validators have nothing discouraging them from signing forks.

Per discussion with the Tendermint team, the state of affairs:

  • Tendermint does not support processing old evidence after a restart. This is possible in the future, but will not be supported in 0.34.
  • With the upgrade module, the SDK can update the state machine in a non-zero-height upgrade, without interfering with any Tendermint data - this kind of upgrade does not require any special provisions from IBC at all.

Assuming state machine upgrades always use the upgrade module, this leaves us with Tendermint breaking-upgrades as the remaining issue. Our plan is as follows: either Tendermint Core, in the first post-0.34 release which breaks past data structures (and thus requires a zero-height upgrade), will support processing past evidence (rendering this upgrade path safe), or in that exceptional upgrade case, we will freeze all IBC channels an unbonding period prior to the upgrade (which must be known about at least an unbonding period in advance), which would be unfortunate but not the end of the world.

In terms of immediate IBC work, then, we still need to support this upgrade path (since it might be exercised in the future if Tendermint supports old evidence processing), and we also need to implement the "freeze for an unbonding period" logic, which we need to think about a bit more carefully to ensure that it is safe (as it's not just that validators could sign fake headers to prevent packets from being sent which were actually sent, they could also sign fake headers to send packets which weren't actually sent). I think it needs to work something like:

  • Light client is notified of upcoming upgrade in a way which authorises a specific validator set & parameters (e.g. chain ID)
  • When upgrade is an unbonding period away, light client freezes & accepts no more updates from that chain
  • After the unbonding period, light client will accept only a signature by the authorised validator set & parameters to start validating the upgraded version (after which normal light client behaviour resumes)
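A sketch of those three steps as a small state machine. Assumptions, all hypothetical: the upgrade is scheduled by time (Unix seconds), the freeze window runs from one unbonding period before the upgrade until the upgrade itself, the authorised validator set is identified by a hash string, and ordinary signature/commit verification happens elsewhere. `ScheduledUpgrade`, `Header`, `Client`, `ErrFrozen`, and `Update` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// ScheduledUpgrade is what the light client is told in advance.
type ScheduledUpgrade struct {
	UpgradeTime       int64  // Unix seconds
	NewChainID        string // authorised parameters for the upgraded chain
	AuthorisedValHash string // validator set allowed to sign the first upgraded header
}

// Header carries only the fields this sketch checks.
type Header struct {
	ChainID string
	ValHash string
}

type Client struct {
	UnbondingSecs int64
	Plan          *ScheduledUpgrade
	Upgraded      bool
}

var ErrFrozen = errors.New("client frozen pending upgrade")

// Update decides whether header h may update the client at time now.
func (c *Client) Update(h Header, now int64) error {
	if c.Plan == nil || c.Upgraded {
		return nil // no pending upgrade: normal light client behaviour
	}
	if now < c.Plan.UpgradeTime-c.UnbondingSecs {
		return nil // still more than an unbonding period before the upgrade
	}
	if now < c.Plan.UpgradeTime {
		return ErrFrozen // within the final unbonding period: accept nothing
	}
	if h.ChainID != c.Plan.NewChainID || h.ValHash != c.Plan.AuthorisedValHash {
		return fmt.Errorf("header not signed by the authorised validator set for %s", c.Plan.NewChainID)
	}
	c.Upgraded = true // normal verification resumes on the upgraded chain
	return nil
}

func main() {
	c := &Client{UnbondingSecs: 100, Plan: &ScheduledUpgrade{UpgradeTime: 1000, NewChainID: "chain-2", AuthorisedValHash: "vals-2"}}
	fmt.Println(c.Update(Header{ChainID: "chain-1", ValHash: "vals-1"}, 950))
	fmt.Println(c.Update(Header{ChainID: "chain-2", ValHash: "vals-2"}, 1000), c.Upgraded)
}
```

The point of the frozen window is that neither fake "not sent" nor fake "sent" packet proofs can be admitted while old-chain signatures are no longer slashable.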

@cwgoes
Contributor Author

cwgoes commented Jul 21, 2020

We should add the light client freeze operation to the spec.

@cwgoes
Contributor Author

cwgoes commented Aug 31, 2020

I suppose we should also include the chain ID epoch extraction scheme in the spec, although it's a bit of a temporary hack.
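A sketch of what such an extraction scheme could look like, assuming a `{chain-name}-{epoch}` naming convention where the epoch is a positive integer without leading zeros and any chain ID not following the convention is treated as epoch 0. The regular expression and `ParseEpoch` are illustrative, not the SDK's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// epochFormat matches IDs ending in "-<n>" where n has no leading zero and
// the character before the dash is not itself a dash, e.g. "cosmoshub-4".
var epochFormat = regexp.MustCompile(`^.*[^-]-[1-9][0-9]*$`)

// ParseEpoch extracts the epoch from a chain ID, returning 0 for IDs that do
// not follow the convention.
func ParseEpoch(chainID string) uint64 {
	if !epochFormat.MatchString(chainID) {
		return 0
	}
	n, err := strconv.ParseUint(chainID[strings.LastIndex(chainID, "-")+1:], 10, 64)
	if err != nil {
		return 0 // e.g. the number overflows uint64
	}
	return n
}

func main() {
	for _, id := range []string{"cosmoshub-4", "testchain", "chain-04"} {
		fmt.Println(id, ParseEpoch(id))
	}
}
```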

@ValarDragon

Just to confirm, eventually this will not live in the chain-id, and just be data the validators sign over and client txs don't?

@cwgoes
Copy link
Contributor Author

cwgoes commented Sep 15, 2020

> Just to confirm, eventually this will not live in the chain-id, and just be data the validators sign over and client txs don't?

In principle, it should be; not sure what the Tendermint team's plans are.

@cwgoes
Contributor Author

cwgoes commented Nov 16, 2020

Implemented in the Cosmos SDK.

Future updates should be made to simplify this logic for Tendermint light clients, but those are implementation choices.

@danwt
Contributor

danwt commented Sep 10, 2024

Hi, are there any plans to support emergency upgrades (category 3)?

Thanks


4 participants