Panics on failure to send IBC packets #731
Labels
source: audit
type: bug
Surfaced from @informalsystems audit on Interchain Security, at commit 463ec20
Problem
Currently, both provider and consumer chains panic and halt if sending an IBC packet fails for any reason other than ErrClientNotActive. This problem is known to the team (see #649). No corrective action is needed before the ICS 1.0 release, but the problem needs to be alleviated before the ICS 1.1 release (before the first consumer chain goes live).
Closing criteria
Do not panic on failure to send IBC packets.
Problem details
In the implementation under audit, namely in the functions SendVSCPacketsToChain() on the provider and SendPackets() on the consumer, the code panics if sending an IBC packet fails. As these functions are called from EndBlockers, the respective chain halts on error. This has indeed happened in a testnet, when sending an IBC packet failed because of an expired light client (see issue ICS#435). A partial fix has been introduced, in which the packets are first queued and only then sent; errors about light client expiration are logged, and the unsent packets remain queued. The resulting code looks similar on both provider and consumer; here is the relevant part of the provider code:
As can be seen, upon receiving an error from sending an IBC packet, if the error is of one specific type (ErrClientNotActive), the function stops processing; on any other returned error the function panics, and thus halts the provider. The problem with this approach is that it introduces an implicit assumption about the behavior of the code that sends IBC packets, namely that the only error it may return under normal circumstances is ErrClientNotActive. Even if this is true currently (though "normal circumstances" is hard to define), the function ibc-go/modules/core/04-channel/keeper/packet.go:SendPacket() a) contains at least 10 different errors it may return, and b) is updated relatively frequently, so even if the above condition holds now, it may cease to hold upon the next IBC update.
Problem Scenarios
If, under some combination of parameters, or upon an IBC update, the IBC packet send returns an unexpected error, the provider or consumer chain will halt. The crux of the problem is the immediate and harmful reaction to a problem that may be transient.
Recommendation
In general, we recommend following the modular approach outlined in the issue ICS#627. Considering this specific case, the situation is much simplified by the fact that all infrastructure for delayed actions is already in place: IBC packets are queued both on the provider and on the consumer, and even if the packet can't be sent now, it can be resent later. Thus, we recommend the following:
ErrClientNotActive
In that way, no immediate harmful action will be taken with respect to either the provider or the consumer, and the problem will be delegated to humans for further action; the ICS packets will then simply wait in their respective queues until corrective action is taken.
TODOs