-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix!: Remove panics on failure to send IBC packets #876
Changes from 5 commits
05eb431
808b1a0
5fc52ae
277ea26
8eee263
8c64481
d8e74e3
71eccd7
423039f
daf2fa0
53b8f4a
44b55f2
0e99659
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -194,12 +194,17 @@ func (k Keeper) SendVSCPacketsToChain(ctx sdk.Context, chainID, channelID string | |
// IBC client is expired! | ||
// leave the packet data stored to be sent once the client is upgraded | ||
// the client cannot expire during iteration (in the middle of a block) | ||
k.Logger(ctx).Debug("IBC client is expired, cannot send VSC, leaving packet data stored:", "chainID", chainID, "vscid", data.ValsetUpdateId) | ||
k.Logger(ctx).Info("IBC client is expired, cannot send VSC, leaving packet data stored:", "chainID", chainID, "vscid", data.ValsetUpdateId) | ||
return | ||
} | ||
// TODO do not panic if the send fails | ||
// https://github.com/cosmos/interchain-security/issues/649 | ||
panic(fmt.Errorf("packet could not be sent over IBC: %w", err)) | ||
// Not able to send packet over IBC! | ||
k.Logger(ctx).Error("cannot send VSC, removing consumer:", "chainID", chainID, "vscid", data.ValsetUpdateId, "err", err.Error()) | ||
// If this happens, most likely the consumer is malicious; remove it | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mpoke it's not intuitive to me why a failed IBC send would always indicate a malicious consumer. The error being returned could be caused by corrupted IBC state on the provider, where stoping a consumer could make such a scenario worse There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here's my analysis on the reasons for
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be more conservative, we could just log an error and leave the packets in the queue. Operators need to notice that So we run the risk of nobody noticing for a while, which will affect users. Once we notice, we need to either do an emergency upgrade where we migrate the state (difficult) or we submit a ConsumerRemoval proposal, which would take 2 extra weeks before delegators that unbonded can get their funds back. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tough question. According to the untrusted consumer doctrine, we should not let consumers disrupt the provider. So that would lean towards just removing the consumer. However, the unbonding pausing is somewhat of an exception that we allow, since it is limited by the VSC timeout. Since we already allow the consumer to disrupt the provider up to the VSC timeout just by not sending VSC acks, it would make sense to allow disruption up to the VSC timeout here as well. However, if we don't stop the consumer, is it possible to recover from this state without a provider migration?
Is there any way to change this, so that a MsgChannelCloseConfirm would update the ChainToChannel state? So in conclusion:
|
||
err := k.StopConsumerChain(ctx, chainID, true) | ||
if err != nil { | ||
panic(fmt.Errorf("consumer chain failed to stop: %w", err)) | ||
} | ||
return | ||
} | ||
// set the VSC send timestamp for this packet; | ||
// note that the VSC send timestamp are set when the packets | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check for
if clienttypes.ErrClientNotActive.Is(err)
is no longer needed since behavior is the same whether that statement is true or not. The type of error will be printedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior is a bit different. For expired clients we log an info message. For other errors, we log an error as something is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But "error" is in the name,
ErrClientNotActive
. I see the logging behavior is slightly different for that error, but imo a semantic error should be logged as an error. Which also makes the code more simple/readable