Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further hotfix on panic when unbonding unbonded validator #75

Merged
merged 2 commits into from
May 27, 2022

Conversation

nnkken
Copy link
Contributor

@nnkken nnkken commented May 27, 2022

Also guard the case of validator not found (since it could be already removed in previous unbonding).
Also add comments.

Related SDK commit: likecoin/cosmos-sdk@6cd3d08

The cause of the problem:

  1. a validator (cosmos1abcd...) was unbonding before Bech32 address prefix migration
  2. after prefix migration, the validator was further jailed
  3. the SDK code tries to remove the old unbonding entry and insert a new one
  4. however, since the validator address for the unbonding entry was stored in string format, and the code for removing the unbonding entry is searching for the new Bech32 prefix address, the old entry was not removed
  5. so there are 2 unbonding entries for the same validator
  6. the first unbonding entry triggers, turning the validator status to unbonded
  7. the second unbonding entry triggers, and found that the validator was already unbonded, so the SDK code panic

Currently we are guarding the unbonding code so it won't panic if validator was already unbonded or removed, but leave an error log instead.

@nnkken nnkken requested review from rickmak, williamchong and a team May 27, 2022 07:09
@elise-ng
Copy link
Contributor

@nnkken L440 should also continue? 🤔

@nnkken
Copy link
Contributor Author

nnkken commented May 27, 2022

@nnkken L440 should also continue? 🤔

I was keeping that because structurally it is possible to run k.RemoveValidator(ctx, val.GetOperator()).
But logically speaking it could simply continue.
But since we are fixing a bug on the logic itself... I think we should better keep it?

@elise-ng
Copy link
Contributor

@nnkken L440 should also continue? 🤔

I was keeping that because structurally it is possible to run k.RemoveValidator(ctx, val.GetOperator()). But logically speaking it could simply continue. But since we are fixing a bug on the logic itself... I think we should better keep it?

if we continue at L440, we can revert L446-448 back to the vanilla sdk code, seems the code will look cleaner this way; so:

if !found {
	// due to address Bech32 prefix migration, it is possible that entry with address with old prefix was not
	// deleted (e.g. jailed during unbond), so we don't panic here and instead just skip this validator
	ctx.Logger().Error("validator in the unbonding queue was not found", "validator", valAddr)
	continue
}

if !val.IsUnbonding() {
	if val.IsUnbonded() {
		// same issue as the comments above
		ctx.Logger().Error("unbonding validator but the status was unbonded", "validator", valAddr)
		continue
	}
	panic("unexpected validator in unbonding queue; status was not unbonding or unbonded")
}

val = k.UnbondingToUnbonded(ctx, val)
if val.GetDelegatorShares().IsZero() {
	k.RemoveValidator(ctx, val.GetOperator())
}

@rickmak
Copy link
Collaborator

rickmak commented May 27, 2022

According to our discussion and the clause you wrote here. 5. so there are 2 unbonding entries for the same validator. The first one should already did the job. So logically if we discover the validator no longer in the transiting stage but transited state, we should skip the following action via continue.

Same logic for not found (already delete) and unbound (already transit of state).

@rickmak
Copy link
Collaborator

rickmak commented May 27, 2022

Oh, github down resulting my previously comment is later than @chihimng one. 🗡️

@nnkken
Copy link
Contributor Author

nnkken commented May 27, 2022

PR updated. The SDK commit now is likecoin/cosmos-sdk@6cd3d08.

@nnkken
Copy link
Contributor Author

nnkken commented May 27, 2022

The new SDK commit (v0.44.8-dual-prefix-hotfix-3) combined the previous commits into one commit for better review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants