feat!: add a signalling mechanism for coordinated upgrades #2832
Co-authored-by: Evan Forbes <42654277+evan-forbes@users.noreply.github.com>
…lestia-app into cal/minor-upgrade-testing
Co-authored-by: Rootul P <rootulp@gmail.com>
Walkthrough
The changes involve a significant update to the upgrade module of a blockchain application, introducing new package imports, altering existing functions, and adding new ones. The upgrade logic has been enhanced to handle version upgrades more effectively, with new parameters, tallying logic, and end-block processing. The changes also include the introduction of protocol buffer files for defining gRPC services and messages related to upgrade parameters and version signaling.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Hmm, proto-gen is still failing, so I can push a commit that tries to resolve it.
Overall, really great work! I left a few questions to understand more. I think my only blocking feedback pertains to the threshold power calculation: it could be problematic if it rounds down.
x/upgrade/keeper.go
// upgrade to a new version. It converts the signal quorum parameter which
// is a number between 0 and math.MaxUint32 representing a fraction and
[question] I thought signalQuorum was a decimal potentially in the range of 0 to 1. How does a number between 0 and math.MaxUint32 get translated into a fraction?
Sorry, it was originally a uint32, but I changed it to use the SDK's decimal struct and forgot to update the documentation. Thanks for pointing it out.
store := sdkCtx.KVStore(k.storeKey)
iterator := store.Iterator(nil, nil)
defer iterator.Close()
for ; iterator.Valid(); iterator.Next() {
	valAddress := sdk.ValAddress(iterator.Key())
	power := k.stakingKeeper.GetLastValidatorPower(sdkCtx, valAddress)
	version := VersionFromBytes(iterator.Value())
[extremely optional] Wow, does the Cosmos SDK not provide any convenience methods for iterating over this store, which seems like a map from validator address => validator version?
If there aren't any convenience methods, would it make sense to extract methods that enable using basic map operations on this so that the code can look something like:
for valAddress, version := range validatorToVersionMap() {
power := k.stakingKeeper.GetLastValidatorPower(sdkCtx, valAddress)
// ...
}
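A minimal sketch of the reviewer's suggestion, assuming the store entries have already been read from the iterator and decoded. `kvEntry`, `validatorToVersionMap`, `tallyVersion`, and the `powerOf` callback (standing in for `stakingKeeper.GetLastValidatorPower`) are hypothetical names; no SDK APIs are used.

```go
package main

import "fmt"

// kvEntry stands in for one item yielded by a KVStore iterator. Hypothetical
// shape: key = validator address bytes, value = signalled app version (already
// decoded, e.g. by something like VersionFromBytes).
type kvEntry struct {
	key     []byte
	version uint64
}

// validatorToVersionMap collects iterator entries into a map so callers can
// use a plain range loop, as suggested in the review comment.
func validatorToVersionMap(entries []kvEntry) map[string]uint64 {
	m := make(map[string]uint64, len(entries))
	for _, e := range entries {
		m[string(e.key)] = e.version
	}
	return m
}

// tallyVersion sums the power of every validator that signalled version v.
// powerOf stands in for stakingKeeper.GetLastValidatorPower.
func tallyVersion(entries []kvEntry, v uint64, powerOf func(addr string) int64) int64 {
	var total int64
	for addr, version := range validatorToVersionMap(entries) {
		if version == v {
			total += powerOf(addr)
		}
	}
	return total
}

func main() {
	entries := []kvEntry{{[]byte("val1"), 2}, {[]byte("val2"), 2}, {[]byte("val3"), 1}}
	powers := map[string]int64{"val1": 10, "val2": 20, "val3": 5}
	powerOf := func(addr string) int64 { return powers[addr] }
	fmt.Println(tallyVersion(entries, 2, powerOf)) // 30
}
```

One design caveat: Go randomizes map iteration order. Summing powers is order-independent, so the tally stays deterministic, but any loop that writes state or emits events should iterate the store directly, since a KVStore iterator yields keys in ascending order.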
x/upgrade/module.go
} | ||
|
||
const ( | ||
consensusVersion uint64 = 2 |
[question] Why 2 instead of 1?
This was taken directly from the existing upgrade module (since it shares the same name). Now that I think of it, it should probably be 3 instead of 2.
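The consensus version matters because, in the Cosmos SDK's in-place store migration flow, the module manager records a version per module name and runs the registered migrations between the stored version and the one the module's `ConsensusVersion()` reports. The simplified model below (a hypothetical `migrationSteps` function, not SDK code) shows why reusing the name of a module that already reported 2 presumably calls for bumping to 3.

```go
package main

import "fmt"

// migrationSteps returns the "from" versions whose migrations would run when a
// module's stored consensus version is stored and the running binary reports
// current. Simplified model for illustration, not the Cosmos SDK module manager.
func migrationSteps(stored, current uint64) []uint64 {
	var steps []uint64
	for v := stored; v < current; v++ {
		steps = append(steps, v) // migrate v -> v+1
	}
	return steps
}

func main() {
	// If a chain already recorded version 2 under the module name "upgrade",
	// reporting 2 again runs nothing, while reporting 3 triggers one step.
	fmt.Println(migrationSteps(2, 2), migrationSteps(2, 3)) // [] [2]
}
```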
// RegisterLegacyAminoCodec registers the upgrade types on the LegacyAmino codec.
func RegisterLegacyAminoCodec(cdc *codec.LegacyAmino) {
[question] does this module need to support legacy Amino codec?
I don't think so, but I'm not entirely sure what Legacy Amino is for. IIRC, it's needed for Ledger signing, right? I'm not sure whether our other modules support the Legacy Amino encoding.
I think Legacy Amino is needed for Ledger, but in my experience the Cosmos SDK docs are poor at explaining when modules should implement this and whether it's actually deprecated or slated for removal.
x/upgrade/tally_test.go
}) | ||
require.NoError(t, err) | ||
require.EqualValues(t, 30, res.VotingPower) | ||
require.EqualValues(t, 100, res.Threshold) |
[nit] This usage of Threshold makes me realize that the variable name ThresholdPower may make the code clearer.
Yup, I can modify that.
func (AppModuleBasic) RegisterGRPCGatewayRoutes(clientCtx client.Context, mux *runtime.ServeMux) {
	if err := types.RegisterQueryHandlerClient(context.Background(), mux, types.NewQueryClient(clientCtx)); err != nil {
		panic(err)
	}
Consider handling the error returned by RegisterQueryHandlerClient without causing a panic, to improve the robustness of the application.
// LegacyQuerierHandler registers a query handler to respond to the module-specific queries
func (am AppModule) LegacyQuerierHandler(_ *codec.LegacyAmino) sdk.Querier {
	return nil
}
The comment for LegacyQuerierHandler should be updated to indicate that it is a no-op.
- // LegacyQuerierHandler registers a query handler to respond to the module-specific queries
+ // LegacyQuerierHandler is a no-op.
Committable suggestion
// LegacyQuerierHandler is a no-op.
func (am AppModule) LegacyQuerierHandler(_ *codec.LegacyAmino) sdk.Querier {
	return nil
}
Great work! Excited for this.
Documenting a future improvement from a call: clear out the list of validator votes for a version after a successful upgrade. That way, shouldUpgrade doesn't return true for subsequent blocks.
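The follow-up improvement can be sketched with an in-memory map standing in for the upgrade module's store. `signalStore`, `shouldUpgrade`, and `resetTally` are hypothetical names; the real module would delete the KVStore entries rather than map keys.

```go
package main

import "fmt"

// signalStore is a hypothetical in-memory stand-in for the upgrade module's
// store: validator address -> signalled app version.
type signalStore map[string]uint64

// shouldUpgrade reports whether validators holding at least threshold voting
// power have signalled nextVersion.
func shouldUpgrade(s signalStore, power map[string]int64, nextVersion uint64, threshold int64) bool {
	var total int64
	for addr, v := range s {
		if v == nextVersion {
			total += power[addr]
		}
	}
	return total >= threshold
}

// resetTally deletes every signal once an upgrade has been applied, so
// shouldUpgrade stops returning true on subsequent blocks. Deleting map
// entries while ranging over the map is safe in Go.
func resetTally(s signalStore) {
	for addr := range s {
		delete(s, addr)
	}
}

func main() {
	s := signalStore{"val1": 2, "val2": 2}
	power := map[string]int64{"val1": 50, "val2": 40}
	fmt.Println(shouldUpgrade(s, power, 2, 84)) // true
	resetTally(s)                               // upgrade applied: clear the signals
	fmt.Println(shouldUpgrade(s, power, 2, 84)) // false
}
```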
func (k Keeper) GetVotingPowerThreshold(ctx sdk.Context) sdkmath.Int {
quorum := k.SignalQuorum(ctx)
// contract: totalVotingPower should not exceed MaxUit64
[typo]
- // contract: totalVotingPower should not exceed MaxUit64
+ // contract: totalVotingPower should not exceed MaxUint64
totalVotingPower := k.stakingKeeper.GetLastTotalPower(ctx)
return quorum.MulInt(totalVotingPower).RoundInt()
thresholdFraction := SignalThreshold(ctx.BlockHeader().Version.App)
return totalVotingPower.MulRaw(thresholdFraction.Numerator).QuoRaw(thresholdFraction.Denominator)
We may want to investigate, or enforce, that this never overflows.
ref: #2878
type Fraction struct {
	Numerator   int64
	Denominator int64
}
Can be done later, but could we have a quick doc here for this type since it's exported?
Yeah, okay, maybe I'll just make it private. There's no need for it to be public, as it's not queryable.
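A sketch of what the documented, unexported type could look like. The doc comment wording, the `valid` method, and the invariants it checks are assumptions for illustration; the 5/6 value in `main` is only an example, not the module's actual threshold.

```go
package main

import "fmt"

// fraction is the share of total voting power, numerator/denominator, that must
// signal a version before the upgrade module reports that an upgrade can happen.
// It is unexported because it is not exposed through any query. Invariants
// (assumed for this sketch): denominator > 0 and 0 <= numerator <= denominator.
type fraction struct {
	numerator   int64
	denominator int64
}

// valid reports whether f satisfies the invariants above, i.e. represents a
// threshold in the range [0, 1].
func (f fraction) valid() bool {
	return f.denominator > 0 && f.numerator >= 0 && f.numerator <= f.denominator
}

func main() {
	fmt.Println(fraction{numerator: 5, denominator: 6}.valid()) // true
}
```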
// RegisterLegacyAminoCodec registers the upgrade types on the LegacyAmino codec.
func RegisterLegacyAminoCodec(cdc *codec.LegacyAmino) {
	cdc.RegisterConcrete(upgradetypes.Plan{}, "cosmos-sdk/Plan", nil)
We might just want to leave a mental note to try signing the tx on a Ledger to make sure that we don't need to register the new Msg type with the legacy codec.
Ref: #2876
Overview
As per ADR-018, this PR extends the existing minimal upgrade module with a signalling mechanism.
Validators are expected to submit an on-chain message to signal that they wish to change the version of the network. When a quorum has signalled the next version, the upgrade module signals to the app that it is ready to switch to the next state machine. If the app version is not supported, the node will panic. Note that this feature does not currently support downgrading: the only permissible app version change is the very next increment. To cancel the upgrade, the same validators need only submit an on-chain message with the current version they are on.
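The rules in the paragraph above could be enforced when a signal message is handled. Here is a minimal sketch, assuming a hypothetical `validateSignal` helper that receives the current app version and the version a validator signals: the next increment is accepted, the current version is accepted (which withdraws support for an upgrade), and anything else, including a downgrade, is rejected.

```go
package main

import "fmt"

// validateSignal checks a signalled version against the current app version.
// Hypothetical helper: signalling current+1 supports the upgrade, signalling
// current cancels that support, and every other version (skipping ahead or
// downgrading) is rejected.
func validateSignal(current, signalled uint64) error {
	switch signalled {
	case current, current + 1:
		return nil
	default:
		return fmt.Errorf("signalled version %d not allowed; current version is %d", signalled, current)
	}
}

func main() {
	const current = 1
	for _, v := range []uint64{2, 1, 3, 0} {
		fmt.Println(v, validateSignal(current, v))
	}
}
```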
There are some remaining design decisions that I would appreciate feedback on:
a. Use diffs. Whenever a validator's voting power changes, a hook fires with the old and the new voting power; the upgrade module listens to this hook and updates the total accordingly. This would require modifying the staking module and may cause problems down the track with maintaining the staking fork. It's also more prone to errors.
b. Use epochs. Tally all the signal votes once every 1000 blocks or once a day. There is no need to tally continuously, and reducing the tally frequency saves computation. Most major upgrades are relatively time-agnostic and can be done at any point. Emergency major upgrades operate differently and would involve the network halting temporarily.
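Both alternatives can be sketched briefly. All names below (`runningTally`, `onPowerChange`, `shouldTally`, `epochLength`) are hypothetical; `onPowerChange` stands in for a hook the forked staking module would have to call. The bookkeeping in option (a) stays correct only if every power change is routed through the hook, which illustrates why it is more error-prone; option (b) is a simple height check, at the cost of delaying an upgrade by up to `epochLength - 1` blocks after quorum is reached.

```go
package main

import "fmt"

// Option (a), "use diffs": keep a running per-version total that a staking
// hook updates whenever a validator's power changes, instead of re-reading
// every validator's power at tally time.
type runningTally struct {
	signals map[string]uint64 // validator -> signalled version
	totals  map[uint64]int64  // version -> summed voting power
}

func newRunningTally() *runningTally {
	return &runningTally{signals: map[string]uint64{}, totals: map[uint64]int64{}}
}

// signal records val's version. power must be val's current power; the
// bookkeeping is only correct if every power change since val's last signal
// went through onPowerChange.
func (t *runningTally) signal(val string, version uint64, power int64) {
	if old, ok := t.signals[val]; ok {
		t.totals[old] -= power
	}
	t.signals[val] = version
	t.totals[version] += power
}

// onPowerChange is the hook the (forked) staking module would call.
func (t *runningTally) onPowerChange(val string, oldPower, newPower int64) {
	if v, ok := t.signals[val]; ok {
		t.totals[v] += newPower - oldPower
	}
}

// Option (b), "use epochs": only tally at every epochLength-th block.
const epochLength int64 = 1000 // matches the "every 1000 blocks" example

func shouldTally(height int64) bool {
	return height > 0 && height%epochLength == 0
}

func main() {
	t := newRunningTally()
	t.signal("val1", 2, 10)
	t.signal("val2", 2, 20)
	t.onPowerChange("val1", 10, 15)
	fmt.Println(t.totals[2])                         // 35
	fmt.Println(shouldTally(999), shouldTally(1000)) // false true
}
```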
Some remaining work that will be done in follow-up PRs:
Checklist