-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restart after automatic upgrade not working #8265
Comments
Replicated from my comment in Discord: Based on the setup Riccardo mentions, the cosmovisor processes successfully upgrades the chain. He then allows the new chain to mint a few blocks, and then he stops the chain and restarts // Set the store loader for the upgrade to work properly
app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
})
}) So this code gets executed everytime the app starts. So when Riccardo stops the chain and then restarts at height X, the storeloader will run and reach this line: cosmos-sdk/store/rootmulti/store.go Line 193 in adcecfa
// If it has been added, set the initial version
if upgrades.IsAdded(key.Name()) {
storeParams.initialVersion = uint64(ver) + 1
} So every time the app restarts with the new upgraded binary, we are resetting the initial version of the added store to the
Since the app resets the initial version of the new store everytime it starts up, then when it loads the store, it realizes there is already an earlier version existing from the initial version. The error Riccardo is seeing is all the way from the iavl LoadVersion code: if firstVersion > 0 && firstVersion < int64(tree.ndb.opts.InitialVersion) {
return latestVersion, fmt.Errorf("initial version set to %v, but found earlier version %v",
tree.ndb.opts.InitialVersion, firstVersion)
} The store migration code in This should be addressed before stargate. EDIT: It seems that this is not actually a bug since the |
Setting the StoreLoader inside the upgrade handler@AdityaSripal I tried doing what you suggested, by setting the StoreLoader inside the upgrade handler with the following piece of code: // Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
// Set the store loader for the upgrade to work properly
app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
})
})
}) Unluckily, this results in the following error when applying the upgrade:
I also tried setting it using the // Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
// Set the store loader for the upgrade to work properly
app.SetStoreLoader(upgradetypes.UpgradeStoreLoader(plan.Height, &storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
}))
}) but the same error is returned as well. If I would have to take a guess, I would say that this is caused by the fact that once the node starts, the first thing it does is calling this code: if loadLatest {
if err := app.LoadLatestVersion(); err != nil {
tmos.Exit(err.Error())
}
} This causes Since the upgrade handler can only be called once the app has started properly (hence, it has been sealed), this results in the above error being raised. Trying to sorting the error outIn order to verify is the above idea was correct, I commented the following lines inside func (app *BaseApp) SetStoreLoader(loader StoreLoader) {
//if app.sealed {
// panic("SetStoreLoader() on sealed BaseApp")
//}
app.storeLoader = loader
} This resulted in the update to proceed smoothly. I also tried stopping the node and re-starting it, and it worked without a single problem. I have no idea if commenting out those checks might impact somewhere else in the code, but that seemed to solve this bug. If it's the case that you cannot remove those checks, I think that an alternative might be to implement a new method that allows to set the store loader even if the app has already started. |
Nice research there @AdityaSripal and thank you for trying this out @RiccardoM with a very valid error. I tested this in a 0.38 version back in fall 2019... I haven't worked on this code since then. I really wish @aaronc or the Regen crew that added the feature was being more proactive supporting or debugging it. I will look up the code sample from back in the day that tested the upgrade mechanism. |
The original PR adding this functionality is here: https://github.com/cosmos/cosmos-sdk/pull/4724/files (Aug 2019) There was a unit test ensuring this properly migrated the data (not using baseapp): https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-9ab8b6b1ae348e450f51d4a110e504d9aee67d848a997128a39f914a0acfa7f7R159-R252 The baseapp test did pass the new loader in the BaseApp constructor: https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-a40b307156ef669dfec87e99a660ab3fe06e89adbe6b5d8861386b4f69bc1ef0R182-R269 I now see we used to store a JSON file with the upgrade/rename info. This was essential, as we could not query the IAVL store to see if we are due for a given upgrade until after we have mounted the store. But we needed to know about the substore migrations (Rename, Delete, Add substores) before mounting the store... The solution @aaronc and I came up with was to dump a JSON file with that upgrade info when we hit the upgrade height, and it would be run when restarting. One issue there, is it requires the old binary to know what changes would happen in the new binary. The other approach, which was based on the cosmosd/cosmovisor solution was to add this upgrade JSON file in the same folder as the new binaries, so it would be detected when the upgrade happened. I have no been involved in this code in ~16 months, and not sure how this thinking or design has advanced over 2020 with all the stargate changes. After reviewing the original code, I do think that some external JSON file is needed to control that. (And the JSON file is deleted after execution, so only run once) I would be very happy for comments by some of the people who have touched this in the last year (thank you to Aditya for your contribution). Here is who was working on the module: https://github.com/cosmos/cosmos-sdk/commits/master/x/upgrade |
@RiccardoM what I described above is also documented here: https://github.com/cosmos/cosmos-sdk/blob/master/x/upgrade/spec/01_concepts.md#storeloader I am not sure if these docs are fully up to date, and it is missing a full example, but should give some hints on how to move forward |
So one small note - originally there was no need to use upgrade store loaders when new stores were added - just for renames and deletions so this was considered really an edge case. The case for "added" stores was added in #7415 - I will study that code a bit to see what other consequences there might be of this. As for how to use upgrade store loaders, I would note that the documentation does give a pretty detailed example: https://pkg.go.dev/github.com/cosmos/cosmos-sdk@v0.40.0-rc6/x/upgrade#hdr-Performing_Upgrades. Maybe there could be some easier place to find this documentation but the godoc for the upgrade module is not an illogical place either. Can you try this again following the example in the docs @RiccardoM ? |
So is the gist of the issue here that nobody will look at godocs and everything needs to be in the spec so it ends up on docs.cosmos.network? Seems like there are a few modules with detailed godocs, but mostly not. I guess maybe we need to really make it a clear convention to use specs and not godocs - I certainly wasn't clear on that when we were working on this. |
I was going through the implementation and there is a check already for upgrade height, https://github.com/cosmos/cosmos-sdk/blob/v0.40.0-rc6/x/upgrade/types/storeloader.go#L13, was wondering how these stores are getting updated on restart everytime. |
Great comment @aaronc And no I hadn't looked at cosmos sdk go doc much, as it had not been a very good place to get information about modules. The READMEs, ADRs and digging in the source were all I really had. I guess this changed over 2020. We can start a re-education campaign to "just read the godoc" 😄 |
@RiccardoM I have tested with |
I'm not saying we should switch over to prioritizing godocs, although that's not a bad idea either. I'd just like to have a really clear convention as to which docs need to be prioritized... as it seems like writing thorough godocs for this wasn't that helpful... |
It just needs to be consistent and communicated. The devs will follow where the info is, if they know where to look |
Summary of Bug
After an automatic upgrade, if you stop the chain at height
X
and re-start it the following error is returned:Version
v0.40.0-rc6
Steps to Reproduce
I discovered this while doing a test to make sure v0.40 does not have this bug.
To test it, I've made two branches of our project:
riccardo/no-module
, which does not have one particular module (reports
) installedriccardo/with-module
which does have such module installedThe rest of the code is identical in the two branches, and is all based on
v0.40.0-rc6
.To test the correct behavior of
x/upgrade
, what I did is the following:no-module
.50
.What happens is that if we stop at height
X
and try to restart, the following error is returned:Following you can find the script that I'm using to easily setup everything properly:
Setup script
To use the above script, you need to setup your
~/Desktop/Desmos
folder accordingly:Setting up Desmos folder
And this is the script I'm using to submit the upgrade proposal:
Upgrade proposal script
These are the environmental variables I've setup for Cosmovisor:
Cosmovisor env vars
Notes
I have not tried to test this without Cosmovisor.
For Admin Use
The text was updated successfully, but these errors were encountered: