Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart after automatic upgrade not working #8265

Closed
4 tasks
RiccardoM opened this issue Jan 6, 2021 · 13 comments
Closed
4 tasks

Restart after automatic upgrade not working #8265

RiccardoM opened this issue Jan 6, 2021 · 13 comments
Assignees
Labels
C:x/upgrade T:Docs Changes and features related to documentation.

Comments

@RiccardoM
Copy link
Contributor

Summary of Bug

After an automatic upgrade, if you stop the chain at height X and re-start it the following error is returned:

failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1

Version

v0.40.0-rc6

Steps to Reproduce

I discovered this while doing a test to make sure v0.40 does not have this bug.

To test it, I've made two branches of our project:

The rest of the code is identical in the two branches, and is all based on v0.40.0-rc6.

To test the correct behavior of x/upgrade, what I did is the following:

  1. Start a chain based on the code present inside no-module.
  2. Submit an upgrade proposal for height 50.
  3. Wait until the height is reached and the upgrade is performed by Cosmovisor.
  4. Stop the new chain some blocks after the upgrade
  5. Restart the new chain

What happens is that if we stop at height X and try to restart, the following error is returned:

failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1

Following you can find the script that I'm using to easily setup everything properly:

Setup script
#!/bin/bash
shopt -s expand_aliases

echo "===> Deleting previous data"
rm -r -f $HOME/.desmosd

echo "===> Setting up upgrade manager"
mkdir -p ~/.desmosd/cosmovisor/genesis/bin
cp ~/Desktop/Desmos/Genesis/desmosd ~/.desmosd/cosmovisor/genesis/bin

mkdir -p ~/.desmosd/cosmovisor/upgrades/test/bin
cp ~/Desktop/Desmos/Upgrade/desmosd ~/.desmosd/cosmovisor/upgrades/test/bin

echo "===> Initializing chain"
cosmovisor init testchain --chain-id testchain

echo "===> Editing genesis.json"
sed -i 's/stake/udaric/g' $HOME/.desmosd/config/genesis.json
sed -i 's/"voting_period": "172800s"/"voting_period": "120s"/g' \
  ~/.desmosd/config/genesis.json

echo "===> Creating genesis accounts"
cosmovisor add-genesis-account jack 100000000000000000000udaric
cosmovisor add-genesis-account desmos16f9wz7yg44pjfhxyn22kycs0qjy778ng877usl 10000000udaric

echo "===> Collecting genesis trasanctions"
cosmovisor gentx jack 1000000000udaric --amount 10000000udaric --chain-id testchain
cosmovisor collect-gentxs

echo "===> Starting chain"
cosmovisor start

To use the above script, you need to setup your ~/Desktop/Desmos folder accordingly:

Setting up Desmos folder
mkdir -p ~/Desktop/Desmos/Genesis/
git checkout riccardo/no-module
make build 
cp build/* ~/Desktop/Desmos/Genesis/

mkdir -p ~/Desktop/Desmos/Upgrade
git checkout riccardo/with-module
make build
cp build/* ~/Desktop/Desmos/Upgrade/

And this is the script I'm using to submit the upgrade proposal:

Upgrade proposal script
#!/bin/bash
shopt -s expand_aliases

echo "===> Submitting governance proposal"
cosmovisor tx gov submit-proposal software-upgrade "test" \
  --title "test software upgrade proposal" \
  --description "something about the proposal here" \
  --deposit 100000000udaric \
  --upgrade-height=50 \
  --from jack \
  --chain-id testchain --yes

echo "===> Waiting for transaction to be effective"
sleep 15

echo "===> Voting governance proposal"
cosmovisor tx gov vote 1 yes --from jack --yes --trace --chain-id testchain

These are the environmental variables I've setup for Cosmovisor:

Cosmovisor env vars
# Setup Cosmovisor
DAEMON_NAME=desmosd
DAEMON_HOME=/home/riccardo/.desmosd
DAEMON_RESTART_AFTER_UPGRADE=true

Notes

I have not tried to test this without Cosmovisor.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@AdityaSripal
Copy link
Member

AdityaSripal commented Jan 6, 2021

Replicated from my comment in Discord:

Based on the setup Riccardo mentions, the cosmovisor processes successfully upgrades the chain. He then allows the new chain to mint a few blocks, and then he stops the chain and restarts
of course when he restarts, all the code in app initialization will also get re-executed
In his upgraded binary, he has the following code:

// Set the store loader for the upgrade to work properly
app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
    return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
        Added: []string{reportsTypes.ModuleName},
    })
})

So this code gets executed everytime the app starts. So when Riccardo stops the chain and then restarts at height X, the storeloader will run and reach this line:

if upgrades.IsAdded(key.Name()) {

// If it has been added, set the initial version
if upgrades.IsAdded(key.Name()) {
    storeParams.initialVersion = uint64(ver) + 1
}

So every time the app restarts with the new upgraded binary, we are resetting the initial version of the added store to the latest version+1
This makes sense why Riccardo was getting the error:

failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1

Since the app resets the initial version of the new store everytime it starts up, then when it loads the store, it realizes there is already an earlier version existing from the initial version.

The error Riccardo is seeing is all the way from the iavl LoadVersion code:
https://github.com/cosmos/iavl/blob/master/mutable_tree.go#L356

if firstVersion > 0 && firstVersion < int64(tree.ndb.opts.InitialVersion) {
    return latestVersion, fmt.Errorf("initial version set to %v, but found earlier version %v",
        tree.ndb.opts.InitialVersion, firstVersion)
}

The store migration code in rootmultistore definitely seems wrong to me. The upgrades are applied every single time the app starts, rather than just once. This code needs to be changed such that the upgrade code is not called on app initialization OR the code must check if the upgrade has already been applied before applying the upgrades. This will ensure the upgrade store migration changes only happen once during the upgrade, and not after subsequent restarts.

This should be addressed before stargate.

EDIT: It seems that this is not actually a bug since the LoadVersionAndUpgrade functions are intended to only be called inside the upgrade handler and thus are intended to only be executed once. This should be very clearly documented in the godocs, as it currently is not. Users who do not do this will face the issue that @RiccardoM did above. I think some documentation before tagging final release would be helpful but not strictly blocking a release.
I also think that someone on core sdk team familiar with cosmovisor should explicitly test the automatic upgrade feature with a live test chain one last time before shipping since this has been a common source of confusion, we should ensure that the process works and there is a clear workflow documented.
cc: @ethanfrey @zmanian

@amaury1093 amaury1093 added this to the v0.40 [Stargate] milestone Jan 6, 2021
@AdityaSripal AdityaSripal added T:Docs Changes and features related to documentation. and removed T:Bug labels Jan 6, 2021
@RiccardoM
Copy link
Contributor Author

Setting the StoreLoader inside the upgrade handler

@AdityaSripal I tried doing what you suggested, by setting the StoreLoader inside the upgrade handler with the following piece of code:

// Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
  // Set the store loader for the upgrade to work properly
  app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
    return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
      Added: []string{reportsTypes.ModuleName},
    })
  })
})

Unluckily, this results in the following error when applying the upgrade:

applying upgrade "test" at height: 50
panic: SetStoreLoader() on sealed BaseApp

I also tried setting it using the UpgradeStoreLoader method:

// Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
  // Set the store loader for the upgrade to work properly
  app.SetStoreLoader(upgradetypes.UpgradeStoreLoader(plan.Height, &storetypes.StoreUpgrades{
    Added: []string{reportsTypes.ModuleName},
  }))
})

but the same error is returned as well.

If I would have to take a guess, I would say that this is caused by the fact that once the node starts, the first thing it does is calling this code:

if loadLatest {
  if err := app.LoadLatestVersion(); err != nil {
    tmos.Exit(err.Error())
  }
}

This causes LoadLatestVersion to call app.init(), which calls app.Seal() and seals the application, making it impossible to change the StoreLoader once it has started.

Since the upgrade handler can only be called once the app has started properly (hence, it has been sealed), this results in the above error being raised.

Trying to sorting the error out

In order to verify is the above idea was correct, I commented the following lines inside baseapp/options.go:

func (app *BaseApp) SetStoreLoader(loader StoreLoader) {
  //if app.sealed {
  //	panic("SetStoreLoader() on sealed BaseApp")
  //}

  app.storeLoader = loader
}

This resulted in the update to proceed smoothly. I also tried stopping the node and re-starting it, and it worked without a single problem.

I have no idea if commenting out those checks might impact somewhere else in the code, but that seemed to solve this bug.

If it's the case that you cannot remove those checks, I think that an alternative might be to implement a new method that allows to set the store loader even if the app has already started.

CC @ethanfrey @zmanian

@ethanfrey
Copy link
Contributor

Nice research there @AdityaSripal and thank you for trying this out @RiccardoM with a very valid error.

I tested this in a 0.38 version back in fall 2019... I haven't worked on this code since then. I really wish @aaronc or the Regen crew that added the feature was being more proactive supporting or debugging it. I will look up the code sample from back in the day that tested the upgrade mechanism.

@anilcse anilcse self-assigned this Jan 7, 2021
@ethanfrey
Copy link
Contributor

The original PR adding this functionality is here: https://github.com/cosmos/cosmos-sdk/pull/4724/files (Aug 2019)

There was a unit test ensuring this properly migrated the data (not using baseapp): https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-9ab8b6b1ae348e450f51d4a110e504d9aee67d848a997128a39f914a0acfa7f7R159-R252

The baseapp test did pass the new loader in the BaseApp constructor: https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-a40b307156ef669dfec87e99a660ab3fe06e89adbe6b5d8861386b4f69bc1ef0R182-R269

I now see we used to store a JSON file with the upgrade/rename info. This was essential, as we could not query the IAVL store to see if we are due for a given upgrade until after we have mounted the store. But we needed to know about the substore migrations (Rename, Delete, Add substores) before mounting the store...

The solution @aaronc and I came up with was to dump a JSON file with that upgrade info when we hit the upgrade height, and it would be run when restarting. One issue there, is it requires the old binary to know what changes would happen in the new binary. The other approach, which was based on the cosmosd/cosmovisor solution was to add this upgrade JSON file in the same folder as the new binaries, so it would be detected when the upgrade happened.

I have no been involved in this code in ~16 months, and not sure how this thinking or design has advanced over 2020 with all the stargate changes. After reviewing the original code, I do think that some external JSON file is needed to control that. (And the JSON file is deleted after execution, so only run once)

I would be very happy for comments by some of the people who have touched this in the last year (thank you to Aditya for your contribution). Here is who was working on the module: https://github.com/cosmos/cosmos-sdk/commits/master/x/upgrade

@AdityaSripal AdityaSripal added T:Bug and removed T:Docs Changes and features related to documentation. labels Jan 7, 2021
@ethanfrey
Copy link
Contributor

@RiccardoM what I described above is also documented here: https://github.com/cosmos/cosmos-sdk/blob/master/x/upgrade/spec/01_concepts.md#storeloader

I am not sure if these docs are fully up to date, and it is missing a full example, but should give some hints on how to move forward

@aaronc
Copy link
Member

aaronc commented Jan 7, 2021

So one small note - originally there was no need to use upgrade store loaders when new stores were added - just for renames and deletions so this was considered really an edge case. The case for "added" stores was added in #7415 - I will study that code a bit to see what other consequences there might be of this.

As for how to use upgrade store loaders, I would note that the documentation does give a pretty detailed example: https://pkg.go.dev/github.com/cosmos/cosmos-sdk@v0.40.0-rc6/x/upgrade#hdr-Performing_Upgrades. Maybe there could be some easier place to find this documentation but the godoc for the upgrade module is not an illogical place either.

Can you try this again following the example in the docs @RiccardoM ?

@aaronc
Copy link
Member

aaronc commented Jan 7, 2021

So is the gist of the issue here that nobody will look at godocs and everything needs to be in the spec so it ends up on docs.cosmos.network? Seems like there are a few modules with detailed godocs, but mostly not. I guess maybe we need to really make it a clear convention to use specs and not godocs - I certainly wasn't clear on that when we were working on this.

@anilcse
Copy link
Collaborator

anilcse commented Jan 7, 2021

I was going through the implementation and there is a check already for upgrade height, https://github.com/cosmos/cosmos-sdk/blob/v0.40.0-rc6/x/upgrade/types/storeloader.go#L13, was wondering how these stores are getting updated on restart everytime.

@ethanfrey
Copy link
Contributor

Great comment @aaronc

And no I hadn't looked at cosmos sdk go doc much, as it had not been a very good place to get information about modules. The READMEs, ADRs and digging in the source were all I really had.

I guess this changed over 2020. We can start a re-education campaign to "just read the godoc" 😄

@anilcse
Copy link
Collaborator

anilcse commented Jan 7, 2021

@RiccardoM I have tested with correct store-migrations logic and everything is working fine. I made a PR onto your branch: desmos-labs/desmos#337, please test and feel free to close this issue.

@aaronc
Copy link
Member

aaronc commented Jan 7, 2021

Great comment @aaronc

And no I hadn't looked at cosmos sdk go doc much, as it had not been a very good place to get information about modules. The READMEs, ADRs and digging in the source were all I really had.

I guess this changed over 2020. We can start a re-education campaign to "just read the godoc" 😄

I'm not saying we should switch over to prioritizing godocs, although that's not a bad idea either. I'd just like to have a really clear convention as to which docs need to be prioritized... as it seems like writing thorough godocs for this wasn't that helpful...

@aaronc aaronc added T:Docs Changes and features related to documentation. and removed backport/0.40.x (Stargate) T:Bug labels Jan 7, 2021
@ethanfrey
Copy link
Contributor

It just needs to be consistent and communicated. The devs will follow where the info is, if they know where to look

@RiccardoM
Copy link
Contributor Author

@anilcse Thank you for the PR, which indeed fixed the problem. Looks like #8276 is also closed by using that solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:x/upgrade T:Docs Changes and features related to documentation.
Projects
None yet
Development

No branches or pull requests

7 participants