panic: cannot delete latest saved version #14625

Closed
k-yang opened this issue Jan 15, 2023 · 6 comments

k-yang (Contributor) commented Jan 15, 2023

Summary of Bug

We performed a chain upgrade that added a new module. We registered the UpgradeHandler but forgot to add the module to the StoreUpgrades (see NibiruChain/nibiru@b986c49).
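For context, what we forgot is the usual store-loader wiring in app.go. A rough sketch is below; the app type, keeper field, and module name are placeholders, not our actual code:

import (
    storetypes "github.com/cosmos/cosmos-sdk/store/types"
    upgradetypes "github.com/cosmos/cosmos-sdk/x/upgrade/types"
)

// Sketch only: register the added module's store with the upgrade store
// loader so the root multistore creates it at the upgrade height instead of
// trying to load it from disk. "App", "UpgradeKeeper", and "newmodule" are
// placeholders.
func (app *App) setUpgradeStoreLoader() {
    upgradeInfo, err := app.UpgradeKeeper.ReadUpgradeInfoFromDisk()
    if err != nil {
        panic(err)
    }

    if upgradeInfo.Name == "v0.17.0" && !app.UpgradeKeeper.IsSkipHeight(upgradeInfo.Height) {
        storeUpgrades := storetypes.StoreUpgrades{
            Added: []string{"newmodule"}, // placeholder for the store key we actually added
        }
        app.SetStoreLoader(upgradetypes.UpgradeStoreLoader(upgradeInfo.Height, &storeUpgrades))
    }
}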

When we started the new binary, it tried to prune previous versions of the store and we ended up with the error panic: cannot delete latest saved version (718). We knew something was wrong because our upgrade height is 2460000; the error means the new store was loaded with an initialVersion less than the upgradeHeight.

Here is the full stack trace.
4:39PM INF executed block height=2460000 module=consensus num_invalid_txs=0 num_valid_txs=0
panic: cannot delete latest saved version (718)

goroutine 1 [running]:
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).PruneStores(0xc00116ed00, 0x1, {0x0?, 0x0?, 0xc010d7c700?})
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/store/rootmulti/store.go:460 +0x285
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).Commit(0xc00116ed00)
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/store/rootmulti/store.go:429 +0x1ab
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit(0xc0002c1880)
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/baseapp/abci.go:314 +0x1e9
github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync(0xc00107db60)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/abci/client/local_client.go:264 +0xb6
github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync(0xc0028d46b0?)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/proxy/app_conn.go:93 +0x22
github.com/tendermint/tendermint/state.(*BlockExecutor).Commit(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258974, {{0xc006ca2000, ...}, ...}, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/state/execution.go:228 +0x269
github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258974, {{0xc006ca2000, ...}, ...}, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/state/execution.go:180 +0x6ee
github.com/tendermint/tendermint/consensus.(*Handshaker).replayBlock(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, {{0xc005692da0, ...}, ...}, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:503 +0x23c
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, {{0xc005692da0, ...}, ...}, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:416 +0x7ae
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc0028d5d90, {0x2f95740, 0xc00055dd40})
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:268 +0x3d4
github.com/tendermint/tendermint/node.doHandshake({_, _}, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/node/node.go:330 +0x1b8
github.com/tendermint/tendermint/node.NewNode(0xc0010e6640, {0x2f7e780, 0xc00125bcc0}, 0xc001289000, {0x2f63120, 0xc010676648}, 0x0?, 0x0?, 0xc001289250, {0x2f83970, ...}, ...)
        /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/node/node.go:778 +0x597
github.com/cosmos/cosmos-sdk/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x2fa1bb8, 0xc00117a3f0}, {0xc000303b30, 0xf}, {0x2f89120, 0xc0010bf810}, ...}, ...)
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/start.go:280 +0x7db
github.com/cosmos/cosmos-sdk/server.StartCmd.func2(0xc001115b00?, {0x41aaa68?, 0x0?, 0x0?})
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/start.go:128 +0x169
github.com/spf13/cobra.(*Command).execute(0xc001115b00, {0x41aaa68, 0x0, 0x0})
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916 +0x862
github.com/spf13/cobra.(*Command).ExecuteC(0xc000f76600)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:961
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x0?, {0xc0010c6258, 0x12})
        /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/cmd/execute.go:36 +0x1eb
main.main()
        /home/runner/work/nibiru/nibiru/cmd/nibid/main.go:16 +0x2c

We read through cosmos/ibc-go#1088 and #8265 and added the module to StoreUpgrades (see NibiruChain/nibiru@0c1bd2e). We ran this binary and obtained the following error message:

failed to load latest version: failed to load store: initial version set to 2460000, but found earlier version 1

which indicates that the new store we're trying to add was already persisted to disk by the earlier binary run.

We can continue our chain by setting pruning = "nothing" (in app.toml), but this is undesirable because it effectively turns our node into a full archive node.

We are at an impasse: the earlier binary run persisted the store to disk with initialVersion=0, which prevents us from pruning from height=2460000, and we are no longer able to add the module to StoreUpgrades.

In my opinion, the Cosmos SDK shouldn't persist stores to disk unless the StoreUpgrades array is explicitly given.

Version

We use v0.45.10 of the Cosmos SDK.

Steps to Reproduce

# old binary
nibid start

nibid tx gov submit-proposal software-upgrade "v0.17.0" \
--title "v0.17.0" \
--description "Upgrade to v0.17.0" \
--upgrade-height 2460000 \
--upgrade-info '' \
--from val \
--output json \
--yes \
--gas-prices 0.025unibi \
--deposit 10000000unibi

nibid tx gov vote 1 yes \
--from val \
--output json \
--yes \
--gas-prices 0.025unibi

# wait for chain halt and download new binary
nibid start

tac0turtle (Member) commented:

This is usually caused by an incorrect upgrade that adds a store whose version does not match the others. We should really fix that.

yihuang (Collaborator) commented Jan 15, 2023

I feel you, we probably shouldn't separate the stores into multiple trees in the first place.

k-yang (Contributor, Author) commented Jan 15, 2023

> This is usually caused by an incorrect upgrade that adds a store whose version does not match the others. We should really fix that.

I think if we follow the invariant that store versions are always equal to the block height, then adding a store could set initialVersion = blockHeight automatically and we wouldn't need the following check in loadVersion:

if upgrades.IsAdded(key.Name()) {
  storeParams.initialVersion = uint64(ver) + 1
}

That could simplify the StoreUpgrades for added modules.
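To make that concrete, the rule would be something like this (a sketch with made-up names, not SDK code):

// Sketch only (hypothetical helper, not SDK code): pick the initial version
// for a store discovered at load time.
func initialVersionForStore(latestRootVersion int64, storeHasPriorCommits bool) uint64 {
    if !storeHasPriorCommits {
        // A brand-new store starts at the next block height so its versions
        // stay aligned with every other store in the root multistore.
        return uint64(latestRootVersion) + 1
    }
    // An existing store keeps its own history; 0 means "no override".
    return 0
}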

julienrbrt (Member) commented Jan 15, 2023

related: #13933 / #14410

alexanderbez (Contributor) commented Jan 16, 2023

> I feel you, we probably shouldn't separate the stores into multiple trees in the first place.

Yes! I've been advocating for a single logical DB/tree for ages now. It also gives us atomicity. However, this is technically orthogonal to the issue described here. We will be discussing this in the storage WG though!


@tac0turtle we should combine #13933 and this issue into a single issue (I have no preference on which we close or even just create a new one).

As pointed out by @k-yang and others, I believe the simple solution here is to have the root MS ensure that whenever a new module store is discovered, it is loaded/saved with the current block height and not 0.

I think (?) #14410 is going to do this but I'm not sure.

cc @catShaark

catShaark (Contributor) commented Jan 18, 2023

@alexanderbez, so my PR will add validation in loadVersion so that the version of every module store matches the version of the root store. This validation will prevent chains from getting into this uneven store versions problem, but it does not actually solve it once it has happened.
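Roughly, the check is something like this (a sketch with assumed names, not the actual code from the PR):

import (
    "fmt"

    storetypes "github.com/cosmos/cosmos-sdk/store/types"
)

// Sketch only (assumed names, not the PR itself): verify that every loaded
// module store sits at the same version as the root multistore, so a
// mis-registered store fails at startup instead of panicking later in
// PruneStores.
func validateStoreVersions(rootVersion int64, stores map[storetypes.StoreKey]storetypes.CommitKVStore) error {
    for key, store := range stores {
        if v := store.LastCommitID().Version; v != 0 && v != rootVersion {
            return fmt.Errorf("store %q is at version %d, but the root multistore is at version %d",
                key.Name(), v, rootVersion)
        }
    }
    return nil
}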
