Node crashing regularly caused DB to stop updating #423

Open
gamarin2 opened this issue Feb 7, 2022 · 2 comments

@gamarin2
Contributor

gamarin2 commented Feb 7, 2022

Our JUNO node kept crashing because of an SDK bug: cosmos/cosmos-sdk#11117

In spite of that, the node displayed as SYNCED in our cluster (and apparently was). Yet our DB did not update the staking balance of this account for two months (it missed several writes over several weeks).

Let's try to understand what happened and figure out preventive measures, if any.

@sgerogia
Contributor

Maybe related tangentially, maybe part of the same issue.

Currently happening in Osmosis (Sat 12 Feb 20:43)

  • The logs contain the error "failed to write trace operation: write /trace-store/kvstore.fifo: broken pipe". It came from only one node, at 17:26.
    image.png
  • Around the same time (~10 minutes later), we saw the last signs of life from an Osmosis TL: a bunch of parsing error messages. They came from osmosis-1, the same pod that logged the broken pipe error.
    image.png
  • Checking the K8s cluster, all Osmosis TLs are unresponsive at the moment.
  • I speculate that osmosis-1 was the last one with a surviving TL.
  • Looking online for causes of broken pipe messages, there are 2 suggestions:
    • the downstream process not being able to keep up, or
    • the machine reaching file system limits

@sgerogia
Contributor

Full error

5:26PM ERR CONSENSUS FAILURE!!! err="failed to write trace operation: write /trace-store/kvstore.fifo: broken pipe" module=consensus stack="goroutine 51915 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:726 +0x4c\npanic({0x1aa4d80, 0xc2154a9660})\n\t/usr/local/go/src/runtime/panic.go:1038 +0x215\ngithub.com/cosmos/cosmos-sdk/store/tracekv.writeOperation({0x2577a40, 0xc00000e160}, {0x1c373ab, 0x4}, 0xc1b7df9950, {0xc2627d3f20, 0x20, 0xc39412f260}, {0xc28e81a0a0, 0x10, ...})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/store/tracekv/store.go:200 +0x1d3\ngithub.com/cosmos/cosmos-sdk/store/tracekv.(*Store).Get(0xc3807b9e00, {0xc2627d3f20, 0x1b, 0x1b})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/store/tracekv/store.go:56 +0xab\ngithub.com/cosmos/cosmos-sdk/store/cachekv.(*Store).Get(0xc1ac76a300, {0xc2627d3f20, 0x1b, 0x1b})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/store/cachekv/store.go:66 +0x17e\ngithub.com/cosmos/cosmos-sdk/store/gaskv.(*Store).Get(0xc2f3769e60, {0xc2627d3f20, 0x1b, 0x1b})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/store/gaskv/store.go:39 +0x71\ngithub.com/cosmos/cosmos-sdk/store/prefix.Store.Get({{0x25e05d8, 0xc2f3769e60}, {0xc26894faa0, 0x16, 0x18}}, {0xc409370e48, 0x5, 0x9})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/store/prefix/store.go:68 +0x10c\ngithub.com/cosmos/cosmos-sdk/x/bank/keeper.BaseViewKeeper.GetBalance({{_, _}, {_, _}, {_, _}}, {{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, ...}, ...)\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/x/bank/keeper/view.go:102 +0x179\ngithub.com/cosmos/cosmos-sdk/x/bank/keeper.BaseSendKeeper.addCoins({{{0x25e3af8, 0xc003d24c00}, {0x2584890, 0xc003d5cc80}, {0x25e4b30, 0xc001324a20}}, {0x25e3af8, 0xc003d24c00}, {0x25e4b30, 0xc001324a20}, ...}, ...)\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/x/bank/keeper/send.go:263 +0x356\ngithub.com/cosmos/cosmos-sdk/x/bank/keeper.BaseSendKeeper.SendManyCoins({{{0x25e3af8, 0xc003d24c00}, {0x2584890, 0xc003d5cc80}, {0x25e4b30, 0xc001324a20}}, {0x25e3af8, 0xc003d24c00}, {0x25e4b30, 0xc001324a20}, ...}, ...)\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/x/bank/keeper/send.go:194 +0x4c5\ngithub.com/cosmos/cosmos-sdk/x/bank/keeper.BaseKeeper.SendCoinsFromModuleToManyAccounts({{{{0x25e3af8, 0xc003d24c00}, {0x2584890, 0xc003d5cc80}, {0x25e4b30, 0xc001324a20}}, {0x25e3af8, 0xc003d24c00}, {0x25e4b30, 0xc001324a20}, ...}, ...}, ...)\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/x/bank/keeper/keeper.go:317 +0x185\ngithub.com/osmosis-labs/osmosis/x/incentives/keeper.Keeper.doDistributionSends({{0x25ee658, 0xc003d24c00}, {0x2584890, 0xc003d5cd70}, {{0x25e3af8, 0xc003d24c00}, 0xc00000e2d8, {0x2584890, 0xc003d5cce0}, {0x25848e0, ...}, ...}, ...}, ...)\n\t/src/app/x/incentives/keeper/gauge.go:345 +0x193\ngithub.com/osmosis-labs/osmosis/x/incentives/keeper.Keeper.Distribute({{0x25ee658, 0xc003d24c00}, {0x2584890, 0xc003d5cd70}, {{0x25e3af8, 0xc003d24c00}, 0xc00000e2d8, {0x2584890, 0xc003d5cce0}, {0x25848e0, ...}, ...}, ...}, ...)\n\t/src/app/x/incentives/keeper/gauge.go:443 +0x205\ngithub.com/osmosis-labs/osmosis/x/incentives/keeper.Keeper.AfterEpochEnd({{0x25ee658, 0xc003d24c00}, {0x2584890, 0xc003d5cd70}, {{0x25e3af8, 0xc003d24c00}, 0xc00000e2d8, {0x2584890, 0xc003d5cce0}, {0x25848e0, ...}, ...}, ...}, ...)\n\t/src/app/x/incentives/keeper/hooks.go:27 +0x418\ngithub.com/osmosis-labs/osmosis/x/incentives/keeper.Hooks.AfterEpochEnd(...)\n\t/src/app/x/incentives/keeper/hooks.go:62\ngithub.com/osmosis-labs/osmosis/x/epochs/types.MultiEpochHooks.AfterEpochEnd({_, _, _}, {{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, {{0xb, 0x1}, {0xc27255e210, ...}, ...}, ...}, ...)\n\t/src/app/x/epochs/types/hooks.go:26 +0xc5\ngithub.com/osmosis-labs/osmosis/x/epochs/keeper.Keeper.AfterEpochEnd(...)\n\t/src/app/x/epochs/keeper/hooks.go:8\ngithub.com/osmosis-labs/osmosis/x/epochs.BeginBlocker.func1(0xc39412f1a0, {{0xc29d84b271, 0x3}, {0x0, 0xed85ec810, 0x0}, 0x4e94914f0000, 0xef, {0x3588d964, 0xed999e259, ...}, ...})\n\t/src/app/x/epochs/abci.go:43 +0x719\ngithub.com/osmosis-labs/osmosis/x/epochs/keeper.Keeper.IterateEpochInfo({{_, _}, {_, _}, {_, _}}, {{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, ...}, ...)\n\t/src/app/x/epochs/keeper/epoch.go:55 +0x1fd\ngithub.com/osmosis-labs/osmosis/x/epochs.BeginBlocker({{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, {0x25cc8018, ...}, ...}, ...}, ...)\n\t/src/app/x/epochs/abci.go:16 +0x1fe\ngithub.com/osmosis-labs/osmosis/x/epochs.AppModule.BeginBlock(...)\n\t/src/app/x/epochs/module.go:164\ngithub.com/cosmos/cosmos-sdk/types/module.(*Manager).BeginBlock(_, {{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, ...}, ...}, ...)\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/types/module/module.go:457 +0x218\ngithub.com/osmosis-labs/osmosis/app.(*OsmosisApp).BeginBlocker(_, {{0x25bbf08, 0xc00007e020}, {0x25e8ca0, 0xc1ac76a400}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, ...}, ...}, ...)\n\t/src/app/app/app.go:736 +0xdb\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(_, {{0xc303a80900, 0x20, 0x20}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, {0x25cc8018, ...}, ...}, ...})\n\t/go/pkg/mod/github.com/osmosis-labs/cosmos-sdk@v0.43.0-rc3.0.20211209072213-711e78b4f6b4/baseapp/abci.go:193 +0x8fc\ngithub.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync(_, {{0xc303a80900, 0x20, 0x20}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, {0x25cc8018, ...}, ...}, ...})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/abci/client/local_client.go:280 +0x118\ngithub.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(_, {{0xc303a80900, 0x20, 0x20}, {{0xb, 0x1}, {0xc27255e210, 0x9}, 0x309bbb, {0x25cc8018, ...}, ...}, ...})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/proxy/app_conn.go:81 +0x55\ngithub.com/tendermint/tendermint/state.execBlockOnProxyApp({0x25bcc28, 0xc2c44ce240}, {0x25d2370, 0xc015914940}, 0xc206d5a1e0, {0x25e17b8, 0xc005109f50}, 0x309bba)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/state/execution.go:307 +0x3dd\ngithub.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x1}, {0xc28d515bd9, 0x7}}, {0xc28d515bf0, 0x9}, 0x1, 0x309bba, {{0xc222dc3140, ...}, ...}, ...}, ...)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/state/execution.go:140 +0x171\ngithub.com/tendermint/tendermint/consensus.(*State).finalizeCommit(0xc0010d0380, 0x309bbb)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:1635 +0x9fd\ngithub.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit(0xc0010d0380, 0x309bbb)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:1546 +0x305\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:1481 +0x87\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit(0xc0010d0380, 0x309bbb, 0x0)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:1519 +0xc06\ngithub.com/tendermint/tendermint/consensus.(*State).addVote(0xc0010d0380, 0xc3acd2d360, {0xc2339c0480, 0x28})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:2132 +0xb6e\ngithub.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc0010d0380, 0xc3acd2d360, {0xc2339c0480, 0xc1aca14900})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:1930 +0x2c\ngithub.com/tendermint/tendermint/consensus.(*State).handleMsg(0xc0010d0380, {{0x2575f60, 0xc1ac199298}, {0xc2339c0480, 0x0}})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:838 +0x40b\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine(0xc0010d0380, 0x0)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:762 +0x419\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.14/consensus/state.go:378 +0x12f\n"

@gsora gsora removed their assignment Apr 4, 2022