priv_validator_state.json rename when the chain is running results in panic #416

Closed
rach-id opened this issue May 17, 2022 · 5 comments
Labels
bug Something isn't working

@rach-id
Member

rach-id commented May 17, 2022

Problem statement

As discussed under #415 (comment), Celestia-app panics if the priv_validator_state.json file is mounted into the container.

How to reproduce

Build a Docker image of the Celestia-app binary. Then, instead of creating the priv_validator_state.json file inside the image, mount it into the container with rw permissions. The chain halts at block 1 with the following logs:

ERR CONSENSUS FAILURE!!! err="rename /opt/data/write-file-atomic-08895712196458512116 /opt/data/priv_validator_state.json: device or resource busy" module=consensus stack="goroutine 216 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2()
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:726 +0x4c
panic({0x1978200, 0xc0022989c0})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/tendermint/tendermint/privval.(*FilePVLastSignState).Save(0x78)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/privval/file.go:139 +0x90
github.com/tendermint/tendermint/privval.(*FilePV).saveSigned(0xc0000fc600, 0xc0027bb700, 0x4106b4, 0x0, {0xc0027c0280, 0x80, 0x203000}, {0xc0022ac700, 0x40, 0x40})
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/privval/file.go:395 +0x88
github.com/tendermint/tendermint/privval.(*FilePV).signProposal(0xc008daec80, {0xc00906ded7, 0x7}, 0xc0000fc680)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/privval/file.go:381 +0x2ba
github.com/tendermint/tendermint/privval.(*FilePV).SignProposal(0xc0000fc600, {0xc00906ded7, 0x0}, 0x0)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/privval/file.go:265 +0x1e
github.com/tendermint/tendermint/consensus.(*State).defaultDecideProposal(0xc009272700, 0x1, 0x0)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:1132 +0x2e2
github.com/tendermint/tendermint/consensus.(*State).enterPropose(0xc009272700, 0x1, 0x0)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:1096 +0x6a3
github.com/tendermint/tendermint/consensus.(*State).enterNewRound(0xc009272700, 0x1, 0x0)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:1019 +0xa67
github.com/tendermint/tendermint/consensus.(*State).handleTimeout(0xc009272700, {0xc000d923f0, 0xc000d923f0, 0xd923f0, 0xc0}, {0x1, 0x0, 0x1, {0x181e0134, 0xeda15924b, ...}, ...})
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:892 +0x4a5
github.com/tendermint/tendermint/consensus.(*State).receiveRoutine(0xc009272700, 0x0)
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:791 +0x6bf
created by github.com/tendermint/tendermint/consensus.(*State).OnStart
	/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.0.1-tm-v0.34.16/consensus/state.go:378 +0x12f
"

Possible tracks to solve

  • Investigate whether that file really needs to be renamed (as the logs show) or whether this is a bug on the Tendermint side.
  • Check whether other Tendermint chains hit the same problem in their Docker images. (I checked Osmosis, and their Dockerfile is straightforward: either they don't have this issue, or nobody has noticed it because nobody uses those images.)
@rach-id rach-id added the bug Something isn't working label May 17, 2022
@rach-id rach-id changed the title priv_validator_state.json rename when the chain is running priv_validator_state.json rename when the chain is running results in panic May 17, 2022
@evan-forbes
Member

evan-forbes commented May 17, 2022

What version of celestia-core? This might be a Tendermint v0.35.4 thing.

@rach-id
Member Author

rach-id commented May 17, 2022

It's reproducible on the current master. If you're interested, I can share a project that reproduces it.

@evan-forbes
Member

evan-forbes commented Oct 3, 2022

Wait, after rereading this, I think this might be on purpose. I originally thought this was only about priv_validator_key.json. I think panicking when an update to that file fails while the node is running makes complete sense, to avoid double signing.

@rach-id
Member Author

rach-id commented Oct 3, 2022

Probably it's because of the mounted permissions. I don't think this is relevant anymore; it can be closed, I guess.

@evan-forbes
Member

Closing for now, as I'm not sure there's anything we can do to safely remove this protection. We may have to work around it in the tests in a different way.

Projects
No open projects
Archived in project
Development

No branches or pull requests

2 participants