
pruning=everything causes db corruption #10352

Closed · Tracked by #14 · Fixed by #11177
ValarDragon opened this issue Oct 13, 2021 · 6 comments

Comments

@ValarDragon
Contributor

Summary of Bug

It has been repeatedly reported that `pruning=everything` causes DB corruption for nodes across different Cosmos chains. The corruption surfaces as `failed to load latest version: failed to load store: wanted to load target 1488419 but only found up to 0`. Presumably this comes from restarting a node that, due to some issue, has not kept even the latest state.

Version

All versions on the v0.42.x line. I am unsure about v0.44.x chains, as I don't actively work on any v0.44.x release chains at the moment.

Steps to Reproduce

Run a node with `pruning=everything` and occasionally stop and restart it; the corruption will eventually occur.
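For reference, this is the setting in question as it appears in app.toml (a minimal excerpt; surrounding keys omitted):

```toml
# app.toml
# The strategy implicated in this issue: prune all historical state,
# keeping nothing beyond what the node needs for the current height.
pruning = "everything"
```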

Suggested fix

Perhaps `pruning=everything` should be equivalent to `keep-recent=1` or `keep-recent=2`? Or is there another solution here that would make it safer / less prone to DB corruption? FWIW, on a chainlayer pruned snapshot that got corrupted, the right amount of state data was still there. I did not inspect how the saved data differed from that of other nodes (in part because I don't know of a convenient framework for decoding LevelDB entries).


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Contributor

Yeah, I've also received numerous reports of this... it smells like an off-by-one type of situation if I had to guess (I've implemented most of this logic). I would say this is pretty high priority.

@AmauryM any ideas who has bandwidth to look into this?

@amaury1093
Contributor

@alexanderbez do you think you have bandwidth to tackle this? If not, we can maybe find someone on the Regen team.

@alexanderbez
Contributor

Yeah you can assign it to me, but I don't know when I'll be able to get to it. It might take me a few weeks.

alexanderbez self-assigned this Oct 28, 2021
@alexanderbez
Contributor

In the meantime, I would recommend a custom setting where you keep only a handful of recent blocks, say 100, and prune every 100 blocks.
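In app.toml terms, that workaround would look roughly like this (a sketch assuming the standard pruning keys; adjust the values to taste):

```toml
# app.toml -- interim workaround until pruning=everything is fixed
pruning = "custom"

# keep the last 100 heights as a safety buffer
pruning-keep-recent = "100"

# don't additionally retain periodic snapshot heights
pruning-keep-every = "0"

# run the pruning pass every 100 blocks
pruning-interval = "100"
```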

@alexanderbez
Contributor

Proposal: have `pruning=everything` actually keep the last two blocks, always, as a buffer. A somewhat lazy approach, but I believe this should do the trick rather than spending countless hours debugging where the "off-by-one" error might be.
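In config terms, the proposal would make `pruning=everything` behave roughly like the following custom setting (a sketch of the intended semantics, not the actual implementation; the interval value is illustrative):

```toml
# what pruning = "everything" would effectively become:
pruning = "custom"
pruning-keep-recent = "2"  # always keep the last two heights as a buffer
pruning-keep-every = "0"
pruning-interval = "10"    # illustrative; pruning would still run periodically
```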

Thoughts @ValarDragon ?

@ValarDragon
Contributor Author

ValarDragon commented Feb 11, 2022

100% agreed, and we can just file an issue for figuring out what the actual problem was over the long term.
