Problem: iavl stores end up with inconsistent versions after upgrade #13477
Comments
@yihuang are the repro steps just based on a hypothesis, or are you certain that Commit is interrupted prior to the metadata being flushed?
The steps are a hypothesis; I will try to reproduce locally on Monday.
The log also shows the node was killed several times during the upgrade migration.
The hypothesis makes sense. Unfortunately, there's no direct way to use the same DB batch object when committing all the IAVL stores and the metadata. In other words, it's not atomic. Another reason why I really dislike this multi-logical-db store approach. Note, the store refactor has an item for using a single logical store.
@alexanderbez what do you think is the best short-term solution?
If we return an error, how will the module's store be used successfully? Will the chain operate correctly, specifically for the new module?
Seems like the most plausible solution 👍
This is the ideal solution, but unfortunately it is not possible in the current design. #12986 should solve this by using a single logical SC store, but that is a future refactor.
I'm not sure how to do this, though. Do we ignore the loaded roots and force it to be an empty iavl store? Will that work on commit?
No, we just fix the version/height of the faulty module to be correct, in this case +1. This would happen in the root-MS.
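A minimal sketch of what such a reconciliation in the root multistore could look like, using toy types rather than the real SDK API. The `store` type, the `commitAll` helper, and the `bank` store name are hypothetical, and this is not necessarily how the actual fix is implemented: a store that already reached the target version in an interrupted commit is recorded at that version instead of being committed again.

```go
package main

import "fmt"

// store is a toy stand-in for a single IAVL store (hypothetical type).
type store struct{ version int64 }

// commit saves a new version and returns it.
func (s *store) commit() int64 {
	s.version++
	return s.version
}

// commitAll brings every store to the target version. A store that is
// already at (or past) the target, for example because an earlier commit
// was interrupted after it was saved, is not committed a second time;
// its recorded version is pinned to the target instead.
func commitAll(stores map[string]*store, target int64) map[string]int64 {
	recorded := make(map[string]int64, len(stores))
	for name, s := range stores {
		if s.version >= target {
			recorded[name] = target
			continue
		}
		recorded[name] = s.commit()
	}
	return recorded
}

func main() {
	stores := map[string]*store{
		"bank":   {version: 100}, // existing store, still at latestVersion
		"feeibc": {version: 101}, // new store, already saved before the kill
	}
	recorded := commitAll(stores, 101)
	fmt.Println("bank:", recorded["bank"], "feeibc:", recorded["feeibc"])
}
```

With this guard, both stores end up recorded at the same version after the block is replayed.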
Fixed by #13530.
Summary of Bug
We found an app hash mismatch in our testnet, which was recently upgraded to sdk 0.46. We traced it to an inconsistent iavl tree version number in a store that was added in the upgrade.
The hypothesis for how this happens is as follows: the migration is slow, so some operators manually killed the node during the commit event, which triggers the following bug:
- The upgrade adds the `feeibc` store and sets its initialVersion to `latestVersion+1`.
- The `feeibc` store is committed at `initialVersion`.
- After the node is killed and restarted, the commit metadata for `feeibc` doesn't exist, so `LoadVersion` is called with target version `0`, which loads whatever the latest version of the iavl tree is, namely `latestVersion+1`.
- The `feeibc` store is then committed normally, with its version increased to `latestVersion + 2`, while the other stores are at `latestVersion + 1`.

As long as the `feeibc` store is empty, the issue goes unnoticed, until some day the store is written to and triggers the app hash mismatch.

Version
Steps to Reproduce
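No confirmed reproduction exists yet. As a rough illustration only, the hypothesized sequence from the summary can be modeled with toy types. Everything here (`tree`, `loadVersion`, `simulate`, the `bank` store name) is a hypothetical stand-in, not the real SDK or iavl API; the sketch only assumes that a target version of `0` means "load the latest saved version".

```go
package main

import "fmt"

// tree is a toy stand-in for one IAVL store; it only tracks the latest
// version saved on disk (hypothetical type).
type tree struct{ version int64 }

// loadVersion mimics the assumed behavior: target 0 keeps the latest
// saved version, any other target loads exactly that version.
func (t *tree) loadVersion(target int64) {
	if target != 0 {
		t.version = target
	}
}

// commit saves a new version and returns it.
func (t *tree) commit() int64 {
	t.version++
	return t.version
}

// simulate replays the hypothesized sequence and returns the final
// versions of an existing store and of the newly added store.
func simulate(latestVersion int64) (existing, added int64) {
	old := &tree{version: latestVersion}

	// The upgrade adds the new store with initialVersion = latestVersion+1,
	// so its first commit lands on latestVersion+1.
	fresh := &tree{version: latestVersion}
	fresh.commit()

	// The node is killed here, before the commit metadata is flushed,
	// so the metadata still describes latestVersion and has no entry
	// for the new store.
	metadata := map[string]int64{"bank": latestVersion}

	// Restart: each store is loaded at the version the metadata records.
	old.loadVersion(metadata["bank"])
	fresh.loadVersion(metadata["feeibc"]) // missing entry yields 0: keep latest

	// The block is replayed and every store commits once more.
	return old.commit(), fresh.commit()
}

func main() {
	existing, added := simulate(100)
	fmt.Printf("existing store: %d, feeibc store: %d\n", existing, added)
}
```

In this model the existing store ends at `latestVersion + 1` while the new store ends at `latestVersion + 2`, which matches the drift described in the summary.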