etcdserver: Backport snapshot recovery from #7917 to 3.1 branch #9808

Merged Jun 5, 2018 (1 commit)

@jpbetz jpbetz commented Jun 5, 2018

On 3.1.11, we encountered:

2018-06-01 00:15:41.134882 C | etcdmain: database file (/tmp/default.etcd/member/snap/db index 263183180) does not match with snapshot (index 265092933).

This has proved tricky to reproduce exactly, but we were able to simulate it by setting --max-wals=1, stopping a member, waiting for the WAL files to be replaced, forcing compaction, and then replacing the db file with one of the out-of-date snap.db files.

We believe it is caused by recovery from a snap.db file when etcd crashes after persisting the snapshot file to disk but before updating the backend db. recoverBackendSnapshot was introduced in #7917 to fix this in etcd 3.2+, so this PR backports that recovery code to 3.1.

./test SUCCESS
PASSES=integration ./test SUCCESS

Note that #7856 fixed a different bug that produced this same error message, but that fix already exists in 3.1.11.

cc @gyuho @wojtek-t @wenjiaswe

@gyuho gyuho left a comment

lgtm thanks!

Let's merge once the tests are green.

jpbetz commented Jun 5, 2018

Sounds good. I've verified that the fix works the same as it does on the 3.2 branch for cases where there is a snap.db file and the main db file is stale, so I'm comfortable with this being merged.

@jpbetz jpbetz merged commit ebe351e into etcd-io:release-3.1 Jun 5, 2018
wojtek-t commented Jun 6, 2018

@jpbetz - thanks for preparing the fix so fast.

@mborsz - FYI
