clientv3: snapshot status does not check database integrity #10108

Closed
jingyih opened this issue Sep 20, 2018 · 7 comments · Fixed by #10109

Comments

@jingyih
Contributor

jingyih commented Sep 20, 2018

Version
etcd Version: 3.3.0+git
Git SHA: f32bc50
Go Version: go1.10.3
Go OS/Arch: linux/amd64

Issue
We have a corrupted snapshot file. The bbolt check tool detects the corruption, but etcdctl snapshot status reports the file without any warning. When users run etcdctl snapshot status, it should notify them if the snapshot file is corrupted.
I will send out a PR to add database integrity verification to snapshot status.

Example output

etcdctl --write-out=table snapshot status corrupt_db/backup.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 1d0551d2 | 45565687 |        709 |     8.4 MB |
+----------+----------+------------+------------+

bbolt check corrupt_db/backup.db
page 961: already freed
page 962: already freed
page 1347: already freed
page 1442: already freed
page 1760: already freed
page 2023: already freed
page 2033: already freed
page 969: unreachable unfreed
page 970: unreachable unfreed
page 1355: unreachable unfreed
page 1450: unreachable unfreed
page 1768: unreachable unfreed
page 2031: unreachable unfreed
page 2041: unreachable unfreed
14 errors found
invalid value
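The two kinds of errors in the bbolt check output above come from page accounting: every page below the file's high-water mark should be either reachable from the tree or listed on the freelist, and a free page should be listed only once. The following is a simplified, hypothetical model of that rule, not bbolt's code: pages are plain integers, and `checkFreelist`, `highWater`, and the sample pages are illustrative assumptions.

```go
package main

import "fmt"

// checkFreelist is a hypothetical, simplified model of freelist accounting.
// highWater is the number of pages in the file (IDs 0..highWater-1),
// freelist holds the page IDs recorded as free, and reachable holds the IDs
// found by walking the meta pages, the freelist page, and the B+tree.
func checkFreelist(highWater int, freelist []int, reachable map[int]bool) []string {
	var problems []string

	// A page listed more than once on the freelist has been freed twice.
	freed := make(map[int]bool)
	for _, id := range freelist {
		if freed[id] {
			problems = append(problems, fmt.Sprintf("page %d: already freed", id))
		}
		freed[id] = true
	}

	// A page that is neither in use nor free has leaked.
	for id := 0; id < highWater; id++ {
		if !reachable[id] && !freed[id] {
			problems = append(problems, fmt.Sprintf("page %d: unreachable unfreed", id))
		}
	}
	return problems
}

func main() {
	// Pages 0-3 are in use, page 4 appears twice on the freelist, and
	// page 5 is neither reachable nor free.
	reachable := map[int]bool{0: true, 1: true, 2: true, 3: true}
	for _, p := range checkFreelist(6, []int{4, 4}, reachable) {
		fmt.Println(p)
	}
}
```

Running it prints `page 4: already freed` followed by `page 5: unreachable unfreed`, the same two shapes of message reported for the corrupted snapshot.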
@jingyih
Contributor Author

jingyih commented Sep 20, 2018

/cc @jpbetz @gyuho
Please let me know if you have any comments.

@jpbetz
Contributor

jpbetz commented Sep 20, 2018

I’m strongly in favor of adding the bbolt check to snapshot status.

@xiang90
Contributor

xiang90 commented Sep 20, 2018

@jpbetz @jingyih

A side question for you: was the snapshot file generated by etcd? Have you figured out the root cause of the corruption?

@hexfusion
Contributor

@jingyih would it be possible to share this snapshot for the sake of experimentation? Is the underlying data in any way sensitive?

@jpbetz
Contributor

jpbetz commented Sep 20, 2018

@hexfusion we can’t share the one we’ve got on hand, but I can try to produce one that can be shared publicly.

@hexfusion
Contributor

@jpbetz thanks, I would appreciate that.

@jingyih
Contributor Author

jingyih commented Sep 20, 2018

@xiang90 @hexfusion The snapshot corruption was caused by a freelist corruption bug, which has been fixed and backported all the way to v3.1. The corrupted snapshot file was found on one of our GKE clusters, which runs v3.0.


4 participants