Faster corruption checksum generation #10893

Closed
jpbetz opened this issue Jul 15, 2019 · 3 comments

Comments

@jpbetz
Contributor

jpbetz commented Jul 15, 2019

In etcd 3.4, a corruption checker will be enabled by default on startup, and will be available to run periodically using the --corrupt-check-time flag. This corruption checker performs a full pass over all uncompacted revisions, computing a hash over every stored KeyValue record. Both the etcd member initiating the corruption check and all of its peers must compute the checksum so the results can be compared. This can be expensive.

We should instead consider computing a CRC for each KeyValue when it is persisted, and writing that CRC to disk with the KeyValue record. We could then use these CRCs to compute hashes of the entire db state and, separately, validate that a KeyValue's data matches its CRC whenever it is read from file (e.g. indexing, data serving).

This would result in both faster consistency checking, and improved data corruption checking.
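A minimal sketch of the per-record CRC idea, to make the proposal concrete. Everything here is an assumption for illustration: the trailing 4-byte framing, the CRC-32C polynomial, and the names `encodeRecord`, `decodeRecord`, and `storeHash` are hypothetical and do not reflect etcd's actual on-disk format or API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// castagnoli is the CRC-32C table; the choice of polynomial is an assumption.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// ErrCorrupt is returned when a record's payload does not match its stored CRC.
var ErrCorrupt = errors.New("record CRC mismatch")

// encodeRecord appends a 4-byte CRC of the payload to the payload.
// This framing is hypothetical, not etcd's storage format.
func encodeRecord(payload []byte) []byte {
	out := make([]byte, len(payload)+4)
	copy(out, payload)
	binary.BigEndian.PutUint32(out[len(payload):], crc32.Checksum(payload, castagnoli))
	return out
}

// decodeRecord verifies the trailing CRC on read and returns the payload
// together with the stored CRC.
func decodeRecord(rec []byte) ([]byte, uint32, error) {
	if len(rec) < 4 {
		return nil, 0, ErrCorrupt
	}
	payload := rec[:len(rec)-4]
	stored := binary.BigEndian.Uint32(rec[len(rec)-4:])
	if crc32.Checksum(payload, castagnoli) != stored {
		return nil, 0, ErrCorrupt
	}
	return payload, stored, nil
}

// storeHash folds per-record CRCs (in revision order) into one digest,
// so the payloads themselves never need to be rehashed.
func storeHash(crcs []uint32) uint32 {
	h := crc32.New(castagnoli)
	var buf [4]byte
	for _, c := range crcs {
		binary.BigEndian.PutUint32(buf[:], c)
		h.Write(buf[:])
	}
	return h.Sum32()
}

func main() {
	recs := [][]byte{encodeRecord([]byte("foo=1")), encodeRecord([]byte("bar=2"))}
	var crcs []uint32
	for _, rec := range recs {
		_, crc, err := decodeRecord(rec)
		if err != nil {
			fmt.Println("corrupt record:", err)
			return
		}
		crcs = append(crcs, crc)
	}
	fmt.Printf("store hash: %08x\n", storeHash(crcs))

	recs[0][0] ^= 0xff // simulate on-disk corruption
	_, _, err := decodeRecord(recs[0])
	fmt.Println("after bit flip:", err)
}
```

The store-level hash only touches 4 bytes per record, which is where the speedup over a full pass would come from; the read-time check is what adds the extra corruption detection.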

It would be a good first step toward fully incremental hash generation, which could be something like:

  1. Put each revision's KeyValue entry into a merkle tree where each node is identified by the revision (this needs to account for subrevisions and tombstones).
  2. Each time a revision is written, update the merkle tree with a node for the revision.
  3. Each time a compaction occurs, remove compacted nodes from the merkle tree (this might be a problem, suggestions welcome on how we might make this faster/cheaper).
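The three steps above could look roughly like the following. This is a naive sketch, not an incremental implementation: it recomputes the root from scratch on every call, ignores subrevisions and tombstones, and treats compaction as simply dropping every leaf below the compaction revision (real compaction retains the latest revision of live keys). The type `revTree` and its methods are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// revTree maps a main revision to a leaf hash. Hypothetical type.
type revTree struct {
	leaves map[int64][32]byte
}

func newRevTree() *revTree { return &revTree{leaves: map[int64][32]byte{}} }

// Put records a leaf for rev, derived from that revision's precomputed CRC.
func (t *revTree) Put(rev int64, crc uint32) {
	var buf [12]byte
	binary.BigEndian.PutUint64(buf[:8], uint64(rev))
	binary.BigEndian.PutUint32(buf[8:], crc)
	t.leaves[rev] = sha256.Sum256(buf[:])
}

// Compact drops every leaf with a revision below rev (a simplification).
func (t *revTree) Compact(rev int64) {
	for r := range t.leaves {
		if r < rev {
			delete(t.leaves, r)
		}
	}
}

// Root builds a binary merkle tree over the leaves in revision order and
// returns its root. It is recomputed in full on each call.
func (t *revTree) Root() [32]byte {
	revs := make([]int64, 0, len(t.leaves))
	for r := range t.leaves {
		revs = append(revs, r)
	}
	sort.Slice(revs, func(i, j int) bool { return revs[i] < revs[j] })
	if len(revs) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(revs))
	for i, r := range revs {
		level[i] = t.leaves[r]
	}
	for len(level) > 1 {
		next := make([][32]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // odd node is promoted unchanged
				continue
			}
			pair := make([]byte, 0, 64)
			pair = append(pair, level[i][:]...)
			pair = append(pair, level[i+1][:]...)
			next = append(next, sha256.Sum256(pair))
		}
		level = next
	}
	return level[0]
}

func main() {
	a, b := newRevTree(), newRevTree()
	for rev := int64(2); rev <= 5; rev++ {
		a.Put(rev, uint32(rev)*31)
		b.Put(rev, uint32(rev)*31)
	}
	fmt.Println("roots match:", a.Root() == b.Root())
	a.Compact(4)
	fmt.Println("match after one-sided compaction:", a.Root() == b.Root())
}
```

Making the update in step 2 and the removal in step 3 cheaper than a full rebuild is the open question; a tree keyed by revision makes prefix removal on compaction the awkward case.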

Regardless, even before introducing a merkle tree, the faster checksum generation could be used with the existing corruption checker. It would also work well for Chubby-style checksum checking (see section 6.2 of Chubby: Paxos Made Live - An Engineering Perspective), which I prefer: the members all compute checksums based on a raft log entry, so their keyspaces are compacted to the same point, which avoids having to skip checks because of compaction index mismatches.
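A rough sketch of the log-driven approach, as I read it: a checksum request travels through the replicated log like any other entry, so every member records its digest at the same log index and the results are directly comparable. The `entry` and `member` types, the use of a nil `Data` to mark a checksum request, and the running CRC-32C digest are all assumptions for illustration, not etcd or Chubby code.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

var table = crc32.MakeTable(crc32.Castagnoli)

// entry is a hypothetical replicated log entry. A nil Data marks a
// checksum request rather than a write.
type entry struct {
	Index uint64
	Data  []byte
}

// member is one replica: a running digest of applied writes plus the
// digest it recorded at each checksum entry.
type member struct {
	digest uint32
	checks map[uint64]uint32 // log index -> digest at that index
}

func newMember() *member { return &member{checks: map[uint64]uint32{}} }

// apply processes one log entry in order.
func (m *member) apply(e entry) {
	if e.Data == nil {
		m.checks[e.Index] = m.digest // snapshot the digest at this index
		return
	}
	m.digest = crc32.Update(m.digest, table, e.Data)
}

// agree reports whether a and b recorded the same digest at index.
// ok is false when either member has no record for that index.
func agree(a, b *member, index uint64) (same, ok bool) {
	da, okA := a.checks[index]
	db, okB := b.checks[index]
	if !okA || !okB {
		return false, false
	}
	return da == db, true
}

func main() {
	entries := []entry{
		{Index: 1, Data: []byte("k=v")},
		{Index: 2}, // checksum request
	}
	a, b := newMember(), newMember()
	for _, e := range entries {
		a.apply(e)
		b.apply(e)
	}
	same, ok := agree(a, b, 2)
	fmt.Println("comparable:", ok, "match:", same)
}
```

Because the comparison is keyed by log index, there is no case where two members hold digests taken at different compaction points.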

cc @jingyih @wenjiaswe @lavalamp @gyuho

@gyuho
Contributor

gyuho commented Aug 5, 2019

@jpbetz The corruption checker has been delayed for 3.5, because we haven't figured out how to deal with TLS-secured peers. Let's milestone this for 3.5.

@gyuho gyuho added this to the etcd-v3.5 milestone Aug 5, 2019
@jpbetz
Contributor Author

jpbetz commented Aug 5, 2019

> @jpbetz The corruption checker has been delayed for 3.5, because we haven't figured out how to deal with TLS-secured peers. Let's milestone this for 3.5.

Sounds good.

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
