In etcd 3.4, a corruption checker will be enabled by default on startup, and will be available to run periodically using the `--corrupt-check-time` flag. This corruption checker performs a full pass over all uncompacted revisions, computing hashes of every stored `KeyValue` record. Both the etcd member initiating the corruption check and all of its peers must compute the checksum so the results can be compared. This can be expensive.
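To make the cost concrete, here is a minimal, illustrative Go sketch (not the actual etcd code; the `kv` type stands in for `mvccpb.KeyValue`) of what a full pass amounts to: every uncompacted revision is read and folded into one checksum, so the work scales with the total size of the keyspace:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// kv stands in for etcd's mvccpb.KeyValue; only the fields that matter
// for hashing are shown.
type kv struct {
	key, value []byte
}

// fullPassHash mirrors the shape of the full corruption check: every
// uncompacted revision's record is read and folded into a single
// checksum, so cost grows with the total size of the keyspace.
func fullPassHash(revs []kv) uint32 {
	h := crc32.New(crc32.MakeTable(crc32.Castagnoli))
	for _, r := range revs {
		h.Write(r.key)
		h.Write(r.value)
	}
	return h.Sum32()
}

func main() {
	revs := []kv{{[]byte("foo"), []byte("bar")}, {[]byte("foo"), []byte("baz")}}
	fmt.Printf("full-pass hash: %08x\n", fullPassHash(revs))
}
```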
We should consider instead computing a CRC for each KeyValue when it is persisted, and writing that CRC to disk alongside the KeyValue record. We could then use these per-record CRCs to compute hashes of the entire db state, and, separately, validate that a KeyValue's data matches its CRC whenever it is read back from disk (e.g. during indexing or data serving).
This would result in both faster consistency checking and improved data corruption detection.
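As a rough sketch of the write-time CRC idea (the record layout and function names here are hypothetical, not etcd's actual on-disk format): compute the CRC once when the record is persisted, store it next to the record, and re-verify it on every read:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// record is a hypothetical on-disk layout: the serialized KeyValue
// bytes plus a CRC computed once, at persist time.
type record struct {
	data []byte // serialized KeyValue
	crc  uint32
}

// persist computes the CRC at write time, so later reads never have to
// trust the bytes blindly.
func persist(data []byte) record {
	return record{data: data, crc: crc32.Checksum(data, castagnoli)}
}

// verify re-checks the stored CRC whenever the record is read back
// (e.g. while rebuilding the index or serving a range request).
func verify(r record) error {
	if got := crc32.Checksum(r.data, castagnoli); got != r.crc {
		return fmt.Errorf("corrupted KeyValue: crc %08x, want %08x", got, r.crc)
	}
	return nil
}

// encode shows how the CRC could ride along with the record on disk:
// 4 bytes of checksum followed by the record body.
func encode(r record) []byte {
	buf := make([]byte, 4+len(r.data))
	binary.BigEndian.PutUint32(buf, r.crc)
	copy(buf[4:], r.data)
	return buf
}

func main() {
	r := persist([]byte("serialized-keyvalue-bytes"))
	fmt.Println(verify(r), len(encode(r)))
}
```

CRC-32C (Castagnoli) is hardware-accelerated on most modern CPUs, which is what would keep the read-path verification cheap.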
It would be a good first step toward fully incremental hash generation, which could work something like the following (a toy sketch follows the list):
- Put each revision's `KeyValue` entry into a merkle tree, where each node is identified by its revision (we'd need to account for sub-revisions and tombstones here).
- Each time a revision is written, update the merkle tree with a node for that revision.
- Each time a compaction occurs, remove the compacted nodes from the merkle tree (this might be a problem; suggestions welcome on how we might make this faster/cheaper).
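Here is a toy version of that structure to make the shape concrete (all names are made up for illustration). It recomputes interior nodes from scratch on every root query; a real incremental version would rehash only the leaf-to-root path on each write, and compaction is where this sketch gets expensive, as noted above:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// revTree is a toy merkle tree over revisions. Leaves are hashes of
// individual KeyValue records; the root summarizes the whole keyspace.
type revTree struct {
	leaves map[int64][32]byte // revision -> hash of that revision's KeyValue
}

func newRevTree() *revTree { return &revTree{leaves: map[int64][32]byte{}} }

// put is called each time a revision is written.
func (t *revTree) put(rev int64, kvBytes []byte) {
	t.leaves[rev] = sha256.Sum256(kvBytes)
}

// compact removes leaves at or below rev; as noted in the list above,
// this is the awkward/expensive part of the scheme.
func (t *revTree) compact(rev int64) {
	for r := range t.leaves {
		if r <= rev {
			delete(t.leaves, r)
		}
	}
}

// root folds the leaf hashes pairwise in revision order.
func (t *revTree) root() [32]byte {
	revs := make([]int64, 0, len(t.leaves))
	for r := range t.leaves {
		revs = append(revs, r)
	}
	sort.Slice(revs, func(i, j int) bool { return revs[i] < revs[j] })
	level := make([][32]byte, len(revs))
	for i, r := range revs {
		level[i] = t.leaves[r]
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // odd node is carried up
				continue
			}
			next = append(next, sha256.Sum256(append(level[i][:], level[i+1][:]...)))
		}
		level = next
	}
	if len(level) == 0 {
		return [32]byte{}
	}
	return level[0]
}

func main() {
	t := newRevTree()
	t.put(1, []byte("foo=bar"))
	t.put(2, []byte("foo=baz"))
	fmt.Printf("root: %x\n", t.root())
	t.compact(1)
	fmt.Printf("root after compaction: %x\n", t.root())
}
```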
Regardless, even before introducing a merkle tree, the faster checksum generation could be used with the existing corruption checker, and would also work well for Chubby-style checksum checking (see section 6.2 of "Paxos Made Live - An Engineering Perspective"). I prefer that approach because all members compute checksums in response to a raft log entry, so their keyspaces are compacted to the same point, which avoids having to skip checks because of compaction index mismatches.

cc @jingyih @wenjiaswe @lavalamp @gyuho
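To sketch how the Chubby-style trigger could combine with the per-record CRCs (the `checksumRequest` type and `keyspaceChecksum` function below are hypothetical): the leader proposes a checksum request through raft, and each member, on applying that entry, folds its already-stored CRCs into a single value without re-reading any record bodies:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// checksumRequest is a hypothetical entry a leader appends to the raft
// log, Chubby-style: every member computes its keyspace checksum when
// it applies this entry, so all members hash a state compacted to the
// same point.
type checksumRequest struct {
	raftIndex uint64
}

// keyspaceChecksum folds the already-stored per-KeyValue CRCs into one
// value, in revision order, so members never re-read the record bodies.
func keyspaceChecksum(crcByRev map[int64]uint32) uint32 {
	revs := make([]int64, 0, len(crcByRev))
	for r := range crcByRev {
		revs = append(revs, r)
	}
	sort.Slice(revs, func(i, j int) bool { return revs[i] < revs[j] })
	var buf []byte
	for _, r := range revs {
		c := crcByRev[r]
		buf = append(buf, byte(c>>24), byte(c>>16), byte(c>>8), byte(c))
	}
	return crc32.Checksum(buf, castagnoli)
}

func main() {
	req := checksumRequest{raftIndex: 42}
	crcs := map[int64]uint32{1: 0xdeadbeef, 2: 0xfeedface}
	fmt.Printf("checksum at raft index %d: %08x\n", req.raftIndex, keyspaceChecksum(crcs))
}
```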
@jpbetz The corruption checker has been delayed to 3.5 because we haven't figured out how to deal with TLS-secured peers. Let's milestone this for 3.5.