
Identify corrupted member depending on quorum #14828

Merged: 7 commits merged into etcd-io:main from the identify_corrupted_member_20221123 branch on Nov 28, 2022

Conversation

@ahrtr (Member) commented Nov 23, 2022

Currently, when the compact hash checker detects a hash mismatch, it assumes that the corrupted member is always one of the followers. This isn't correct, because the leader's data may be corrupted as well. It's also possible that multiple members are corrupted, for example 2 members out of a 5-member cluster.

The solution is to depend on quorum to identify the corrupted member(s). For example, in a 3-member cluster, if 2 members have the same CompactRevision and hash, then the remaining member is the corrupted one. In a 5-member cluster, if at least 3 members have the same CompactRevision and hash, then the remaining members are the corrupted ones.

If there isn't a quorum, then the smallest minority is regarded as corrupted. For example, in a 5-member cluster, if m1 and m2 have the same CompactRevision and hash, and m3 and m4 have the same CompactRevision and hash, then m5 is regarded as the corrupted member.
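
To make the quorum rule concrete, here is a minimal, hedged Go sketch of the idea; all type and function names are illustrative, not the code in this PR, and the no-quorum "smallest minority" case is intentionally left unresolved (it just returns nil):

package main

import "fmt"

// memberHash is the (CompactRevision, hash) pair reported by one member.
type memberHash struct {
	MemberID        uint64
	CompactRevision int64
	Hash            uint32
}

// corruptedMembers returns the members whose (CompactRevision, Hash) disagrees
// with a quorum of the cluster, or nil if no group of members reaches quorum.
func corruptedMembers(reports []memberHash, clusterSize int) []uint64 {
	quorum := clusterSize/2 + 1

	type key struct {
		rev  int64
		hash uint32
	}
	groups := make(map[key][]uint64)
	for _, r := range reports {
		k := key{rev: r.CompactRevision, hash: r.Hash}
		groups[k] = append(groups[k], r.MemberID)
	}

	for k, ids := range groups {
		if len(ids) < quorum {
			continue
		}
		// A quorum agrees on this (CompactRevision, Hash); every member
		// reporting something else is regarded as corrupted.
		var corrupted []uint64
		for _, r := range reports {
			if r.CompactRevision != k.rev || r.Hash != k.hash {
				corrupted = append(corrupted, r.MemberID)
			}
		}
		return corrupted
	}
	return nil // no quorum: the corrupted member(s) cannot be identified definitively
}

func main() {
	reports := []memberHash{
		{MemberID: 1, CompactRevision: 100, Hash: 0xaaaa},
		{MemberID: 2, CompactRevision: 100, Hash: 0xaaaa},
		{MemberID: 3, CompactRevision: 100, Hash: 0xbbbb}, // disagrees with the quorum
	}
	fmt.Println(corruptedMembers(reports, 3)) // prints [3]
}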

@codecov-commenter commented Nov 23, 2022

Codecov Report

Merging #14828 (d545d60) into main (cdb9b8b) will decrease coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main   #14828      +/-   ##
==========================================
- Coverage   75.52%   75.45%   -0.07%     
==========================================
  Files         457      457              
  Lines       37423    37469      +46     
==========================================
+ Hits        28264    28274      +10     
- Misses       7386     7413      +27     
- Partials     1773     1782       +9     
Flag Coverage Δ
all 75.45% <100.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/etcdserver/corrupt.go 90.85% <100.00%> (+1.28%) ⬆️
server/etcdserver/api/etcdhttp/types/errors.go 60.00% <0.00%> (-13.34%) ⬇️
client/v3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
server/storage/mvcc/watchable_store.go 84.42% <0.00%> (-5.08%) ⬇️
client/v3/concurrency/session.go 88.63% <0.00%> (-4.55%) ⬇️
server/proxy/grpcproxy/watch.go 92.48% <0.00%> (-4.05%) ⬇️
server/etcdserver/txn/util.go 75.47% <0.00%> (-3.78%) ⬇️
client/v3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 74.47% <0.00%> (-2.09%) ⬇️
... and 12 more


@ahrtr (Member, Author) commented Nov 23, 2022

With the change in this PR, I reverted the change in #14824

@fuweid @ptabor @serathius @spzala

(Resolved review threads on server/etcdserver/corrupt.go and server/etcdserver/corrupt_test.go)
@ahrtr (Member, Author) commented Nov 23, 2022

Thanks @fuweid for the comments; they all look good to me.

@fuweid (Member) left a comment:

LGTM(non-binding)

@serathius (Member) commented:

Note that this is not something that was introduced for compact hash verification, the issue was already present in PeriodicCheck.

My suggestion:

  • Let's send a fix that removes setting MemberID in the alarm check for both PeriodicCheck and CompactHashCheck. This can be backported to v3.5 and v3.4.
  • This PR, which implements proper detection of which member is corrupted, should be treated as a v3.6 feature.

What do you think?

@ahrtr (Member, Author) commented Nov 23, 2022

Note that this is not something that was introduced for compact hash verification, the issue was already present in PeriodicCheck.

Yes, I was aware that PeriodicCheck has a similar "issue". The reason I did not update PeriodicCheck is that its logic is a little different from CompactHashCheck's and is incorrect; please see my comment #14536 (comment). So let's do this step by step; we can update and enhance PeriodicCheck in a separate PR.

  • Let's send a fix that removes setting MemberID in the alarm check for both PeriodicCheck and CompactHashCheck. This can be backported to v3.5 and v3.4.

Seems like a good suggestion, but the only concern is that it may break the existing user experience. @ptabor @spzala what are your thoughts?

  • This PR, which implements proper detection of which member is corrupted, should be treated as a v3.6 feature.

Agreed. Let's make a similar change for PeriodicCheck in a separate PR.

@serathius (Member) commented:

Seems like a good suggestion, but the only concern is that it may break the existing user experience. @ptabor @spzala what are your thoughts?

This was broken for a long time until we fixed it only recently in #14272. A bug turned out to be a feature :P. I don't see an issue with backporting this.

Corruption checks previously set the MemberID value based on the member id field in the response to the Hash request. When I investigated hash verification, I found that members never set the member id in the Hash response.
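
A hedged sketch of the effect being described, using the public etcdserverpb types; the surrounding code is illustrative, not the actual checker code. Because the peer never populated the response header's MemberId, an alarm raised from it effectively carried MemberID == 0:

package main

import (
	"fmt"

	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
)

func main() {
	// Stand-in for the header of a Hash/HashKV response received from a peer;
	// per the comment above, the responding member never set MemberId, so it
	// keeps its zero value.
	hdr := &pb.ResponseHeader{}

	// Raising the CORRUPT alarm from that value therefore tags member 0,
	// i.e. effectively "the whole cluster".
	alarm := &pb.AlarmRequest{
		MemberID: hdr.MemberId,
		Action:   pb.AlarmRequest_ACTIVATE,
		Alarm:    pb.AlarmType_CORRUPT,
	}
	fmt.Println(alarm.MemberID) // prints 0
}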

@ahrtr (Member, Author) commented Nov 24, 2022

This was broken for a long time until we fixed it only recently in #14272. A bug turned out to be a feature :P. I don't see an issue with backporting this.

OK. Let's discuss this in a separate session. I just raised a ticket to track it: #14849

server/etcdserver/corrupt.go
@@ -258,57 +259,152 @@ func (cm *corruptionChecker) CompactHashCheck() {
 	)
 	hashes := cm.uncheckedRevisions()
 	// Assume that revisions are ordered from largest to smallest
-	for i, hash := range hashes {
+	for _, hash := range hashes {
 		peers := cm.hasher.PeerHashByRev(hash.Revision)
Contributor comment:

This method name looks like a local 'lookup', while it's actually a blocking, serialized remote call to multiple endpoints. How about RequestHashFromPeerByRev or CallPeerAndGetHash()?

Contributor comment:

Improvements for the backlog:

  1. getPeerHashKVs() could fetch the hashes in parallel (see the sketch below).
  2. ServeHTTP could populate the list of hashes we have (for consumption in v3.7, 2028).
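
A hedged sketch of what item 1 could look like; fetchHashFromPeer, peerHash, and getPeerHashKVsParallel are hypothetical stand-ins rather than the etcd API, and only the fan-out pattern is the point:

package main

import (
	"context"
	"sync"
)

// peerHash is a hypothetical result type; the real code returns richer data.
type peerHash struct {
	PeerID uint64
	Hash   uint32
	Err    error
}

// fetchHashFromPeer stands in for the blocking per-peer request.
func fetchHashFromPeer(ctx context.Context, peerID uint64, rev int64) peerHash {
	// ... issue the remote Hash request to the peer here ...
	return peerHash{PeerID: peerID}
}

// getPeerHashKVsParallel fans the requests out to all peers instead of
// calling them one by one in sequence.
func getPeerHashKVsParallel(ctx context.Context, peerIDs []uint64, rev int64) []peerHash {
	results := make([]peerHash, len(peerIDs))
	var wg sync.WaitGroup
	for i, id := range peerIDs {
		wg.Add(1)
		go func(i int, id uint64) {
			defer wg.Done()
			results[i] = fetchHashFromPeer(ctx, id, rev)
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	_ = getPeerHashKVsParallel(context.Background(), []uint64{2, 3}, 100)
}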

Member comment:

For the v3.6 release we should consider having the hash negotiated via raft.

Member (Author) comment:

Let's discuss and address this in a separate session/PR. Renaming and fetching the hashes in parallel make sense to me. @ptabor where and how is the 2028 coming from? :)

For @serathius's comment "consider hash being negotiated via raft", I did not get the point. My immediate feeling is that there is no need, because the compaction is already coordinated by raft.

Member comment:

we should consider hash being negotiated via raft.

Is it going to persist the hash result?

Contributor comment:

how is the 2028 coming from?

Just extrapolation of the release frequency.

Rafting hashes?

I imagine this could work this way:

  1. There is a new RAFT message 'start-checksum' that triggers all members to compute the checksum at the exact revision, so it's a 'simpler' compaction. Compaction keeps doing this implicitly.
  2. Whenever a member finishes the computation, it sends its result (pair: rev, hash) to the leader.
  3. The leader broadcasts(?) the received results through RAFT.
  4. Every member can react to a discrepancy.

Benefit: it does not require a custom service and best-effort attempts to check whether the checks are in sync.

But I would consider evaluating the (on-line) merkle root sums design (#13839) first and thinking about what raft changes would be needed for this.
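
Purely as an illustration of the numbered proposal above, the message shapes might look roughly like the following; none of this exists in etcd, and every name and field here is a guess:

package main

// StartChecksum would be proposed through raft and tells every member to
// compute the hash of its keyspace at exactly Revision (step 1).
type StartChecksum struct {
	Revision int64
}

// ChecksumResult is what a member would send to the leader once its
// computation finishes (step 2).
type ChecksumResult struct {
	MemberID uint64
	Revision int64
	Hash     uint32
}

// ChecksumReport is what the leader would broadcast back through raft so that
// every member can compare the results and react to a discrepancy (steps 3-4).
type ChecksumReport struct {
	Results []ChecksumResult
}

func main() {}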

Member (Author) comment:

Thanks @ptabor for the comment. It's a big topic, let's discuss it separately.

(Resolved review thread on server/etcdserver/corrupt.go)
@ahrtr force-pushed the identify_corrupted_member_20221123 branch 4 times, most recently from c2c0d0b to 17cd0e0 on November 26, 2022 10:13
When the leader detects data inconsistency by comparing hashes,
it currently assumes that the follower is the corrupted member.
That isn't correct; the leader might be the corrupted member as well.

We should depend on quorum to identify the corrupted member.
For example, in a 3-member cluster, if 2 members have the same hash,
then the member with a different hash is the corrupted one. In a
5-member cluster, if 3 members have the same hash, the corrupted
member is one of the remaining two members; it's also possible that
both of the remaining members are corrupted.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
The change made in etcd-io#14824 fixed
the test instead of the production code, which isn't correct. After
fixing the production code in this PR, we can revert the change in
that PR.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
…orrupted member

If a quorum doesn't exist, we don't know which members' data is
corrupted. In such a situation, we intentionally set the memberID
to 0, meaning the alarm affects the whole cluster.
This is aligned with what we did for 3.4 and 3.5 in
etcd-io#14849

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr force-pushed the identify_corrupted_member_20221123 branch from 17cd0e0 to e606d22 on November 26, 2022 11:35
@serathius (Member) commented:

Proposed to rename some tests to make it easier to identify if we are missing any tests, but feel free to skip the suggestions if you don't agree with them. Would love to see consistent naming for those scenarios, but maybe in the next PR.

@ahrtr force-pushed the identify_corrupted_member_20221123 branch 3 times, most recently from 1ba6390 to 15326f0 on November 26, 2022 12:10
…heck

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr force-pushed the identify_corrupted_member_20221123 branch from 15326f0 to d545d60 on November 26, 2022 12:13
@ahrtr (Member, Author) commented Nov 26, 2022

Proposed to rename some tests to make it easier to identify if we are missing any tests, but feel free to skip the suggestions if you don't agree with them. Would love to see consistent naming for those scenarios, but maybe in the next PR.

Resolved all the comments. Renaming isn't a big deal, but you are the original author of the unit test cases, so I followed all your suggestions.

PTAL, thx.

@ahrtr merged commit cf171fd into etcd-io:main on Nov 28, 2022