
HashKV should compute the hash of all MVCC keys up to a given revision #18337

Open · fuweid opened this issue Jul 16, 2024 · 6 comments
fuweid commented Jul 16, 2024


What happened?

Based on https://etcd.io/docs/v3.5/dev-guide/api_reference_v3/, the HashKV API should compute the hash of all MVCC keys up to a given revision, even if the key/value is a tombstone. However, HashKV can skip the compacted revision.

func (h *kvHasher) WriteKeyValue(k, v []byte) {
    kr := BytesToRev(k)
    upper := Revision{Main: h.revision + 1}
    if !upper.GreaterThan(kr) {
        return
    }
    lower := Revision{Main: h.compactRevision + 1}
    // skip revisions that are scheduled for deletion
    // due to compacting; don't skip if there isn't one.
    if lower.GreaterThan(kr) && len(h.keep) > 0 {
        if _, ok := h.keep[kr]; !ok {
            return
        }
    }
    h.hash.Write(k)
    h.hash.Write(v)
}

For the compacted revision, lower.GreaterThan(kr) always returns true. Assume the compacted revision belongs to key A: if key A has revisions higher than the compacted revision, kvindex.Keep won't return the compacted revision, so HashKV won't include it in the hash.
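
For illustration, here is a minimal standalone sketch of why that happens. Revision and GreaterThan are re-implemented locally to mirror the mvcc code above; they are not the package's actual exports.

package main

import "fmt"

// Revision mirrors mvcc's revision: Main is the store revision,
// Sub orders writes within a single transaction.
type Revision struct{ Main, Sub int64 }

// GreaterThan compares Main first, then Sub, as mvcc does.
func (a Revision) GreaterThan(b Revision) bool {
    if a.Main != b.Main {
        return a.Main > b.Main
    }
    return a.Sub > b.Sub
}

func main() {
    compactRevision := int64(5)
    kr := Revision{Main: compactRevision} // the compacted revision itself
    lower := Revision{Main: compactRevision + 1}
    // {6, 0} > {5, 0}: the compacted revision always falls below the
    // lower bound, so it is hashed only if it appears in h.keep.
    fmt.Println(lower.GreaterThan(kr)) // true
}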

Case 1: etcd contains more than one key

For example, there are two keys in the store.

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{2, 0}, {3, 0}, {4, 0}(t)[3]}

// key: "foo2"
// modified: 2
// generations:
//    {{2, 1}[1]}

After compaction at Revision{Main: 5}, the store looks like:

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}

// key: "foo2"
// modified: 2
// generations:
//    {{2, 1}[1]}

When we perform HashKV on Revision{Main: 7}, the hash should cover the following revisions:

  • Revision{Main: 2, Sub: 1}
  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}

However, HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns two revisions, {Main: 2, Sub: 1} and {Main: 7}; Revision{Main: 5} isn't in the result of kvindex.Keep.

Case 2: etcd contains only one key

There is a single key, foo.

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{2, 0}, {3, 0}, {4, 0}(t)[3]}

After compaction at Revision{Main: 5}, the store looks like:

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}

When we perform HashKV on Revision{Main: 7}, the hash should cover the following revisions:

  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}

HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns only one revision, {Main: 7}.

However, when we perform HashKV on Revision{Main: 8}, kvindex.Keep returns an empty keep map because Revision{Main: 8} is a tombstone. So the hash covers the following revisions, which matches the expectation (a runnable sketch covering both cases follows the list):

  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}
  • Revision{Main: 8, Sub: 0}
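
To make both cases concrete, here is a self-contained sketch. The hashed helper is hypothetical; it re-implements the filter from kvHasher.WriteKeyValue above with a local Revision type, and the keep contents are taken from the two walkthroughs.

package main

import "fmt"

type Revision struct{ Main, Sub int64 }

func (a Revision) GreaterThan(b Revision) bool {
    if a.Main != b.Main {
        return a.Main > b.Main
    }
    return a.Sub > b.Sub
}

// hashed reports whether WriteKeyValue would include kr in the hash,
// mirroring the upper-bound, lower-bound, and keep-map checks.
func hashed(kr Revision, rev, compactRev int64, keep map[Revision]struct{}) bool {
    upper := Revision{Main: rev + 1}
    if !upper.GreaterThan(kr) {
        return false
    }
    lower := Revision{Main: compactRev + 1}
    if lower.GreaterThan(kr) && len(keep) > 0 {
        if _, ok := keep[kr]; !ok {
            return false
        }
    }
    return true
}

func main() {
    // Case 1: HashKV at revision 7 after compaction at 5.
    keep1 := map[Revision]struct{}{{Main: 2, Sub: 1}: {}, {Main: 7}: {}}
    for _, kr := range []Revision{{Main: 2, Sub: 1}, {Main: 5}, {Main: 6}, {Main: 7}} {
        fmt.Printf("case 1 %+v hashed=%v\n", kr, hashed(kr, 7, 5, keep1))
    }
    // {Main: 5} prints hashed=false: it is silently dropped.

    // Case 2: HashKV at revision 8. Revision 8 is a tombstone, so
    // kvindex.Keep returns an empty map and nothing is skipped.
    keep2 := map[Revision]struct{}{}
    for _, kr := range []Revision{{Main: 5}, {Main: 6}, {Main: 7}, {Main: 8}} {
        fmt.Printf("case 2 %+v hashed=%v\n", kr, hashed(kr, 8, 5, keep2))
    }
}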

What did you expect to happen?

HashKV should compute the hash of all MVCC keys up to a given revision.

How can we reproduce it (as minimally and precisely as possible)?

fuweid@a6454f2

$ cd server/storage/mvcc
$ go test -v -run TestHashKVImpactedByKeep -count=1 ./

Anything else we need to know?

HashKV was introduced by #8263. IMO, keep, the available-revision map, was meant to skip revisions marked for deletion by an ongoing compaction. If there is no compaction, keep is always empty and HashKV computes the hash over all MVCC keys up to the given revision.

#8263 had a bug, and #8333 introduced the keyIndex.Keep interface to fix it.
However, after #8333, a non-empty keep map forces HashKV to skip the compacted revision.

Etcd version (please run commands below)

  • main
  • release-3.5
  • release-3.4

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

ishan16696 commented:

The value returned by a HashKV API call up to revision x should be equal to the hash of a snapshot taken up to revision x (removing the appended hash). Is this correct, @fuweid?

IMO, this is not correct: the HashKV API call calculates the hash of all MVCC key-values, whereas a snapshot is a snapshot of the etcd db, which also contains cluster information, so the hashes will not be the same.


fuweid commented Jul 18, 2024

Hi @ishan16696

> the hash of a snapshot taken up to revision x

I don't follow this. Do you mean the Hash API? If so, Hash computes all MVCC keys, the confState, and other buckets in bbolt.

// Hash computes the hash of whole backend keyspace,
// including key, lease, and other buckets in storage.
// This is designed for testing ONLY!
// Do not rely on this in production with ongoing transactions,
// since Hash operation does not hold MVCC locks.
// Use "HashKV" API instead for "key" bucket consistency checks.
rpc Hash(HashRequest) returns (HashResponse) {
    option (google.api.http) = {
        post: "/v3/maintenance/hash"
        body: "*"
    };
}
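
For reference, here is a hedged sketch of how a client would follow that advice and call HashKV via clientv3; the endpoint address is a placeholder.

package main

import (
    "context"
    "fmt"
    "log"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    // HashKV hashes only the MVCC "key" bucket up to the given
    // revision; passing 0 asks for the member's latest revision.
    resp, err := cli.HashKV(context.Background(), "127.0.0.1:2379", 0)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("hash=%d compact_revision=%d\n", resp.Hash, resp.CompactRevision)
}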


ishan16696 commented Jul 18, 2024

Hi @fuweid,
What I mean by a snapshot up to revision x is:

  • Suppose etcd is at revision x; then somebody takes a snapshot of etcd using the Snapshot API call.

    // Snapshot sends a snapshot of the entire backend from a member over a stream to a client.
    rpc Snapshot(SnapshotRequest) returns (stream SnapshotResponse) {
        option (google.api.http) = {
            post: "/v3/maintenance/snapshot"
            body: "*"
        };
    }

Then, to validate the integrity of this snapshot: if I calculate the hash of this snapshot taken via the Snapshot API call up to revision x (removing the appended hash), will it be the same as the value returned by the Hash API call?

ishan16696 commented:

Anyway, my doubt is different from this issue; I have opened a separate issue: #18340


fuweid commented Jul 18, 2024

> Then, to validate the integrity of this snapshot: if I calculate the hash of this snapshot taken via the Snapshot API call up to revision x (removing the appended hash)

Snapshot uses sha256 to checksum the whole database file.

However, both Hash and HashKV use crc32.New(crc32.MakeTable(crc32.Castagnoli)). They are different.
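
A small sketch contrasting the two checksum families (the input bytes are made up; this only shows that the outputs are not comparable):

package main

import (
    "crypto/sha256"
    "fmt"
    "hash/crc32"
)

func main() {
    data := []byte("example db contents") // placeholder bytes

    // Hash / HashKV: CRC-32 with the Castagnoli polynomial.
    h := crc32.New(crc32.MakeTable(crc32.Castagnoli))
    h.Write(data)
    fmt.Printf("crc32c: %08x\n", h.Sum32())

    // Snapshot: a sha256 checksum is appended to the snapshot stream.
    fmt.Printf("sha256: %x\n", sha256.Sum256(data))
}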

> will it be the same as the value returned by the Hash API call?

Unfortunately, it's not.


ahrtr commented Jul 19, 2024

Compaction is tightly coupled with HashKV, so we have to consider/discuss them together. We need to reach consensus on #18274 (comment) first.

As mentioned in that comment, compatibility (ensuring that the new version and the old version generate the same hash value) is also a big concern. We need to leverage the cluster-level feature to resolve it.

Probably it would be better to turn that into a KEP, "Consistent Compaction and HashKV". cc @ivanvc @jmhbnz @serathius @siyuanfoundation
