
kvserver: split with uninitialized RHS can race with raft changes to Term and Vote #75918

Closed
sumeerbhola opened this issue Feb 3, 2022 · 1 comment
Labels: A-kv-replication (Relating to Raft, consensus, and coordination.), C-enhancement (Solution expected to add code/behavior + preserve backward-compat)

sumeerbhola (Collaborator) commented Feb 3, 2022

This came up during the review of #75761.

During a split, if the right-hand side (RHS) replica has been removed and re-added (as an uninitialized replica), we load the HardState so that we can clear all of the RHS state (including the HardState) and then write back the HardState we read. This peculiar dance accommodates the fact that our clearing of key ranges is coarse: if we could spare the HardState from the clear, we wouldn't need to read and rewrite it.

```go
if rightRepl != nil {
	// Assert that the rightRepl is not initialized. We're about to clear out
	// the data of the RHS of the split; we cannot have already accepted a
	// snapshot to initialize this newer RHS.
	if rightRepl.IsInitialized() {
		log.Fatalf(ctx, "unexpectedly found initialized newer RHS of split: %v", rightRepl.Desc())
	}
	var err error
	hs, err = rightRepl.raftMu.stateLoader.LoadHardState(ctx, readWriter)
	if err != nil {
		log.Fatalf(ctx, "failed to load hard state for removed rhs: %v", err)
	}
}
const rangeIDLocalOnly = false
const mustUseClearRange = false
if err := clearRangeData(&split.RightDesc, readWriter, readWriter, rangeIDLocalOnly, mustUseClearRange); err != nil {
	log.Fatalf(ctx, "failed to clear range data for removed rhs: %v", err)
}
if rightRepl != nil {
	if err := rightRepl.raftMu.stateLoader.SetHardState(ctx, readWriter, hs); err != nil {
		log.Fatalf(ctx, "failed to set hard state with 0 commit index for removed rhs: %v", err)
	}
}
```

We know HardState.Commit cannot advance, since the RHS cannot apply a snapshot yet. But nothing prevents a concurrent change to HardState.{Term,Vote}, which we would accidentally undo here.
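To make the lost update concrete, here is a minimal, self-contained sketch of the interleaving (shown sequentially for clarity; the `hardState` struct and every step below are hypothetical stand-ins for `raftpb.HardState` and the real code paths, not the actual kvserver code):

```go
package main

import "fmt"

// hardState is a hypothetical stand-in for raftpb.HardState.
type hardState struct {
	Term   uint64
	Vote   uint64
	Commit uint64
}

func main() {
	stored := hardState{Term: 5, Vote: 2} // on-disk HardState of the RHS

	// Split application: read the HardState so it can be restored after
	// the coarse clear of the RHS key range (the LoadHardState above).
	hs := stored

	// Concurrent raft processing on the uninitialized RHS bumps the term
	// and records a vote; nothing serializes this with the split if the
	// split doesn't hold the replica's raftMu.
	stored.Term = 6
	stored.Vote = 3

	// Split application: clear the range data (which drops the HardState)
	// and write back the stale copy, silently undoing Term=6/Vote=3.
	stored = hardState{}
	stored = hs

	fmt.Printf("final HardState: %+v (the term bump and vote are lost)\n", stored)
}
```

Losing a Vote this way is unsafe: the replica could vote again in the same term, potentially for a different candidate.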

Discussion:
[tbg] We hold rightRepl.raftMu (this is not totally clear from looking at this method, but note how we access rightRepl.raftMu.stateLoader above; consider adding a comment. I think the lock is acquired in maybeAcquireSplitLock), and you need to hold that lock to mutate the HardState. We loaded the HardState just above and are writing it again, so everything is still there.

[sumeer] maybeAcquireSplitLock calls getOrCreateReplica with the replicaID that we are trying to create. My reading of tryGetOrCreateReplica is that if it finds a Replica that is newer, it will return nil:

```go
if repl.mu.replicaID > replicaID {
	// The sender is behind and is sending to an old replica.
	// We could silently drop this message but this way we'll inform the
	// sender that they may no longer exist.
	repl.raftMu.Unlock()
	return nil, false, &roachpb.RaftGroupDeletedError{}
}
```
and we will then return from https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/replica_raft.go#L1828-L1832 without acquiring any lock. Then the code here will look up the Replica by RangeID and find this newer Replica, which isn't locked.
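A condensed model of that control flow (all types, fields, and signatures here are hypothetical simplifications of the real kvserver code, kept only to show where the lock is dropped):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type replica struct {
	raftMu    sync.Mutex
	replicaID int32
}

var errRaftGroupDeleted = errors.New("raft group deleted")

// tryGetOrCreateReplica returns the replica with raftMu held, except on the
// branch quoted above: if the existing replica is newer, it unlocks raftMu
// and returns an error, so the caller ends up holding no lock.
func tryGetOrCreateReplica(existing *replica, replicaID int32) (*replica, error) {
	existing.raftMu.Lock()
	if existing.replicaID > replicaID {
		existing.raftMu.Unlock()
		return nil, errRaftGroupDeleted
	}
	return existing, nil // raftMu still held
}

// maybeAcquireSplitLock returns nil on the error path, and the split then
// proceeds anyway: splitPostApply finds the newer replica by RangeID and
// reads/writes its HardState without holding its raftMu.
func maybeAcquireSplitLock(existing *replica, replicaID int32) func() {
	r, err := tryGetOrCreateReplica(existing, replicaID)
	if err != nil {
		return nil
	}
	return r.raftMu.Unlock
}

func main() {
	newerRHS := &replica{replicaID: 7}
	if unlock := maybeAcquireSplitLock(newerRHS, 5); unlock == nil {
		fmt.Println("no split lock held; the newer RHS replica is unprotected")
	}
}
```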

[tbg] Ouch, yes, you are right. Throw in a rightRepl.raftMu.AssertHeld() and hopefully some test will fail under race. Then

```go
if rightRepl == nil || rightRepl.isNewerThanSplit(split) {
  if rightRepl != nil {
    rightRepl.raftMu.Lock()
    defer rightRepl.raftMu.Unlock() // is that even enough? Do we do more stuff with rightRepl in splitPostApply?
  }
}
```

hopefully fixes that failure?
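For intuition on why taking raftMu there suffices, here is a minimal, self-contained model of the invariant (everything below is a hypothetical simplification for illustration; it is not the patch that actually merged):

```go
package main

import (
	"fmt"
	"sync"
)

type hardState struct{ Term, Vote, Commit uint64 }

type replica struct {
	raftMu sync.Mutex
	hs     hardState
}

// clearAndRestoreHardState models the split's read/clear/rewrite dance,
// holding raftMu across all three steps so a concurrent Term/Vote change
// must either complete before the read or start after the rewrite.
func (r *replica) clearAndRestoreHardState() {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	hs := r.hs         // LoadHardState
	r.hs = hardState{} // coarse clear of the RHS key range
	r.hs = hs          // SetHardState
}

// bumpTermAndVote models raft processing mutating the HardState; it takes
// the same raftMu, so it serializes with the split instead of racing it.
func (r *replica) bumpTermAndVote() {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	r.hs.Term++
	r.hs.Vote = 3
}

func main() {
	r := &replica{hs: hardState{Term: 5, Vote: 2}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); r.clearAndRestoreHardState() }()
	go func() { defer wg.Done(); r.bumpTermAndVote() }()
	wg.Wait()
	fmt.Printf("final HardState: %+v (the term bump survives either ordering)\n", r.hs)
}
```

Holding raftMu across the load/clear/set turns the lost update into ordinary mutual exclusion, which is exactly what the AssertHeld suggested above would verify in the real code.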

cc: @tbg @erikgrinaker

Jira issue: CRDB-12877

sumeerbhola added the C-enhancement and A-kv-replication labels Feb 3, 2022
tbg (Member) commented Nov 11, 2022

This got fixed; I merged the patch above months ago in f19bb2a.

tbg closed this as completed Nov 11, 2022