kv: don't clear raftRequestQueue of right-hand side of Range split #64028

Merged

Commits on Apr 21, 2021

  1. kv: don't clear raftRequestQueue of right-hand side of Range split

    This commit fixes a flake in `TestLeaderAfterSplit` that I observed in CI and
    that we've seen at least once in cockroachdb#43564 (comment).
    I bisected the flake back to a591707, but that wasn't the real source of the
    flakiness. The move from `multiTestContext` to `TestCluster` just changed the
    transport mechanism between replicas and revealed an existing bug.
    
    The real issue here was that, upon applying a split, any previously established
    `raftRequestQueue` to the RHS replica was discarded. The effect of this is that
    we could see the following series of events:
    ```
    1. r1 is created from a split
    2. r1 campaigns to establish a leader for the new range
    3. r1 sends MsgPreVote msgs to r2 and r3
    4. s2 and s3 both receive the messages for the uninitialized r2 and r3, respectively.
    5. raftRequestQueues are established for r2 and r3, and the MsgPreVotes are added
    6. the split triggers to create r2 and r3 finally fire
    7. the raftRequestQueues for r2 and r3 are discarded
    8. the election stalls indefinitely, because the test sets RaftElectionTimeoutTicks=1000000
    ```
    
    Of course, in real deployments, `RaftElectionTimeoutTicks` will never be set so
    high, so a new election will be called after about 3 seconds. Still, this
    could cause about 3s of unavailability immediately after a split even in real
    deployments, so it seems worthwhile to fix.
    
    This commit fixes the issue by removing the logic to discard an uninitialized
    replica's `raftRequestQueue` upon applying a split that initializes the replica.
    That logic looks quite intentional, but if we look back at when it was added, we
    see that it wasn't entirely deliberate. The code was added in d3b0e73, which
    extracted everything except the call to `s.mu.replicas.Delete(int64(rangeID))`
    from `unlinkReplicaByRangeIDLocked`. So the change wasn't intentionally
    discarding the queue; it was just trying not to change the existing behavior.
    
    This change is safe and does not risk leaking the `raftRequestQueue` because
    we are removing the replica from `s.mu.uninitReplicas` but will immediately
    call into `addReplicaInternalLocked` to add an initialized replica.
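
    As a rough illustration of the race and the fix described above, here is a
    toy model in Go. It is only a sketch: the `store`, `raftMsg`, `enqueue`,
    `applySplit`, and `pending` names are hypothetical and are not the actual
    CockroachDB types or functions. The `discardQueue` flag stands in for the
    old behavior (`true`) versus the behavior after this commit (`false`).

    ```go
    package main

    import "fmt"

    // raftMsg is a hypothetical stand-in for an incoming Raft message.
    type raftMsg struct {
    	kind string
    	from int
    }

    // store is a toy model of a store. It is NOT CockroachDB's Store type.
    type store struct {
    	uninit map[int]bool      // rangeID -> replica exists but is uninitialized
    	init   map[int]bool      // rangeID -> replica is initialized
    	queues map[int][]raftMsg // rangeID -> buffered, unprocessed Raft messages
    }

    func newStore() *store {
    	return &store{
    		uninit: map[int]bool{},
    		init:   map[int]bool{},
    		queues: map[int][]raftMsg{},
    	}
    }

    // enqueue buffers a message for rangeID. If no initialized replica exists
    // yet, the range is tracked as uninitialized, as in steps 4 and 5.
    func (s *store) enqueue(rangeID int, m raftMsg) {
    	if !s.init[rangeID] {
    		s.uninit[rangeID] = true
    	}
    	s.queues[rangeID] = append(s.queues[rangeID], m)
    }

    // applySplit initializes the right-hand side replica. With discardQueue set,
    // it also drops any buffered messages, which models the old behavior.
    func (s *store) applySplit(rhs int, discardQueue bool) {
    	delete(s.uninit, rhs)
    	if discardQueue {
    		delete(s.queues, rhs)
    	}
    	// The initialized replica is added right away, so a retained queue
    	// still belongs to a live replica and is not leaked.
    	s.init[rhs] = true
    }

    // pending reports how many buffered messages remain for rangeID.
    func (s *store) pending(rangeID int) int {
    	return len(s.queues[rangeID])
    }

    func main() {
    	for _, discard := range []bool{true, false} {
    		s := newStore()
    		// A MsgPreVote arrives before the split trigger has fired locally.
    		s.enqueue(42, raftMsg{kind: "MsgPreVote", from: 1})
    		s.applySplit(42, discard)
    		fmt.Printf("discard=%t pending=%d\n", discard, s.pending(42))
    	}
    }
    ```

    In this model, the pre-vote is lost when the queue is discarded and survives
    when it is kept, so the newly initialized replica can still respond to it.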
    
    Release notes (bug fix): Fix a rare race that could lead to a 3 second stall
    before a Raft leader was elected on a Range immediately after it was split off
    from its left-hand neighbor.
    nvanbenschoten committed Apr 21, 2021
    4e5389a