[poc,dnm]: storage: use learner replicas
This is a PR just to show some code to the interested parties. The real thing to look at here is the explanation and suggested strategy below. Don't review the code.

----

Learner replicas are full Replicas minus the right to vote (i.e., they don't count for quorum). They are interesting to us because they allow us to phase out preemptive snapshots (via [delayed preemptive snaps] as a migration strategy; see there for all the things we don't like about preemptive snapshots), and because they can help us avoid spending more time in vulnerable configurations than we need to.

To see an example of the latter, assume we're trying to upreplicate from three replicas to five replicas. As is, we need to add a fourth replica, wait for a preemptive snapshot, and add the fifth. We spend approximately the duration of one preemptive snapshot in an even replica configuration. In theory, we could send two preemptive snapshots first and then carry out two replica additions, but with learners, it would be much more straightforward and less error-prone. This doesn't solve the problems in cockroachdb#12768, but it helps avoid them.

This PR shows the bare minimum of code changes to upreplicate using learner replicas and suggests further steps to make them a reality.

Principally, to use learner replicas, we barely need to make any changes. Our current up-replication code is this:

1. send preemptive snapshot
2. run the ChangeReplicas txn, which adds a `ReplicaDescriptor` to the replicated range descriptor and, on commit, induces a Raft configuration change (`raftpb.ConfChangeAddNode`).

The new up-replication code (note that the old one has to stick around, because compatibility):

1. run the ChangeReplicas txn, which adds a `ReplicaDescriptor` to the replicated range descriptor with the `Learner` flag set to true and, on commit, induces a Raft configuration change (`raftpb.ConfChangeAddLearnerNode`).
2. wait for the learner to have caught up or send it a Raft snapshot proactively (either works, just have to make sure not to duplicate work)
3. run a ChangeReplicas txn which removes the `Learner` flag from the `ReplicaDescriptor` and induces a Raft conf change of `raftpb.ConfChangeAddNode` (upgrading the learner).

The existence of learners will need updates throughout the allocator so that it realizes that they don't count for quorum and are either upgraded or removed in a timely manner. None of that is in this POC.

[delayed preemptive snaps]: cockroachdb#35786

Release note: None
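For readers who want the learner-based flow in concrete terms, here is a minimal, self-contained Go sketch of the three steps for the three-to-five example. It is not the code in this PR. `RangeDescriptor`, `AddLearner`, `Promote`, and `UpReplicate` are hypothetical names, the `ConfChangeType` constants are local stand-ins that only mirror the `raftpb` names mentioned above, and the catch-up wait is reduced to a callback supplied by the caller.

```go
package main

import "fmt"

// ConfChangeType is a local stand-in for the raftpb conf change types named
// in the description; it is not the real raftpb enum.
type ConfChangeType int

const (
	ConfChangeAddNode ConfChangeType = iota
	ConfChangeAddLearnerNode
)

// ReplicaDescriptor is a simplified, hypothetical descriptor. Only the
// Learner flag matters for this sketch.
type ReplicaDescriptor struct {
	StoreID int
	Learner bool
}

// RangeDescriptor holds the replicas of a single range (hypothetical shape).
type RangeDescriptor struct {
	Replicas []ReplicaDescriptor
}

// Voters counts the replicas that are allowed to vote.
func (d RangeDescriptor) Voters() int {
	n := 0
	for _, r := range d.Replicas {
		if !r.Learner {
			n++
		}
	}
	return n
}

// Quorum is a majority of the voters; learners are ignored.
func (d RangeDescriptor) Quorum() int { return d.Voters()/2 + 1 }

// AddLearner models step 1: add a descriptor with Learner set and report the
// conf change that the commit would induce.
func (d *RangeDescriptor) AddLearner(storeID int) ConfChangeType {
	d.Replicas = append(d.Replicas, ReplicaDescriptor{StoreID: storeID, Learner: true})
	return ConfChangeAddLearnerNode
}

// Promote models step 3: clear the Learner flag on an existing learner.
func (d *RangeDescriptor) Promote(storeID int) (ConfChangeType, error) {
	for i := range d.Replicas {
		if d.Replicas[i].StoreID == storeID && d.Replicas[i].Learner {
			d.Replicas[i].Learner = false
			return ConfChangeAddNode, nil
		}
	}
	return 0, fmt.Errorf("store %d is not a learner", storeID)
}

// UpReplicate adds all learners first, waits for each to catch up (step 2,
// supplied by the caller), and only then promotes them one by one.
func UpReplicate(d *RangeDescriptor, stores []int, caughtUp func(storeID int) error) error {
	for _, s := range stores {
		d.AddLearner(s)
	}
	for _, s := range stores {
		if err := caughtUp(s); err != nil {
			return err
		}
	}
	for _, s := range stores {
		if _, err := d.Promote(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	d := &RangeDescriptor{Replicas: []ReplicaDescriptor{{StoreID: 1}, {StoreID: 2}, {StoreID: 3}}}
	noWait := func(int) error { return nil } // pretend the learner is already caught up
	if err := UpReplicate(d, []int{4, 5}, noWait); err != nil {
		fmt.Println("up-replication failed:", err)
		return
	}
	fmt.Printf("voters=%d quorum=%d\n", d.Voters(), d.Quorum()) // voters=5 quorum=3
}
```

In this model the range still passes through four voters between the two promotions, but both learners have already caught up by then, so no snapshot has to be sent while the voter count is even.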