kvserver: switch to append-time conf changes #107083
Labels: C-enhancement, T-kv
Is your feature request related to a problem? Please describe.
etcd-io/raft uses apply-time conf changes. These are not what's described in the raft paper, and they are much more complicated to reason about.[1] They become a mental burden every time we reason about the durability requirements of the `HardState.Commit` index, for example in #99273 (comment) and #88699. Even if these problems are ultimately tractable, it's just complex.

Apply-time conf changes were introduced this way because it is convenient to have the active raft config track the RangeDescriptor exactly. With append-time conf changes, we need to track the active config separately (it's the latest one synced to the log) and might need to revert it (if the log gets rewritten). We would also need a bit more observability for when the active config leads the range descriptor, which happens while the new config is in the log but not yet applied.
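The extra bookkeeping that append-time conf changes require can be sketched in Go as follows. Everything in this sketch (`Config`, `ConfigTracker` and its methods) is hypothetical and is not an existing CockroachDB or etcd-io/raft API; it assumes conf changes are appended in increasing index order and that the caller reports appends, suffix rewrites, and applications. The active config is whatever the newest conf change in the log says, a log rewrite reverts it, and `LeadsDescriptor` exposes the window in which the active config is ahead of the applied RangeDescriptor.

```go
package main

import "fmt"

// Config is a simplified, hypothetical stand-in for a raft voter configuration.
type Config struct {
	Voters []uint64
}

// configEntry records a conf change appended to the log at Index.
type configEntry struct {
	Index  uint64
	Config Config
}

// ConfigTracker (hypothetical) tracks the active config under
// append-time semantics: the active config is the one carried by the
// newest conf change in the log, whether or not it is committed or applied.
type ConfigTracker struct {
	base    Config        // config matching the applied RangeDescriptor
	pending []configEntry // appended but not yet applied, by increasing index
}

// OnAppend records a conf change carried by an entry appended at index.
func (t *ConfigTracker) OnAppend(index uint64, c Config) {
	t.pending = append(t.pending, configEntry{Index: index, Config: c})
}

// OnTruncateSuffix handles a log rewrite starting at index from (for
// example, a new leader overwriting an uncommitted suffix). Conf changes
// at or after from are dropped, reverting the active config.
func (t *ConfigTracker) OnTruncateSuffix(from uint64) {
	i := len(t.pending)
	for i > 0 && t.pending[i-1].Index >= from {
		i--
	}
	t.pending = t.pending[:i]
}

// OnApply folds conf changes at or below index into base once they
// have been applied and the RangeDescriptor updated.
func (t *ConfigTracker) OnApply(index uint64) {
	for len(t.pending) > 0 && t.pending[0].Index <= index {
		t.base = t.pending[0].Config
		t.pending = t.pending[1:]
	}
}

// Active returns the config raft should currently use.
func (t *ConfigTracker) Active() Config {
	if n := len(t.pending); n > 0 {
		return t.pending[n-1].Config
	}
	return t.base
}

// LeadsDescriptor reports whether the active config is ahead of the
// applied RangeDescriptor, i.e. a conf change is in the log but not applied.
func (t *ConfigTracker) LeadsDescriptor() bool {
	return len(t.pending) > 0
}

func main() {
	tr := &ConfigTracker{base: Config{Voters: []uint64{1, 2, 3}}}
	tr.OnAppend(10, Config{Voters: []uint64{1, 2, 3, 4}})
	fmt.Println(tr.Active().Voters, tr.LeadsDescriptor()) // [1 2 3 4] true
	tr.OnTruncateSuffix(10)                                // the entry at index 10 was overwritten
	fmt.Println(tr.Active().Voters, tr.LeadsDescriptor()) // [1 2 3] false
}
```

Keeping the applied config as a separate `base` mirrors the trade-off described here: with apply-time conf changes, `base` is the only config there is, because it coincides with the RangeDescriptor.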
Given how much cognitive trouble we've had with apply-time conf changes, I think these trade-offs are well worth it.
I believe we could use append-time conf changes without changing upstream raft: we stop using `raftpb.ConfChange{,V2}` and instead propose our conf changes as regular proposals. The fact that they are conf changes gets encoded in our `RaftCommandPrefix` instead. Our code that appends to the log can then sniff them out and update state as appropriate, calling `rawNode.ApplyConfChange`, which looks like it will "just work". We should be able to add appropriate testing upstream to make this a "permissible" use of etcd-io/raft (rather than us hacking it in ways that might break in the future).

I haven't verified the approach above, but it lends itself to prototyping.
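A rough Go sketch of this proposal path follows. The one-byte prefixes, the `Entry` type, and the `onAppend` hook are all hypothetical; they only stand in for the real `RaftCommandPrefix` encoding and `raftpb.Entry`, and the `applyConfChange` callback stands in for a call to `rawNode.ApplyConfChange` plus any local state updates. Conf changes travel as ordinary entries and are recognized on the append path, while regular commands are still handled at apply time.

```go
package main

import "fmt"

// Hypothetical one-byte prefixes distinguishing regular commands from
// conf changes carried as ordinary raft entries. Illustrative only; this
// is not CockroachDB's actual entry encoding.
const (
	prefixCommand    byte = 0x01
	prefixConfChange byte = 0x02
)

// Entry is a simplified, hypothetical stand-in for raftpb.Entry. Every
// entry here is a normal entry, conf changes included.
type Entry struct {
	Index uint64
	Data  []byte
}

// encodeCommand prepends the hypothetical command prefix to a serialized payload.
func encodeCommand(payload []byte) []byte {
	return append([]byte{prefixCommand}, payload...)
}

// encodeConfChange prepends the hypothetical conf-change prefix, so the
// conf change can be proposed as a regular proposal.
func encodeConfChange(payload []byte) []byte {
	return append([]byte{prefixConfChange}, payload...)
}

// onAppend runs on the append path. It sniffs conf changes out of the
// appended entries and hands each one to applyConfChange. Regular
// commands need no work here; they are applied later, as today.
func onAppend(entries []Entry, applyConfChange func(index uint64, payload []byte)) error {
	for _, e := range entries {
		if len(e.Data) == 0 {
			continue // e.g. the empty entry a new leader appends
		}
		switch e.Data[0] {
		case prefixConfChange:
			applyConfChange(e.Index, e.Data[1:])
		case prefixCommand:
			// Not a conf change: nothing to do at append time.
		default:
			return fmt.Errorf("entry %d: unknown prefix %#x", e.Index, e.Data[0])
		}
	}
	return nil
}

func main() {
	entries := []Entry{
		{Index: 5, Data: encodeCommand([]byte("put k=v"))},
		{Index: 6, Data: encodeConfChange([]byte("add voter 4"))},
		{Index: 7},
	}
	err := onAppend(entries, func(index uint64, payload []byte) {
		// prints: conf change at index 6: add voter 4
		fmt.Printf("conf change at index %d: %s\n", index, payload)
	})
	fmt.Println("err:", err)
}
```

A real implementation would also have to undo a sniffed conf change if the entry carrying it is later overwritten, which is the revert case discussed earlier in this issue.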
Jira issue: CRDB-39895
Epic: CRDB-39898
Footnotes
[1] https://github.com/etcd-io/etcd/issues/7625#issuecomment-489232411