Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: switch to append-time conf changes #107083

Open
tbg opened this issue Jul 18, 2023 · 1 comment
Open

kvserver: switch to append-time conf changes #107083

tbg opened this issue Jul 18, 2023 · 1 comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Jul 18, 2023

Is your feature request related to a problem? Please describe.

etcd-io/raft uses apply-time conf changes. These are not what's described in the raft paper, and they are much more complicated to reason about1. They become a mental burden every time we reason about the durability requirements of the HardState.Commit index, for example in #99273 (comment)) and #88699. Even if these problems are ultimately tractable, it's just complex.

The reason it was introduced as is is that it is also convenient to have the active raft config track exactly the RangeDescriptor. With append-time conf changes, we need to track the active config separately (it's the latest one synced in the log) and might need to revert it (if the log gets rewritten). We would also need a bit more observability if the active config leads the range descriptor (which happens while the new config is in the log but not applied yet).

Given how much cognitive trouble we've had with apply-time conf changes, I think these trade-offs are well worth it.

I believe we could use append-time conf changes without changing upstream raft: we stop using raftpb.ConfChange{,V2}, instead proposing our conf changes as regular proposals. The fact that they're conf changes instead gets encoded in our RaftCommandPrefix. Our code that appends to the log can then sniff them out and update state as appropriate, calling rawNode.ApplyConfChange, which looks like it will "just work". We should be able to add appropriate testing upstream to make this a "permissible" use of etcd-io/raft (rather than us hacking it in ways that might break in the future).

I haven't verified this approach above, but it lends itself to prototyping.

Jira issue: CRDB-39895

Epic CRDB-39898

Footnotes

  1. https://github.com/etcd-io/etcd/issues/7625#issuecomment-489232411

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Jul 18, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 18, 2023

cc @cockroachdb/replication

@jlinder jlinder added the T-kv KV Team label Jun 28, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

2 participants