-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: persist Lead into the HardState #126898
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 39 of 39 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/raft/raft.go
line 478 at r1 (raw file):
raftlog.appliedTo(c.Applied, 0 /* size */) } r.becomeFollower(r.Term, r.lead)
One of the interesting parts about this change is that a follower can now know that it was the leader before it restarted. It would be nice to have a test that creates this scenario and demonstrates that the old leader (who can no longer serve as the leader due to a restart) campaigns at the next term.
pkg/raft/raft.go
line 773 at r1 (raw file):
r.Term = term r.Vote = None }
This is a subtle change. Was it needed? Or is this just an extra layer of protection against forgetting the leader.
pkg/raft/rawnode.go
line 180 at r1 (raw file):
// that a synchronous write to persistent storage is required. func MustSync(st, prevst pb.HardState, entsnum int) bool { // Persistent state on all servers:
This comment needs an update.
pkg/raft/raftpb/raft.proto
line 125 at r1 (raw file):
optional uint64 match = 15 [(gogoproto.nullable) = false]; optional uint64 lead = 16 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];
Let's give this a comment and explain when it is set.
pkg/raft/raftpb/raft.proto
line 132 at r1 (raw file):
optional uint64 vote = 2 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"]; optional uint64 commit = 3 [(gogoproto.nullable) = false]; optional uint64 lead = 4 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];
Reviewable is indicating that the alignment with the other three fields is off here, because they use tabs while this line uses spaces.
b94161a
to
a6f8577
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved comments, PTAL.
I've added a bunch of TODOs in the latest revision. I don't think any of these are in the critical path of this patch, however, we should talk about these as they'll guide the design for a few upcoming things.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/raft.go
line 478 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
One of the interesting parts about this change is that a follower can now know that it was the leader before it restarted. It would be nice to have a test that creates this scenario and demonstrates that the old leader (who can no longer serve as the leader due to a restart) campaigns at the next term.
I extended TestRawNodeRestart
to demonstrate this.
pkg/raft/raft.go
line 773 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is a subtle change. Was it needed? Or is this just an extra layer of protection against forgetting the leader.
Reverted this, like we spoke about offline. I've added a TODO to make this change in a separate PR.
pkg/raft/raftpb/raft.proto
line 125 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's give this a comment and explain when it is set.
I moved this up and updated the comment on vote
that was talking about MsgStorageAppends
(re)constructing a HardState
.
pkg/raft/raftpb/raft.proto
line 132 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Reviewable is indicating that the alignment with the other three fields is off here, because they use tabs while this line uses spaces.
Done. I converted the entire file to tabs, so it picked up another field (match) that was recently introduced. I'll (lazily) let it be part of this commit unless you prefer otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/raft/raft.go
line 1243 at r2 (raw file):
Technically, it could win this election before the previous QSE expires.
The plan is to only even enter this CheckQuorum step down case if the QSE has already expired, right? So I'm not sure that this is possible on this specific step down case, but it is possible in the more general step down case.
pkg/raft/raft.go
line 1244 at r2 (raw file):
// election. Technically, it could win this election before the previous // QSE expires. // TODO(arul): update the tense of this once laeder leases are implemented.
s/leader/laeder/
pkg/raft/raft.go
line 1253 at r2 (raw file):
// store liveness for itself), it's safe for leader leases. // // TODO(arul,nvanbenschoten): One thing special about this case is that if
+1 on this being a special case. We probably don't want to pass None
and forget the leader though. Better to have the condition for this step down and the result of the step down (i.e. immediately call an election) remain loosely coupled.
pkg/raft/raft.go
line 1993 at r2 (raw file):
// NB: Similar to the CheckQuorum step down case, we must remember our // prior stint as leader, lest we regress the QSE. r.becomeFollower(r.Term, r.lead)
Now that we're often passing r.lead
to r.becomeFollower
, should we remove the lead
argument from the function? We can then package assignLead
as a separate method with some safety checks to replace the r.lead = m.From
and r.becomeFollower(m.Term, m.From)
calls. Doesn't need to be this PR.
pkg/raft/raftpb/raft.proto
line 91 at r2 (raw file):
repeated Entry entries = 7 [(gogoproto.nullable) = false]; optional uint64 commit = 8 [(gogoproto.nullable) = false]; optional uint64 lead = 16 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];
nit: this alignment with the previous fields looks off.
pkg/kv/kvserver/below_raft_protos_test.go
line 67 at r2 (raw file):
Term: n % 3, Vote: raftpb.PeerID(n % 7), Commit: n % 11,
Want to add Lead
here?
Previously, leader information for a given term was stored as volatile state in raft. This change persists it durably by moving it to HardState. This is done in service of leader leases, where we want to ensure a follower doesn't violate its promise of support to the leader even after a restart. Fixes cockroachdb#124421 Release note: None
a6f8577
to
e60feee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=nvanbenschoten
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/raft.go
line 1243 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Technically, it could win this election before the previous QSE expires.
The plan is to only even enter this CheckQuorum step down case if the QSE has already expired, right? So I'm not sure that this is possible on this specific step down case, but it is possible in the more general step down case.
Yeah. Cleaned up the commentary here.
pkg/raft/raft.go
line 1253 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
+1 on this being a special case. We probably don't want to pass
None
and forget the leader though. Better to have the condition for this step down and the result of the step down (i.e. immediately call an election) remain loosely coupled.
Agreed. As we discussed over the last week, we don't want to treat this any different. I've just removed this stream of thought and cleaned up the commentary to call out the special case.
pkg/raft/raft.go
line 1993 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Now that we're often passing
r.lead
tor.becomeFollower
, should we remove thelead
argument from the function? We can then packageassignLead
as a separate method with some safety checks to replace ther.lead = m.From
andr.becomeFollower(m.Term, m.From)
calls. Doesn't need to be this PR.
I've added a TODO for this, for now. I'll address it in a followup.
pkg/raft/raftpb/raft.proto
line 91 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this alignment with the previous fields looks off.
Fixed.
pkg/kv/kvserver/below_raft_protos_test.go
line 67 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Want to add
Lead
here?
Done.
Build succeeded: |
Previously, leader information for a given term was stored as volatile state in raft. This change persists it durably by moving it to HardState. This is done in service of leader leases, where we want to ensure a follower doesn't violate its promise of support to the leader even after a restart.
Fixes #124421
Release note: None