Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: persist Lead into the HardState #126898

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

arulajmani
Copy link
Collaborator

Previously, leader information for a given term was stored as volatile state in raft. This change persists it durably by moving it to HardState. This is done in service of leader leases, where we want to ensure a follower doesn't violate its promise of support to the leader even after a restart.

Fixes #124421

Release note: None

@arulajmani arulajmani requested a review from a team as a code owner July 9, 2024 16:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 39 of 39 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/raft/raft.go line 478 at r1 (raw file):

		raftlog.appliedTo(c.Applied, 0 /* size */)
	}
	r.becomeFollower(r.Term, r.lead)

One of the interesting parts about this change is that a follower can now know that it was the leader before it restarted. It would be nice to have a test that creates this scenario and demonstrates that the old leader (who can no longer serve as the leader due to a restart) campaigns at the next term.


pkg/raft/raft.go line 773 at r1 (raw file):

		r.Term = term
		r.Vote = None
	}

This is a subtle change. Was it needed? Or is this just an extra layer of protection against forgetting the leader.


pkg/raft/rawnode.go line 180 at r1 (raw file):

// that a synchronous write to persistent storage is required.
func MustSync(st, prevst pb.HardState, entsnum int) bool {
	// Persistent state on all servers:

This comment needs an update.


pkg/raft/raftpb/raft.proto line 125 at r1 (raw file):

  optional uint64 match = 15 [(gogoproto.nullable) = false];

  optional uint64 lead   = 16 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];

Let's give this a comment and explain when it is set.


pkg/raft/raftpb/raft.proto line 132 at r1 (raw file):

	optional uint64 vote   = 2 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];
	optional uint64 commit = 3 [(gogoproto.nullable) = false];
  optional uint64 lead   = 4 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];

Reviewable is indicating that the alignment with the other three fields is off here, because they use tabs while this line uses spaces.

Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved comments, PTAL.

I've added a bunch of TODOs in the latest revision. I don't think any of these are in the critical path of this patch, however, we should talk about these as they'll guide the design for a few upcoming things.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/raft/raft.go line 478 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

One of the interesting parts about this change is that a follower can now know that it was the leader before it restarted. It would be nice to have a test that creates this scenario and demonstrates that the old leader (who can no longer serve as the leader due to a restart) campaigns at the next term.

I extended TestRawNodeRestart to demonstrate this.


pkg/raft/raft.go line 773 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is a subtle change. Was it needed? Or is this just an extra layer of protection against forgetting the leader.

Reverted this, like we spoke about offline. I've added a TODO to make this change in a separate PR.


pkg/raft/raftpb/raft.proto line 125 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's give this a comment and explain when it is set.

I moved this up and updated the comment on vote that was talking about MsgStorageAppends (re)constructing a HardState.


pkg/raft/raftpb/raft.proto line 132 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Reviewable is indicating that the alignment with the other three fields is off here, because they use tabs while this line uses spaces.

Done. I converted the entire file to tabs, so it picked up another field (match) that was recently introduced. I'll (lazily) let it be part of this commit unless you prefer otherwise.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/raft/raft.go line 1243 at r2 (raw file):

Technically, it could win this election before the previous QSE expires.

The plan is to only even enter this CheckQuorum step down case if the QSE has already expired, right? So I'm not sure that this is possible on this specific step down case, but it is possible in the more general step down case.


pkg/raft/raft.go line 1244 at r2 (raw file):

			// election. Technically, it could win this election before the previous
			// QSE expires.
			// TODO(arul): update the tense of this once laeder leases are implemented.

s/leader/laeder/


pkg/raft/raft.go line 1253 at r2 (raw file):

			// store liveness for itself), it's safe for leader leases.
			//
			// TODO(arul,nvanbenschoten): One thing special about this case is that if

+1 on this being a special case. We probably don't want to pass None and forget the leader though. Better to have the condition for this step down and the result of the step down (i.e. immediately call an election) remain loosely coupled.


pkg/raft/raft.go line 1993 at r2 (raw file):

			// NB: Similar to the CheckQuorum step down case, we must remember our
			// prior stint as leader, lest we regress the QSE.
			r.becomeFollower(r.Term, r.lead)

Now that we're often passing r.lead to r.becomeFollower, should we remove the lead argument from the function? We can then package assignLead as a separate method with some safety checks to replace the r.lead = m.From and r.becomeFollower(m.Term, m.From) calls. Doesn't need to be this PR.


pkg/raft/raftpb/raft.proto line 91 at r2 (raw file):

	repeated Entry       entries     = 7  [(gogoproto.nullable) = false];
	optional uint64      commit      = 8  [(gogoproto.nullable) = false];
	optional uint64 lead   = 16 [(gogoproto.nullable) = false, (gogoproto.casttype) = "PeerID"];

nit: this alignment with the previous fields looks off.


pkg/kv/kvserver/below_raft_protos_test.go line 67 at r2 (raw file):

				Term:   n % 3,
				Vote:   raftpb.PeerID(n % 7),
				Commit: n % 11,

Want to add Lead here?

Previously, leader information for a given term was stored as volatile
state in raft. This change persists it durably by moving it to
HardState. This is done in service of leader leases, where we want to
ensure a follower doesn't violate its promise of support to the leader
even after a restart.

Fixes cockroachdb#124421

Release note: None
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/raft/raft.go line 1243 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Technically, it could win this election before the previous QSE expires.

The plan is to only even enter this CheckQuorum step down case if the QSE has already expired, right? So I'm not sure that this is possible on this specific step down case, but it is possible in the more general step down case.

Yeah. Cleaned up the commentary here.


pkg/raft/raft.go line 1253 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

+1 on this being a special case. We probably don't want to pass None and forget the leader though. Better to have the condition for this step down and the result of the step down (i.e. immediately call an election) remain loosely coupled.

Agreed. As we discussed over the last week, we don't want to treat this any different. I've just removed this stream of thought and cleaned up the commentary to call out the special case.


pkg/raft/raft.go line 1993 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Now that we're often passing r.lead to r.becomeFollower, should we remove the lead argument from the function? We can then package assignLead as a separate method with some safety checks to replace the r.lead = m.From and r.becomeFollower(m.Term, m.From) calls. Doesn't need to be this PR.

I've added a TODO for this, for now. I'll address it in a followup.


pkg/raft/raftpb/raft.proto line 91 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: this alignment with the previous fields looks off.

Fixed.


pkg/kv/kvserver/below_raft_protos_test.go line 67 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Want to add Lead here?

Done.

@craig craig bot merged commit 743b721 into cockroachdb:master Jul 17, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

raft: persist Lead in HardState
3 participants