raft: Avoid scanning raft log in becomeLeader #9073
Scanning the uncommitted portion of the raft log to determine whether there are any pending config changes can be expensive. In cockroachdb/cockroach#18601, we've seen that a new leader can spend so much time scanning its log post-election that it fails to send its first heartbeats in time to prevent a second election from starting immediately. Instead of tracking whether a pending config change exists with a boolean, this commit tracks the latest log index at which a pending config change *could* exist. This is a less expensive solution to the problem, and the impact of false positives should be minimal since a newly-elected leader should be able to quickly commit the tail of its log.
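As a rough illustration of the description above, here is a minimal, self-contained sketch of the index-based approach. The `sketchRaft` type and its fields are hypothetical simplifications rather than the actual etcd structs, and comparing against the applied index is an assumption made for the sketch.

```go
package main

import "fmt"

// sketchRaft is a hypothetical, heavily simplified stand-in for the leader
// state; only the fields needed to illustrate the idea are included.
type sketchRaft struct {
	lastIndex        uint64 // index of the last entry in the log
	applied          uint64 // index of the last applied entry (assumed gate)
	pendingConfIndex uint64 // latest index at which a pending config change could exist
}

// becomeLeader conservatively assumes a config change could sit anywhere in
// the existing log, so it records the last index instead of scanning entries.
func (r *sketchRaft) becomeLeader() {
	r.pendingConfIndex = r.lastIndex
}

// proposeConfChange appends a config change only if no earlier one could
// still be outstanding. It reports whether the proposal was accepted.
func (r *sketchRaft) proposeConfChange() bool {
	if r.pendingConfIndex > r.applied {
		return false // possibly pending change; may be a false positive
	}
	r.lastIndex++
	r.pendingConfIndex = r.lastIndex
	return true
}

func main() {
	r := &sketchRaft{lastIndex: 10, applied: 7}
	r.becomeLeader()
	fmt.Println(r.proposeConfChange()) // refused while entries 8..10 are outstanding
	r.applied = 10
	fmt.Println(r.proposeConfChange()) // accepted once the tail has been applied
}
```

The first proposal is refused even if entries 8 through 10 contain no config change at all; that is the false positive the description refers to, and it clears as soon as the new leader gets through the tail of its log.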
Codecov Report
@@ Coverage Diff @@
## master #9073 +/- ##
=========================================
Coverage ? 76.08%
=========================================
Files ? 359
Lines ? 29944
Branches ? 0
=========================================
Hits ? 22783
Misses ? 5576
Partials ? 1585
can you take a look at this PR first? I will take a look at it in a couple of days.
@xiang90 ok
@bdarnell The overall idea looks good to me. This will not affect etcd much, since etcd is not reconfig-heavy and has no issues with reading log tails (we keep logs purely in memory). I would like @siddontang from the tikv side to have a look before we merge this in.
@@ -682,12 +687,13 @@ func (r *raft) becomeLeader() {
		r.logger.Panicf("unexpected error getting uncommitted entries (%v)", err)
	}

	nconf := numOfPendingConf(ents)
I see that using pendingConfIndex only avoids calling numOfPendingConf here. Could this hurt performance too much?
It should not have a significant negative impact on performance. The only thing affected is the ability to propose new config changes, and the impact is small. The worst case scenario is when you have one up-to-date follower and one follower that is behind, then the leader dies and the up-to-date follower becomes the new leader.
Before, the new leader could immediately propose a config change, but that config change wouldn't be applied until the other follower caught up (acknowledging the log entries, but not necessarily applying them).
With this change, the follower must catch up before any config change can be proposed. So this only adds one round trip to membership changes proposed immediately after an election.
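For contrast, the cost being avoided is a linear walk over the uncommitted tail, as in the `numOfPendingConf(ents)` call in the hunk above. A minimal sketch of such a scan follows; the `entry` type, its constants, and `countPendingConf` are hypothetical stand-ins, not the real protobuf types or the actual function.

```go
package main

import "fmt"

// entryType and entry are hypothetical stand-ins for the real log entry types.
type entryType int

const (
	entryNormal entryType = iota
	entryConfChange
)

type entry struct {
	Index uint64
	Type  entryType
}

// countPendingConf visits every uncommitted entry, so its cost grows
// linearly with the length of the uncommitted tail (and the entries must
// first be fetched from storage).
func countPendingConf(ents []entry) int {
	n := 0
	for _, e := range ents {
		if e.Type == entryConfChange {
			n++
		}
	}
	return n
}

func main() {
	tail := []entry{{8, entryNormal}, {9, entryConfChange}, {10, entryNormal}}
	fmt.Println(countPendingConf(tail))
}
```

Recording only the last log index replaces this walk with a constant-time assignment, at the price of the extra round trip described above.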
Got it
LGTM
@bdarnell thanks. merging.
Picks up a cherry-picked version of etcd-io/etcd#9073, to fix cockroachdb#18601 Release note (bug fix): Fixes potential cluster unavailability after raft logs grow too large.
24889: cherrypick-1.1: build: Update etcd r=bdarnell a=bdarnell Picks up a cherry-picked version of etcd-io/etcd#9073, to fix #18601 Release note (bug fix): Fixes potential cluster unavailability after raft logs grow too large. Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
I meant to do this in etcd-io#9073, but sent the PR before it was finished. The last log index is known directly; there is no need to fetch any entries here.
@bdarnell did we mean to use Lines 863 to 870 in 90c5968
Seems like we're still scanning the uncommitted portion of the raft log before campaigning.
@nvanbenschoten I think I missed that by accident, but I'm not sure we can use pendingConfIndex there.
I'm not sure why this check is even here. It seems to be addressing an edge case: if there are unapplied config changes that we know are committed, we might as well apply them before starting a campaign so we send MsgVotes to the right nodes. But it's not needed for correctness; in the worst case, if there is a quorum of live nodes in the new config but not in the old one, the election will fail and need to be retried (which it will be, after a timeout). This is an expensive way to avoid that problem, and I think it would be better to just remove the pending conf check here entirely.
LGTM