raft: Avoid scanning raft log in becomeLeader #9073

bdarnell · 2017-12-30T15:20:50Z

Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.

Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change could exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
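
A minimal sketch of the index-based bookkeeping described above. The `leader` type and its methods are hypothetical simplifications, not the real raft struct, and the comparison against the applied index is an assumption about how the gate is evaluated.

```go
package main

import "fmt"

// leader is a hypothetical, stripped-down stand-in for the leader's state.
type leader struct {
	lastIndex        uint64 // index of the last entry in the log
	applied          uint64 // index of the last applied entry
	pendingConfIndex uint64 // highest index at which a config change could be pending
}

// becomeLeader conservatively assumes the existing log tail might contain a
// config change, so no entries have to be read or scanned.
func (l *leader) becomeLeader() {
	l.pendingConfIndex = l.lastIndex
}

// proposeConfChange appends a config change only if everything up to
// pendingConfIndex has been applied (assumed condition).
func (l *leader) proposeConfChange() bool {
	if l.pendingConfIndex > l.applied {
		return false // may be a false positive, but it is cheap to compute
	}
	l.lastIndex++
	l.pendingConfIndex = l.lastIndex
	return true
}

func main() {
	l := &leader{lastIndex: 10, applied: 7}
	l.becomeLeader()
	fmt.Println(l.proposeConfChange()) // false: entries 8..10 are not applied yet
	l.applied = 10
	fmt.Println(l.proposeConfChange()) // true: the old tail has been applied
	fmt.Println(l.proposeConfChange()) // false: entry 11 is now the pending change
}
```

The gate costs a single integer comparison regardless of how long the uncommitted tail is.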

codecov-io · 2017-12-30T16:23:52Z

Codecov Report

❗ No coverage uploaded for pull request base (master@fdec12c).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #9073   +/-   ##
=========================================
  Coverage          ?   76.08%           
=========================================
  Files             ?      359           
  Lines             ?    29944           
  Branches          ?        0           
=========================================
  Hits              ?    22783           
  Misses            ?     5576           
  Partials          ?     1585

Impacted Files	Coverage Δ
raft/node.go	`89.73% <ø> (ø)`
raft/rawnode.go	`68.33% <ø> (ø)`
raft/raft.go	`91.5% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdec12c...8d8f319.

xiang90 · 2018-01-03T00:29:17Z

@siddontang Can you take a look at this PR first? I will take a look at it in a couple of days.

siddontang · 2018-01-03T03:38:56Z

@xiang90 ok

xiang90 · 2018-01-05T18:41:23Z

@bdarnell The overall idea looks good to me. This will not affect etcd much, since etcd is neither reconfiguration-heavy nor troubled by reading log tails (we keep logs purely in memory). I would like @siddontang from the TiKV side to have a look before we merge this in.

siddontang · 2018-01-07T08:47:10Z

raft/raft.go

@@ -682,12 +687,13 @@ func (r *raft) becomeLeader() {
 		r.logger.Panicf("unexpected error getting uncommitted entries (%v)", err)
 	}

-	nconf := numOfPendingConf(ents)


I see that using pendingConfIndex only reduces calls to numOfPendingConf here; can this hurt performance too much?

bdarnell · 2018-01-08

It should not have a significant negative impact on performance. The only thing affected is the ability to propose new config changes, and the impact is small. The worst-case scenario is when you have one up-to-date follower and one follower that is behind, then the leader dies and the up-to-date follower becomes the new leader.

Before, the new leader could immediately propose a config change, but that config change wouldn't be applied until the other follower catches up (acknowledging the log entries, but not necessarily applying them).

With this change, the follower must catch up before any config change can be proposed. So this only adds one round trip to membership changes proposed immediately after an election.
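
The worst case above can be sketched by comparing the two gates side by side. All names here are hypothetical simplifications, and the comparison against the applied index is an assumption; the point is only that the index-based gate can report a false positive where a scan would find nothing.

```go
package main

import "fmt"

// entry is a hypothetical simplified log entry.
type entry struct {
	index        uint64
	isConfChange bool
}

// canProposeOld models the boolean approach: scan the uncommitted tail and
// allow a proposal only if it holds no config change.
func canProposeOld(uncommitted []entry) bool {
	for _, e := range uncommitted {
		if e.isConfChange {
			return false
		}
	}
	return true
}

// canProposeNew models the index approach: no scan, only a comparison of the
// conservative pendingConfIndex against the applied index (assumed condition).
func canProposeNew(pendingConfIndex, applied uint64) bool {
	return pendingConfIndex <= applied
}

func main() {
	// New leader: entries 8..10 are uncommitted and none is a config change.
	tail := []entry{{index: 8}, {index: 9}, {index: 10}}
	pendingConfIndex := tail[len(tail)-1].index // set once, at election time

	fmt.Println(canProposeOld(tail))                 // true: the scan finds nothing
	fmt.Println(canProposeNew(pendingConfIndex, 7))  // false: a false positive
	fmt.Println(canProposeNew(pendingConfIndex, 10)) // true: after the tail is applied
}
```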

siddontang · 2018-01-09T00:35:48Z

LGTM

xiang90 · 2018-01-09T00:37:32Z

@bdarnell thanks. merging.

24889: cherrypick-1.1: build: Update etcd r=bdarnell a=bdarnell

Picks up a cherry-picked version of etcd-io/etcd#9073, to fix #18601

Release note (bug fix): Fixes potential cluster unavailability after raft logs grow too large.

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>

I meant to do this in etcd-io#9073, but sent the PR before it was finished. The last log index is known directly; there is no need to fetch any entries here.

nvanbenschoten · 2018-10-08T23:45:58Z

@bdarnell did we mean to use r.pendingConfIndex in place of the following logic as well?

etcd/raft/raft.go

Lines 863 to 870 in 90c5968

    ents, err := r.raftLog.slice(r.raftLog.applied+1, r.raftLog.committed+1, noLimit)
    if err != nil {
    	r.logger.Panicf("unexpected error getting unapplied entries (%v)", err)
    }
    if n := numOfPendingConf(ents); n != 0 && r.raftLog.committed > r.raftLog.applied {
    	r.logger.Warningf("%x cannot campaign at term %d since there are still %d pending configuration changes to apply", r.id, r.Term, n)
    	return nil
    }

Seems like we're still scanning the uncommitted portion of the raft log before campaigning.
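
For context, a sketch of the kind of linear scan that a helper such as numOfPendingConf implies. The `entry` type and `countConfChanges` are hypothetical stand-ins written for this illustration, not the library's definitions; the cost grows with the number of entries in the slice passed in.

```go
package main

import "fmt"

// entryType and entry are hypothetical simplified versions of a log entry.
type entryType int

const (
	entryNormal entryType = iota
	entryConfChange
)

type entry struct {
	Index uint64
	Type  entryType
}

// countConfChanges visits every entry once and counts the config changes.
func countConfChanges(ents []entry) int {
	n := 0
	for i := range ents {
		if ents[i].Type == entryConfChange {
			n++
		}
	}
	return n
}

func main() {
	ents := []entry{
		{Index: 5, Type: entryNormal},
		{Index: 6, Type: entryConfChange},
		{Index: 7, Type: entryNormal},
	}
	fmt.Println(countConfChanges(ents)) // 1
}
```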

bdarnell · 2018-10-09T18:15:50Z

@nvanbenschoten I think I missed that by accident, but I'm not sure we can use pendingConfIndex there.

This is not the uncommitted portion, this is the committed-but-not-applied portion. This is normally much smaller, although it can still be large.
pendingConfIndex is only set on the leader; followers and candidates don't track it.
To avoid scanning the uncommitted log, we set pendingConfIndex to a value that is too high and allows false positives. This is OK when the consequence of a false positive is delaying any future ProposeConfChanges, but if it prevents nodes from campaigning or becoming leader, that's more of a problem. If the group has no leader and no one willing to campaign, no new entries will get committed and we'll be stuck. (That's not actually an issue here because we're talking about unapplied entries instead of uncommitted, and unapplied entries will apply even without a leader).
Blocking before campaigning is much less of a problem than blocking after becoming leader. The latter would cause the new leader to immediately miss heartbeats and lose its leadership, but a follower that blocks for a while before deciding to campaign won't really hurt anything.

I'm not sure why this check is even here. It seems to be addressing an edge case: if there are unapplied config changes that we know are committed, we might as well apply them before starting a campaign so we send MsgVotes to the right nodes. But it's not needed for correctness; in the worst case if there is a quorum of live nodes in the new config but not in the old one the election will fail and need to be retried (which it will, after a timeout). But this is an expensive way to avoid that problem, and I think it would be better to just remove the pending conf check here entirely.
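
The edge case in the last paragraph can be illustrated with a small quorum calculation. This is a sketch under stated assumptions: `hasQuorum` is a hypothetical helper, the node IDs are made up, and every live voter is assumed to grant its vote.

```go
package main

import "fmt"

// hasQuorum reports whether the live members of config form a majority.
// It assumes every live voter grants its vote (hypothetical simplification).
func hasQuorum(config []uint64, live map[uint64]bool) bool {
	votes := 0
	for _, id := range config {
		if live[id] {
			votes++
		}
	}
	return votes >= len(config)/2+1
}

func main() {
	live := map[uint64]bool{1: true, 2: true}

	oldConfig := []uint64{1, 2, 3, 4} // config still in effect on the candidate
	newConfig := []uint64{1, 2, 3}    // config after the committed-but-unapplied change

	fmt.Println(hasQuorum(oldConfig, live)) // false: 2 of 4 votes, 3 needed
	fmt.Println(hasQuorum(newConfig, live)) // true: 2 of 3 votes, 2 needed
}
```

A candidate that campaigns under the old configuration loses this election, then retries after a timeout, by which point the config change has presumably been applied.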

Nidaozi

LGTM

bdarnell added the area/raft label Dec 30, 2017

bdarnell requested a review from xiang90 December 30, 2017 15:20

bdarnell mentioned this pull request Dec 30, 2017

storage: avoid reading uncommitted tail of Raft log when becoming leader cockroachdb/cockroach#18601

Closed

siddontang mentioned this pull request Jan 2, 2018

raft: Avoid scanning raft log in becomeLeader tikv/tikv#2630

Closed

xiang90 self-assigned this Jan 3, 2018

siddontang reviewed Jan 7, 2018

xiang90 merged commit ed1ff9e into etcd-io:master Jan 9, 2018

bdarnell deleted the pending-conf-index branch January 9, 2018 19:59

bdarnell mentioned this pull request Jan 9, 2018

build: Update coreos/etcd dependency cockroachdb/cockroach#21356

Merged

siddontang mentioned this pull request Jan 15, 2018

raft: Avoid scanning raft log in becomeLeader tikv/raft-rs#11

Closed

csmoe mentioned this pull request Jan 22, 2018

raft: Avoid scanning raft log in become_leader tikv/raft-rs#15

Merged

bdarnell added a commit to cockroachdb/vendored that referenced this pull request Feb 10, 2018

Backport etcd-io/etcd#9073 to our 1.0 release branch

d047fb8

bdarnell added a commit to bdarnell/cockroach that referenced this pull request Feb 10, 2018

Update raft dependency for backport of etcd-io/etcd#9073

6908f8d

bdarnell added a commit to bdarnell/cockroach that referenced this pull request Feb 10, 2018

Update raft dependency for backport of etcd-io/etcd#9073

66ad7ad

bdarnell added a commit to cockroachdb/vendored that referenced this pull request Apr 17, 2018

Backport etcd-io/etcd#9073 to release-1.1

bf64931

bdarnell mentioned this pull request Apr 17, 2018

cherrypick-1.1: build: Update etcd cockroachdb/cockroach#24889

Merged

This was referenced Jun 25, 2018

High Service Latency: SQL in Cockroach DB v2.0.2 cockroachdb/cockroach#26830

Closed

storage: Raft log is still read on elections cockroachdb/cockroach#26946

Closed

bdarnell mentioned this pull request Jun 26, 2018

raft: Really avoid scanning raft log in becomeLeader #9887

Merged

tbg added a commit to cockroachdb/vendored that referenced this pull request Jun 27, 2018

bump etcd/raft to pick up etcd-io/etcd#9073

a5ddf20

tbg mentioned this pull request Jun 27, 2018

vendor: bump etcd/raft to pick up memory usage fix cockroachdb/cockroach#27009

Merged

nvanbenschoten mentioned this pull request Oct 9, 2018

raft: provide protection against unbounded Raft log growth #10167

Merged

absolute8511 mentioned this pull request Mar 13, 2019

Consider merge some optimize from etcd-raft youzan/ZanRedisDB#23

Open

Nidaozi reviewed Nov 13, 2021
