raft: fix panic on MsgApp after log truncation #31
00532c3 to 2da1d51
@nvanbenschoten @tbg PTAL. Commit 1: the minimal fix.
@ahrtr PTAL
This commit fixes a panic in the following scenario:

1. The flow of MsgApp from the leader to a follower is throttled (i.e. Inflights is full).
2. The leader doesn't fetch entries from storage and only periodically sends MsgApps with empty Entries.
3. A log compaction/truncation happens in the background, and cuts out a portion of the log beyond what's still in-flight towards the slow follower (i.e. Progress.Match < log cutoff).
4. Some messages to the slow follower get dropped, and as a result it replies with a rejection MsgAppResp.
5. The leader resets Progress.Next = Progress.Match+1, and is about to retry sending entries from this point.
6. In raft.maybeSendAppend it calls raftLog.term and gets 0 for the missing entry (instead of some indication/error that the log was truncated at this index). It also skips fetching entries (as in step 2), and goes ahead sending an empty MsgApp (with LogTerm = 0 and a fresh Commit index).
7. When the follower gets this MsgApp, in raftLog.maybeAppend it a) wrongly passes the matchTerm check because the 0 term matches the 0 corresponding to a missing entry in the local log, b) tries to bump the Commit index and panics because this index is beyond its local log's lastIndex().

This bug was introduced in 42419da. Specifically, steps (2) and (6) previously fetched raftLog.entries() unconditionally, which would return ErrCompacted in the above scenario and prevent sending the problematic MsgApp. The commit above introduced a condition under which the ErrCompacted would go unnoticed.

This commit makes maybeSendAppend more aware of this compaction scenario, and prevents sending the problematic MsgApp.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
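Step 7 of the scenario can be illustrated with a small toy model. This is only a sketch: toyLog, zeroTerm and matchTerm below are hypothetical stand-ins written for this example, not the library's code, and the indices and terms are made up. It shows how a term lookup that returns 0 for a missing entry lets a message with LogTerm = 0 pass the term check.

```go
package main

import "fmt"

// toyLog is a hypothetical, simplified follower log: terms[i] holds the term
// of the entry at index first+i. It is not the raftLog type from the library.
type toyLog struct {
	first uint64   // index of the first entry still stored
	terms []uint64 // terms of the stored entries (assumed non-empty here)
}

// lastIndex returns the index of the last stored entry.
func (l *toyLog) lastIndex() uint64 { return l.first + uint64(len(l.terms)) - 1 }

// zeroTerm mimics the error-prone behaviour described above: it returns 0
// for any index that is not present in the log.
func (l *toyLog) zeroTerm(i uint64) uint64 {
	if i < l.first || i > l.lastIndex() {
		return 0
	}
	return l.terms[i-l.first]
}

// matchTerm reports whether the entry at index i has term t.
func (l *toyLog) matchTerm(i, t uint64) bool { return l.zeroTerm(i) == t }

func main() {
	follower := &toyLog{first: 5, terms: []uint64{2, 2, 3}} // indices 5..7

	// A message with LogTerm = 0 that refers to an index the follower does
	// not hold: the lookup yields 0, so the check passes by accident.
	fmt.Println(follower.matchTerm(20, 0)) // true

	// The same LogTerm against an entry that exists does not match.
	fmt.Println(follower.matchTerm(7, 0)) // false
}
```

Once the check passes, nothing stops the follower from trying to advance its commit index past lastIndex(), which is where the panic in the scenario comes from.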
This commit makes the raftLog.term() call return ErrCompacted and ErrUnavailable errors if the requested log index is out of bounds. Previously it would return 0, which was error-prone behaviour.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
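A rough sketch of the idea, under simplifying assumptions: the toyLog type and the two error values below are hypothetical stand-ins for this example (the real ErrCompacted and ErrUnavailable live in the library), and the toy ignores details such as the term of the entry just before the first stored index.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the ErrCompacted and ErrUnavailable errors
// named in the commit message.
var (
	errCompacted   = errors.New("requested index is unavailable due to compaction")
	errUnavailable = errors.New("requested entry at index is unavailable")
)

// toyLog is a simplified log: terms[i] is the term of the entry at first+i.
type toyLog struct {
	first uint64
	terms []uint64
}

// term returns the term of the entry at index i, or an error that says why
// the entry cannot be looked up. It never uses 0 as an "unknown" marker.
func (l *toyLog) term(i uint64) (uint64, error) {
	if i < l.first {
		return 0, errCompacted // entry was truncated away
	}
	if i >= l.first+uint64(len(l.terms)) {
		return 0, errUnavailable // entry is beyond the end of the log
	}
	return l.terms[i-l.first], nil
}

func main() {
	l := &toyLog{first: 5, terms: []uint64{2, 2, 3}} // indices 5..7
	for _, i := range []uint64{3, 6, 9} {
		t, err := l.term(i)
		fmt.Printf("term(%d) = %d, err = %v\n", i, t, err)
	}
}
```

With an explicit error, a caller cannot mistake "no such entry" for "entry with term 0" unless it deliberately discards the error.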
2da1d51 to d0fb0cd
LGTM
GitHub isn't liking my approval. It may be because I'm not a maintainer.
This commit makes the testing loggers print Panic and Fatal arguments before redirecting them to the panic() call. Previously they would be displayed in a non-human-readable way, as something like:

panic: ([]interface {}) 0x1400000eb88

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
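The difference can be reproduced in isolation. The sketch below is not the testing logger from the repository; rawPanic, readablePanic and recovered are hypothetical helpers that only show why passing a variadic argument slice straight to panic() loses the message, and how formatting it first keeps it readable.

```go
package main

import "fmt"

// rawPanic hands the variadic arguments to panic as a single slice value,
// so the panic value has type []interface{} rather than a readable string.
func rawPanic(v ...interface{}) { panic(v) }

// readablePanic formats and prints the arguments first, then panics with
// the formatted text.
func readablePanic(v ...interface{}) {
	s := fmt.Sprint(v...)
	fmt.Println(s)
	panic(s)
}

// recovered runs f and returns the value it panicked with (nil if none).
func recovered(f func()) (r interface{}) {
	defer func() { r = recover() }()
	f()
	return nil
}

func main() {
	a := recovered(func() { rawPanic("commit ", 10, " out of range") })
	b := recovered(func() { readablePanic("commit ", 10, " out of range") })

	fmt.Printf("%T\n", a)        // []interface {}
	fmt.Printf("%T: %v\n", b, b) // string: commit 10 out of range
}
```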
LGTM
Thanks @pavelkalinnikov
This PR fixes a panic and adds a regression test for the following scenario:

1. The flow of MsgApp from the leader to a follower is throttled (i.e. Inflights is full).
2. The leader doesn't fetch entries from storage and only periodically sends MsgApps with empty Entries.
3. A log compaction/truncation happens in the background, and cuts out a portion of the log beyond what's still in-flight towards the slow follower (i.e. Progress.Match < log cutoff).
4. Some messages to the slow follower get dropped, and as a result it replies with a rejection MsgAppResp.
5. The leader resets Progress.Next = Progress.Match+1, and is about to retry sending entries from this point.
6. In raft.maybeSendAppend it calls raftLog.term and gets 0 for the missing entry (instead of some indication/error that the log was truncated at this index). It also skips fetching entries (as in step 2), and goes ahead sending an empty MsgApp (with LogTerm = 0 and a fresh Commit index).
7. When the follower gets this MsgApp, in raftLog.maybeAppend it a) wrongly passes the matchTerm check because the 0 term matches the 0 corresponding to a missing entry in the local log, b) tries to bump the Commit index and panics because this index is beyond its local log's lastIndex().

This bug was introduced in 42419da. Specifically, steps (2) and (6) previously fetched raftLog.entries() unconditionally, which would return ErrCompacted in the above scenario and prevent sending the problematic MsgApp. The commit above introduced a condition under which the ErrCompacted would go unnoticed.

This PR makes maybeSendAppend more aware of this compaction scenario, and prevents sending the problematic MsgApp.
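The leader-side guard can be sketched roughly as follows. Everything here is a toy model built for illustration: leaderLog, decide and the action values are hypothetical names, and the "fallback" branch is an assumption, since the description above only says the problematic MsgApp must not be sent and does not spell out what is sent instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the library's ErrCompacted and ErrUnavailable.
var (
	errCompacted   = errors.New("log compacted")
	errUnavailable = errors.New("entry unavailable")
)

// leaderLog is a simplified leader log: terms[i] is the term at first+i.
type leaderLog struct {
	first uint64
	terms []uint64
}

// term returns the term at index i, or an error if the entry is not stored.
func (l *leaderLog) term(i uint64) (uint64, error) {
	if i < l.first {
		return 0, errCompacted
	}
	if i >= l.first+uint64(len(l.terms)) {
		return 0, errUnavailable
	}
	return l.terms[i-l.first], nil
}

// action names what the leader decides to do for one follower.
type action string

const (
	sendAppend action = "append"
	// fallback stands for "do not send an append" (assumption: for example,
	// switch to some other way of catching the follower up).
	fallback action = "fallback"
)

// decide checks the entry preceding next (assumed >= 1) before an append is
// built, even when the append would carry no entries.
func decide(l *leaderLog, next uint64) action {
	if _, err := l.term(next - 1); err != nil {
		// The previous entry is gone, so no valid LogTerm can be attached.
		return fallback
	}
	return sendAppend
}

func main() {
	l := &leaderLog{first: 10, terms: []uint64{4, 4, 5}} // indices 10..12

	// Slow follower: Match = 3, so Next = 4, which is below the log cutoff.
	fmt.Println(decide(l, 4)) // fallback

	// Healthy follower: Next = 12, previous entry 11 is still stored.
	fmt.Println(decide(l, 12)) // append
}
```

The point of the sketch is the ordering: the term lookup for Next-1 is checked for an error before any message is built, so an empty append cannot slip through with LogTerm = 0.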