storage: TestNewTruncateDecision failed under stress [skipped] #38584
Comments
Looks like an empty entry is proposed sometimes here: cockroach/pkg/storage/replica_raft_quiesce.go Lines 95 to 98 in 1ff3556
I initially thought this test had to have gotten flaky as a result of #38484, but I just reverted that locally and still get the repro, so now I'm thinking it's fallout from #38343, because this diff fixes it:

diff --git a/pkg/storage/replica_raft.go b/pkg/storage/replica_raft.go
index 1266771077..22fa9b375a 100644
--- a/pkg/storage/replica_raft.go
+++ b/pkg/storage/replica_raft.go
@@ -428,13 +428,14 @@ func (r *Replica) handleRaftReadyRaftMuLocked(
 	defer r.updateProposalQuotaRaftMuLocked(ctx, lastLeaderID)
 	err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
-		if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {
+		nEnts, err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup)
+		if err != nil {
 			return false, err
 		}
 		if hasReady = raftGroup.HasReady(); hasReady {
 			rd = raftGroup.Ready()
 		}
-		return hasReady /* unquiesceAndWakeLeader */, nil
+		return hasReady && nEnts == 0 /* unquiesceAndWakeLeader */, nil
 	})
 	r.mu.Unlock()
 	if err != nil {

What's happening is that the request that unquiesces the range sometimes flushes proposals, but then we were adding an extra empty proposal due to returning
I'll send a PR.
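To make the effect of the diff easier to see in isolation, here is a minimal toy model, not the actual CockroachDB code. `fakeRaftGroup`, `flush`, `handleReady`, and `countEmpty` are hypothetical names invented for this sketch, and it assumes, as a simplification, that a `true` wake-the-leader result always leads to exactly one empty entry being proposed.

```go
package main

import "fmt"

// fakeRaftGroup is a hypothetical stand-in for raft.RawNode. It only
// records proposed entries and whether unrelated work is pending.
type fakeRaftGroup struct {
	entries   [][]byte // proposed entries, in proposal order
	otherWork bool     // pending non-proposal work (e.g. an incoming message)
}

// hasReady reports whether the group has anything to hand to the caller.
func (g *fakeRaftGroup) hasReady() bool {
	return len(g.entries) > 0 || g.otherWork
}

// flush steps the buffered proposals into the group and returns how many
// entries were stepped in (the role nEnts plays in the diff).
func flush(g *fakeRaftGroup, buffered [][]byte) int {
	g.entries = append(g.entries, buffered...)
	return len(buffered)
}

// handleReady models the decision in the diff. With fixed == false it
// wakes the leader whenever there is a Ready; with fixed == true it skips
// the wake-up when proposals were already stepped in.
func handleReady(g *fakeRaftGroup, buffered [][]byte, fixed bool) {
	nEnts := flush(g, buffered)
	wake := g.hasReady()
	if fixed {
		wake = wake && nEnts == 0
	}
	if wake {
		// Assumption of this sketch: waking the leader proposes one empty entry.
		g.entries = append(g.entries, nil)
	}
}

// countEmpty returns the number of empty entries in the group.
func countEmpty(g *fakeRaftGroup) int {
	n := 0
	for _, e := range g.entries {
		if len(e) == 0 {
			n++
		}
	}
	return n
}

func main() {
	proposals := [][]byte{[]byte("put")}

	before := &fakeRaftGroup{}
	handleReady(before, proposals, false)
	fmt.Println("empty entries before the fix:", countEmpty(before))

	after := &fakeRaftGroup{}
	handleReady(after, proposals, true)
	fmt.Println("empty entries after the fix:", countEmpty(after))
}
```

In this model the pre-fix path ends up with one real entry plus one empty entry, while the fixed path only adds the empty entry when nothing was flushed but a Ready is still pending.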
If proposals were stepped into Raft, we don't have to also step another empty entry in. This was harmless, but it was confusing TestNewTruncateDecision (which admittedly is not a very robust test, though we have to thank it for catching this). Fixes cockroachdb#38584. Release note: None
Nice job tracking this down!
We've identified the root cause, but some cleanup presents itself, and in the meantime the test is too flaky to leave unskipped. See cockroachdb#38652. Touches cockroachdb#38584. Release note: None
38655: storage: skip TestNewTruncateDecision r=nvanbenschoten a=tbg We've identified the root cause, but some cleanup presents itself, and in the meantime the test is too flaky to leave unskipped. See #38652. Touches #38584. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
cc @cockroachdb/replication
This test no longer exists.
Oops, was looking at the wrong repo.
SHA: https://github.com/cockroachdb/cockroach/commits/ca1ef4d4f8296b213c0b2b140f16e4a97931e6e7
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1368520&tab=buildLog
Jira issue: CRDB-5625