Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: fix replicate queue metrics tracking bug #86844

Merged
merged 1 commit into from
Aug 26, 2022

Conversation

AlexTalks
Copy link
Contributor

@AlexTalks AlexTalks commented Aug 25, 2022

In #85844, tracking replicate queue success and error metrics by
allocator action was introduced, however this change inadvertently
included a bug due to missing the RemoveDead(Non)Voter allocator
action case, in combination with a panic on unexpected actions. This
change fixes the missing case as well as the panic.

Fixes #86544, fixes #86374, fixes #86107.

Release justification: Bug fix
Release note: None

@AlexTalks AlexTalks requested a review from a team as a code owner August 25, 2022 07:38
@AlexTalks AlexTalks requested a review from irfansharif August 25, 2022 07:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -481,6 +481,12 @@ func (metrics *ReplicateQueueMetrics) trackResultByAllocatorAction(
} else {
metrics.ReplaceDeadReplicaErrorCount.Inc(1)
}
case allocatorimpl.AllocatorRemoveDeadVoter, allocatorimpl.AllocatorRemoveDeadNonVoter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this case be folded into the one right above?

@@ -494,7 +500,7 @@ func (metrics *ReplicateQueueMetrics) trackResultByAllocatorAction(
metrics.RemoveDecommissioningReplicaErrorCount.Inc(1)
}
default:
panic(fmt.Sprintf("unsupported AllocatorAction: %v", action))
// Unsupported AllocatorAction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth log.Errorf-ing, or fatal-ing under buildutil.CrdbTestBuild.

@AlexTalks
Copy link
Contributor Author

The reason this wasn't caught in tests was because TestReplicateQueueDeadNonVoters is skipped, per #76996. I spent a bit of time attempting to unskip this, but haven't resolved that just yet.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @irfansharif)


pkg/kv/kvserver/replicate_queue.go line 484 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Should this case be folded into the one right above?

No, we have separate metrics for ReplaceDead and RemoveDead.


pkg/kv/kvserver/replicate_queue.go line 503 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Worth log.Errorf-ing, or fatal-ing under buildutil.CrdbTestBuild.

Will do.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @irfansharif from a discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/replicate_queue.go line 503 at r1 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Will do.

Done.

@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 26, 2022

Build failed (retrying...):

@AlexTalks
Copy link
Contributor Author

bors cancel
(Just noticed this is going to have a merge conflict)

@craig
Copy link
Contributor

craig bot commented Aug 26, 2022

Canceled.

In cockroachdb#85844, tracking replicate queue success and error metrics by
allocator action was introduced, however this change inadvertently
included a bug due to missing the `RemoveDead(Non)Voter` allocator
action case, in combination with a panic on unexpected actions. This
change fixes the missing case as well as the panic.

Release justification: Bug fix
Release note: None
@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 26, 2022

Build succeeded:

@craig craig bot merged commit 82e2a30 into cockroachdb:master Aug 26, 2022
@AlexTalks AlexTalks deleted the fix_rq_metrics branch August 26, 2022 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants