add maxFailoverCount limit to TiKV #965
/run-e2e-in-kind
9b39e98 to 449260d
/run-e2e-in-kind
LGTM
/run-e2e-in-kind
pkg/manager/member/tikv_failover.go
Outdated
tcName := tc.GetName()
if len(tc.Status.TiKV.FailureStores) >= int(tc.Spec.TiKV.MaxFailoverCount) {
	glog.Errorf("%s/%s failure stores count reached the limit: %d", ns, tcName, tc.Spec.TiKV.MaxFailoverCount)
	return nil
Since this returns nil, wouldn't it be more appropriate to log at warning level?
IIRC, tidb_failover.go also needs to be modified.
/run-e2e-in-kind
@@ -261,6 +261,7 @@ tikv:
  # Specify the priorityClassName for TiKV Pod.
  # refer to https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#how-to-use-priority-and-preemption
  priorityClassName: ""
  maxFailoverCount: 3
Suggest adding a comment here that explains exactly what this setting means and, ideally, how to choose a value, e.g. what value is recommended for a cluster with 100 TiKV instances.
Done.
pkg/manager/member/tikv_failover.go
Outdated
@@ -30,6 +31,13 @@ func NewTiKVFailover(tikvFailoverPeriod time.Duration) Failover {
}

func (tf *tikvFailover) Failover(tc *v1alpha1.TidbCluster) error {
	ns := tc.GetNamespace()
	tcName := tc.GetName()
	if len(tc.Status.TiKV.FailureStores) >= int(tc.Spec.TiKV.MaxFailoverCount) {
IIUC, if multiple tikv-servers become Down simultaneously, the MaxFailoverCount may be exceeded.
Good catch!
Addressed, PTAL again.
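For readers following the thread, the concern above can be illustrated with a minimal sketch. It assumes simplified, hypothetical types (`store`, `markFailures`) that are not the tidb-operator structs, and it is not the code from this PR. The point is only that checking the limit once before the loop would let a single pass record every down store, while re-checking before each insertion keeps the count at or below the limit.

```go
package main

import "fmt"

// store is a hypothetical, simplified stand-in for a TiKV store status.
// It is not the tidb-operator type.
type store struct {
	ID   string
	Down bool
}

// markFailures records down stores in failures. The limit is re-checked
// before every insertion, so several stores going down in the same pass
// cannot push the count past maxFailoverCount.
func markFailures(stores []store, failures map[string]bool, maxFailoverCount int) {
	for _, s := range stores {
		// Skip healthy stores and stores that are already recorded.
		if !s.Down || failures[s.ID] {
			continue
		}
		if len(failures) >= maxFailoverCount {
			fmt.Printf("failure stores count reached the limit: %d\n", maxFailoverCount)
			return
		}
		failures[s.ID] = true
	}
}

func main() {
	// Four stores go down at once while the limit is three.
	stores := []store{{"1", true}, {"2", true}, {"3", true}, {"4", true}, {"5", false}}
	failures := map[string]bool{}
	markFailures(stores, failures, 3)
	fmt.Println("recorded failure stores:", len(failures))
}
```

With the check placed only before the loop, the same input would record all four down stores, because the count is zero when the check runs.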
	glog.Warningf("%s/%s failure stores count reached the limit: %d", ns, tcName, tc.Spec.TiKV.MaxFailoverCount)
	return nil
}
I think tidb_failover.go also needs to be modified like this.
Done.
/run-e2e-in-kind
LGTM
/run-e2e-in-kind
cherry pick to release-1.0 failed
What problem does this PR solve?
This limit was already added for TiDB in #163; this PR adds the same limit to TiKV.
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Does this PR introduce a user-facing change?:
@xiaojingchen @tennix @cofyc @onlymellb PTAL