
add maxFailoverCount limit to TiKV #965

Merged
merged 6 commits into pingcap:master from add-max-failover-count on Sep 30, 2019

Conversation

@weekface (Contributor) commented Sep 28, 2019

What problem does this PR solve?

We added this limit to TiDB in #163; this PR adds the same maxFailoverCount limit to TiKV.

What is changed and how does it work?

Check List

Tests

  • Unit test
  • E2E test
  • Stability test

Code changes

  • Has Helm charts change
  • Has Go code change
  • Has documents change

Side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?:

add `maxFailoverCount` limit to TiKV

@xiaojingchen @tennix @cofyc @onlymellb PTAL
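For context, the change boils down to a new maxFailoverCount field on the TiKV spec plus an early-exit guard in the failover logic. A minimal sketch of the spec side (the field name and the int32 type follow the snippets quoted later in this thread; the comment text and the surrounding struct are illustrative, not the actual diff):

// TiKVSpec sketch: only the new field is shown, everything else is elided.
type TiKVSpec struct {
	// ... existing TiKV fields ...

	// MaxFailoverCount limits how many failure stores the operator will
	// record and replace automatically; once the limit is reached,
	// failover stops adding new failure stores.
	MaxFailoverCount int32 `json:"maxFailoverCount,omitempty"`
}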

@weekface (Contributor, Author)

/run-e2e-in-kind

@weekface weekface force-pushed the add-max-failover-count branch from 9b39e98 to 449260d Compare September 28, 2019 01:31
@weekface weekface requested a review from aylei September 28, 2019 01:33
@weekface (Contributor, Author)

/run-e2e-in-kind

tennix previously approved these changes Sep 28, 2019

@tennix (Member) left a comment: LGTM

cofyc previously approved these changes Sep 28, 2019
@weekface (Contributor, Author)

/run-e2e-in-kind

	tcName := tc.GetName()
	if len(tc.Status.TiKV.FailureStores) >= int(tc.Spec.TiKV.MaxFailoverCount) {
		glog.Errorf("%s/%s failure stores count reached the limit: %d", ns, tcName, tc.Spec.TiKV.MaxFailoverCount)
		return nil
Contributor: The return value is nil, so would it be more appropriate to log this at warning level?

Contributor: IIRC, tidb_failover.go also needs to be modified.

@weekface weekface dismissed stale reviews from cofyc and tennix via d986559 September 29, 2019 02:13
@weekface (Contributor, Author)

/run-e2e-in-kind

@weekface added the type/bug and needs-cherry-pick-1.0 labels on Sep 29, 2019
@@ -261,6 +261,7 @@ tikv:
# Specify the priorityClassName for TiKV Pod.
# refer to https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#how-to-use-priority-and-preemption
priorityClassName: ""
maxFailoverCount: 3
Contributor: Suggest adding a comment here to make the detailed meaning clear and maybe how to configure the value, e.g. what is the suggested config for a cluster with 100 TiKV stores?

Contributor Author: Done.

@@ -30,6 +31,13 @@ func NewTiKVFailover(tikvFailoverPeriod time.Duration) Failover {
}

func (tf *tikvFailover) Failover(tc *v1alpha1.TidbCluster) error {
	ns := tc.GetNamespace()
	tcName := tc.GetName()
	if len(tc.Status.TiKV.FailureStores) >= int(tc.Spec.TiKV.MaxFailoverCount) {
Contributor: IIUC, if multiple tikv-servers become Down simultaneously, MaxFailoverCount may be exceeded.

Contributor Author: Good catch! Addressed, PTAL again.
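One way to address this (a sketch of the idea, not necessarily the exact diff in this PR) is to re-check the failure-store count inside the per-store loop, right before each new failure store is recorded, so that several stores going Down in the same sync cannot push the count past the limit. Names not shown in the quoted snippets (Stores, TiKVStateDown, TiKVFailureStore, PodName, StoreID) are assumptions; the deadline/failover-period handling is omitted:

func (tf *tikvFailover) Failover(tc *v1alpha1.TidbCluster) error {
	ns := tc.GetNamespace()
	tcName := tc.GetName()

	for storeID, store := range tc.Status.TiKV.Stores {
		if store.State != v1alpha1.TiKVStateDown {
			continue // only Down stores are failover candidates
		}
		if _, exist := tc.Status.TiKV.FailureStores[storeID]; exist {
			continue // this store is already recorded as failed
		}
		// Re-evaluate the limit for every candidate store instead of once
		// at the top of the function.
		if len(tc.Status.TiKV.FailureStores) >= int(tc.Spec.TiKV.MaxFailoverCount) {
			glog.Warningf("%s/%s failure stores count reached the limit: %d", ns, tcName, tc.Spec.TiKV.MaxFailoverCount)
			return nil
		}
		if tc.Status.TiKV.FailureStores == nil {
			tc.Status.TiKV.FailureStores = map[string]v1alpha1.TiKVFailureStore{}
		}
		tc.Status.TiKV.FailureStores[storeID] = v1alpha1.TiKVFailureStore{
			PodName: store.PodName,
			StoreID: store.ID,
		}
	}
	return nil
}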

		glog.Warningf("%s/%s failure stores count reached the limit: %d", ns, tcName, tc.Spec.TiKV.MaxFailoverCount)
		return nil
	}

Contributor: I think tidb_failover.go also needs to be modified like this.

Contributor Author: Done.
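For reference, the analogous guard in tidb_failover.go would look roughly like the following. The TiDB-side names (FailureMembers, Spec.TiDB.MaxFailoverCount, "failure members") mirror the TiKV fields shown above and are assumptions, not quotes from this PR:

	// Sketch of the equivalent early-exit guard for TiDB failover.
	if len(tc.Status.TiDB.FailureMembers) >= int(tc.Spec.TiDB.MaxFailoverCount) {
		glog.Warningf("%s/%s failure members count reached the limit: %d", ns, tcName, tc.Spec.TiDB.MaxFailoverCount)
		return nil
	}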

@weekface (Contributor, Author)

/run-e2e-in-kind

@tennix (Member) left a comment: LGTM

@tennix (Member) commented Sep 30, 2019:

/run-e2e-in-kind

@tennix tennix merged commit f91fc58 into pingcap:master Sep 30, 2019
@sre-bot (Contributor) commented Sep 30, 2019:

cherry pick to release-1.0 failed

Labels: needs-cherry-pick-1.0, type/bug

Successfully merging this pull request may close these issues:

  • TiKV cluster not respecting CRD replica configuration
  • add maxFailoverCount to TiKV