add etcd and kube-apiserver faults #367

Merged on Apr 30, 2019 (24 commits)

xiaojingchen
Contributor

This PR contains the following changes:

  • add some checkers
    • CheckK8sAvailable
    • CheckOperatorAvailable
    • CheckTidbClustersAvailable
  • add etcd/apiserver stop faults

@@ -214,5 +220,89 @@ func main() {
oa.CheckTidbClusterStatusOrDie(cluster)
}

// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := selectNode(conf.ETCDs)
err := fta.StopETCD(faultEtcd)
Contributor

Extract this into a helper such as StopAEtcdOrDie.

glog.Fatal(err)
}
defer fta.StartETCD(faultEtcd)
err = tests.Keep(3*time.Second, 10*time.Minute, func() error {
Contributor

ditto
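
For context on the tests.Keep(3*time.Second, 10*time.Minute, ...) call in the hunk above: its arguments suggest a helper that re-runs a check at a fixed interval for a whole period and fails on the first error. The sketch below is an assumption about that behaviour, not the code from this PR; keep, errUnavailable, and the parameter names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a hypothetical error used only by this example.
var errUnavailable = errors.New("cluster became unavailable")

// keep is a hypothetical stand-in for a helper like tests.Keep: it calls
// check every interval until period has elapsed and returns the first
// error it sees. A nil return means the check held for the whole period.
func keep(interval, period time.Duration, check func() error) error {
	deadline := time.Now().Add(period)
	for time.Now().Before(deadline) {
		if err := check(); err != nil {
			return err
		}
		time.Sleep(interval)
	}
	return nil
}

func main() {
	calls := 0
	err := keep(10*time.Millisecond, 50*time.Millisecond, func() error {
		calls++
		if calls == 3 {
			return errUnavailable // simulate a failure on the third check
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

An extracted OrDie variant, as the review asks for, would wrap a call like this and abort the test run when the returned error is non-nil.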

tests/util.go Outdated
"k8s.io/client-go/rest"
)

func CreateKubeClient() (versioned.Interface, kubernetes.Interface, error) {
Contributor

Use cli, kubeCli := client.NewCliOrDie() instead.

@weekface added the test/stability (stability tests) label Apr 11, 2019
@zyguan previously approved these changes Apr 12, 2019

LGTM

@xiaojingchen changed the title from "[WIP] add checker for faults" to "add checker for faults" Apr 19, 2019
@weekface changed the title from "add checker for faults" to "add etcd and kube-apiserver faults" Apr 19, 2019
@@ -40,6 +42,7 @@ func main() {
oa := tests.NewOperatorActions(cli, kubeCli, conf)
fta := tests.NewFaultTriggerAction(cli, kubeCli, conf)
fta.CheckAndRecoverEnvOrDie()
oa.CheckK8sAvailable(nil, nil)
Contributor

This function returns an error, and it is ignored here.
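
One way to address this, sketched under assumptions: wrap the error-returning checker in an OrDie-style helper, following the naming already used in this PR (CheckAndRecoverEnvOrDie, StopETCDOrDie). The names checkOrDie and errNotAvailable and the injectable fatal parameter are hypothetical and exist only to keep the example self-contained; the PR's own code calls glog.Fatal.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotAvailable is a hypothetical error used only by this example.
var errNotAvailable = errors.New("k8s is not available")

// checkOrDie runs check and hands any error to fatal instead of dropping it.
// fatal is injectable so the behaviour can be exercised without exiting;
// real test code would pass something like glog.Fatal.
func checkOrDie(check func() error, fatal func(error)) {
	if err := check(); err != nil {
		fatal(err)
	}
}

func main() {
	report := func(err error) { fmt.Println("fatal:", err) }

	// Hypothetical checks standing in for oa.CheckK8sAvailable(nil, nil).
	checkOrDie(func() error { return nil }, report)             // reports nothing
	checkOrDie(func() error { return errNotAvailable }, report) // reports the error
}
```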

@@ -219,5 +222,19 @@ func main() {
// truncate a sst file and check failover
oa.TruncateSSTFileThenCheckFailoverOrDie(cluster1, 5*time.Minute)

// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := tests.SelectNode(conf.ETCDs)
Contributor

This SelectNode method will always select the first etcd.
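
A minimal sketch of a selector that does not always return the first entry. The body of SelectNode and the type of conf.ETCDs are not shown in this thread, so the []string shape, the selectNode name, and the node names below are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// rng is seeded once at start-up so that runs differ from each other.
// It is not safe for concurrent use; a sketch, not production code.
var rng = rand.New(rand.NewSource(time.Now().UnixNano()))

// selectNode returns a pseudo-random element of nodes, or "" when nodes is
// empty. The []string parameter is an assumed shape for the node list.
func selectNode(nodes []string) string {
	if len(nodes) == 0 {
		return ""
	}
	return nodes[rng.Intn(len(nodes))]
}

func main() {
	etcds := []string{"etcd-0", "etcd-1", "etcd-2"} // hypothetical node names
	fmt.Println("stopping", selectNode(etcds))
}
```

Picking the faulted node at random means repeated stability runs exercise different etcd members rather than the same one every time.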

}

func (oa *operatorActions) CheckK8sAvailable(excludeNodes map[string]*corev1.Node, excludePods map[string]*corev1.Pod) error {
return wait.Poll(3*time.Second, time.Minute, func() (bool, error) {
Contributor

use the default interval

Contributor Author

The default interval is too long for this case.
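
To illustrate the trade-off being discussed: with a fixed timeout, the interval decides how many times the condition gets checked, so a long interval combined with a short timeout (such as the 30 seconds used by CheckTidbClustersAvailable) leaves only a few attempts. The poll function below is a simplified, standard-library stand-in with the same argument order as wait.Poll; it is not the Kubernetes implementation, and unlike wait.Poll it checks once immediately. The names poll and errTimeout are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition never became true in time.
var errTimeout = errors.New("timed out waiting for the condition")

// poll calls condition every interval until it returns true, returns an
// error, or the next attempt would fall after the timeout.
func poll(interval, timeout time.Duration, condition func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := condition()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	err := poll(10*time.Millisecond, 100*time.Millisecond, func() (bool, error) {
		attempts++
		return attempts == 4, nil // pretend the cluster recovers on the 4th check
	})
	fmt.Println(attempts, err)
}
```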

}

func (oa *operatorActions) CheckOperatorAvailable(operatorConfig *OperatorConfig) error {
return wait.Poll(3*time.Second, 3*time.Minute, func() (bool, error) {
Contributor

use the default interval

}

func (oa *operatorActions) CheckTidbClustersAvailable(infos []*TidbClusterConfig) error {
return wait.Poll(3*time.Second, 30*time.Second, func() (bool, error) {
Contributor

use the default interval

return true, nil
}

func GetPodStatus(pod *corev1.Pod) string {
Contributor

Why use such a complicated function? Isn't .status.phase enough?

Contributor Author

The .status.phase field only gives the pod phase, not the pod's real state.

Contributor

Where does this code come from?
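
To illustrate the point about phase versus real state: a pod whose container is crash-looping can still report the Running phase, so a status helper usually has to look at container states as well. The body of GetPodStatus is not shown in this thread; the sketch below is only loosely modelled on how kubectl derives its STATUS column, and it uses simplified hypothetical types (pod, containerState) rather than corev1.Pod.

```go
package main

import "fmt"

// containerState is a simplified, hypothetical view of a container status.
type containerState struct {
	WaitingReason    string // e.g. "CrashLoopBackOff"; empty if not waiting
	TerminatedReason string // e.g. "OOMKilled"; empty if not terminated
}

// pod is a simplified, hypothetical stand-in for the fields of corev1.Pod
// that matter for a human-readable status.
type pod struct {
	Phase      string // .status.phase
	Reason     string // .status.reason
	Deleting   bool   // true when a deletion timestamp is set
	Containers []containerState
}

// podStatus starts from the phase and lets more specific signals override it.
func podStatus(p pod) string {
	status := p.Phase
	if p.Reason != "" {
		status = p.Reason
	}
	for _, c := range p.Containers {
		if c.WaitingReason != "" {
			status = c.WaitingReason
			break
		}
		if c.TerminatedReason != "" {
			status = c.TerminatedReason
			break
		}
	}
	if p.Deleting {
		status = "Terminating"
	}
	return status
}

func main() {
	p := pod{
		Phase:      "Running",
		Containers: []containerState{{WaitingReason: "CrashLoopBackOff"}},
	}
	fmt.Println(p.Phase, "vs", podStatus(p))
}
```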

faultEtcd := tests.SelectNode(conf.ETCDs)
fta.StopETCDOrDie(faultEtcd)
defer fta.StartETCDOrDie(faultEtcd)
oa.CheckOneEtcdDownOrDie(operatorCfg, allClusters, faultEtcd)
Contributor

You should add other cases: stopping 2 etcds and stopping 3 etcds.

Contributor Author

Sure, I will add these cases in the next PR.
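
The extra cases matter because etcd needs a strict majority of members to accept writes. A small sketch of that arithmetic follows; the three-member cluster size is an assumption (the size of conf.ETCDs is not shown in this thread), and quorum and hasQuorum are hypothetical helper names.

```go
package main

import "fmt"

// quorum returns how many members must be up for a majority-based cluster
// such as etcd to commit writes.
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether the cluster still has a majority after
// stopping the given number of members.
func hasQuorum(members, stopped int) bool {
	return members-stopped >= quorum(members)
}

func main() {
	const members = 3 // assumed cluster size
	for stopped := 1; stopped <= members; stopped++ {
		fmt.Printf("stopped=%d hasQuorum=%v\n", stopped, hasQuorum(members, stopped))
	}
}
```

Under this assumption only the one-etcd-down case keeps quorum, so the two- and three-etcd cases would likely need different expectations from the ones in CheckOneEtcdDownOrDie.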

// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := tests.SelectNode(conf.ETCDs)
fta.StopETCDOrDie(faultEtcd)
defer fta.StartETCDOrDie(faultEtcd)
Contributor

Sleep for about 30 minutes before calling CheckOneEtcdDownOrDie.

@xiaojingchen
Contributor Author

/run-e2e-tests

@xiaojingchen
Contributor Author

@weekface PTAL
