add etcd and kube-apiserver faults #367
tests/cmd/stability/main.go
Outdated
@@ -214,5 +220,89 @@ func main() {
oa.CheckTidbClusterStatusOrDie(cluster)
}

// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := selectNode(conf.ETCDs)
err := fta.StopETCD(faultEtcd)
extract it as StopAEtcdOrDie
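A minimal sketch of the extraction suggested here: an OrDie wrapper that keeps the call site to one line. The `faultTrigger` type and its `stop` and `fatal` fields are hypothetical stand-ins, not the PR's actual types; the real helper would presumably call `glog.Fatal` directly.

```go
package main

import (
	"errors"
	"fmt"
)

// faultTrigger is a hypothetical stand-in for the PR's fault trigger actions.
type faultTrigger struct {
	// stop is the underlying operation that may fail
	// (a stand-in for stopping etcd on a node).
	stop func(node string) error
	// fatal is called on unrecoverable errors; it is injectable here only
	// so the sketch can be exercised without exiting the process.
	fatal func(args ...interface{})
}

// StopETCD returns the error to the caller.
func (f *faultTrigger) StopETCD(node string) error {
	if node == "" {
		return errors.New("no etcd node given")
	}
	return f.stop(node)
}

// StopETCDOrDie wraps StopETCD and aborts on error.
func (f *faultTrigger) StopETCDOrDie(node string) {
	if err := f.StopETCD(node); err != nil {
		f.fatal(fmt.Sprintf("failed to stop etcd on %q: %v", node, err))
	}
}
```

With a wrapper like this, the stability test body can read `fta.StopETCDOrDie(faultEtcd)` followed by a deferred start call, without error plumbing in main.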
tests/cmd/stability/main.go
Outdated
glog.Fatal(err)
}
defer fta.StartETCD(faultEtcd)
err = tests.Keep(3*time.Second, 10*time.Minute, func() error {
ditto
tests/util.go
Outdated
"k8s.io/client-go/rest"
)

func CreateKubeClient() (versioned.Interface, kubernetes.Interface, error) {
use cli, kubeCli := client.NewCliOrDie()
LGTM
tests/cmd/stability/main.go
Outdated
@@ -40,6 +42,7 @@ func main() {
oa := tests.NewOperatorActions(cli, kubeCli, conf)
fta := tests.NewFaultTriggerAction(cli, kubeCli, conf)
fta.CheckAndRecoverEnvOrDie()
oa.CheckK8sAvailable(nil, nil)
This function returns an error, and the return value is not checked here.
@@ -219,5 +222,19 @@ func main() {
// truncate a sst file and check failover
oa.TruncateSSTFileThenCheckFailoverOrDie(cluster1, 5*time.Minute)

// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := tests.SelectNode(conf.ETCDs)
This SelectNode method will always select the first etcd:
etcds:
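One way to avoid always picking the first entry is to choose uniformly at random across every configured node. The sketch below assumes a hypothetical `nodeGroup` config shape (a physical host plus the nodes on it); the PR's real config type and `SelectNode` signature may differ.

```go
package main

import "math/rand"

// nodeGroup is a hypothetical config entry: one physical host and the
// nodes that run on it. The real config type may look different.
type nodeGroup struct {
	PhysicalNode string
	Nodes        []string
}

// selectNode flattens all groups and picks one node uniformly at random.
// It returns "" when no nodes are configured.
func selectNode(groups []nodeGroup) string {
	var all []string
	for _, g := range groups {
		all = append(all, g.Nodes...)
	}
	if len(all) == 0 {
		return ""
	}
	return all[rand.Intn(len(all))]
}
```

Note that before Go 1.20 the global `math/rand` source is deterministic unless it is seeded explicitly, so on older toolchains an unseeded version of this would still pick the same node on every run.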
}

func (oa *operatorActions) CheckK8sAvailable(excludeNodes map[string]*corev1.Node, excludePods map[string]*corev1.Pod) error {
return wait.Poll(3*time.Second, time.Minute, func() (bool, error) {
use the default interval
The default interval is too long for this case.
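To make the interval trade-off concrete, here is a standard-library-only sketch of a poll loop with the same shape as the `wait.Poll(interval, timeout, condition)` calls in the diff. It is an illustration, not the apimachinery implementation; `poll` and `errTimeout` are hypothetical names.

```go
package main

import (
	"errors"
	"time"
)

// errTimeout is returned when the condition never became true in time.
var errTimeout = errors.New("timed out waiting for the condition")

// poll waits one interval, then calls cond on every tick until it returns
// true, returns an error, or the timeout has elapsed.
func poll(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
	}
	return nil // not reached: ticker.C is never closed
}
```

With a one-minute timeout, a 3-second interval gives roughly 20 checks, while a much longer interval gives only a handful, so a brief outage during a fault test is easier to miss.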
}

func (oa *operatorActions) CheckOperatorAvailable(operatorConfig *OperatorConfig) error {
return wait.Poll(3*time.Second, 3*time.Minute, func() (bool, error) {
use the default interval
}

func (oa *operatorActions) CheckTidbClustersAvailable(infos []*TidbClusterConfig) error {
return wait.Poll(3*time.Second, 30*time.Second, func() (bool, error) {
use the default interval
return true, nil
}

func GetPodStatus(pod *corev1.Pod) string {
Why use such a complicated function? Isn't .status.phase alone enough?
It is not enough: .status.phase is only the pod phase, not the pod's real state.
Where does this code come from?
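The difference between phase and real state can be sketched as follows. A pod whose container is crash-looping can still report phase Running, so a status helper typically overrides the phase with container-level reasons, similar in spirit to the STATUS column of `kubectl get pods`. The types below are simplified, hypothetical stand-ins for the corev1 structs, and this is not the PR's `GetPodStatus`.

```go
package main

// containerState is a simplified stand-in for a container status.
type containerState struct {
	WaitingReason    string // e.g. "CrashLoopBackOff"; empty if not waiting
	TerminatedReason string // e.g. "OOMKilled"; empty if not terminated
}

// pod is a simplified stand-in for the fields a status helper inspects.
type pod struct {
	Phase      string // .status.phase
	Reason     string // .status.reason
	Deleting   bool   // true when a deletion timestamp is set
	Containers []containerState
}

// podStatus starts from the phase and lets more specific signals override it.
func podStatus(p pod) string {
	status := p.Phase
	if p.Reason != "" {
		status = p.Reason
	}
	for _, c := range p.Containers {
		switch {
		case c.WaitingReason != "":
			status = c.WaitingReason
		case c.TerminatedReason != "":
			status = c.TerminatedReason
		}
	}
	if p.Deleting {
		status = "Terminating"
	}
	return status
}
```

For example, a pod with phase Running and one container waiting with reason CrashLoopBackOff yields "CrashLoopBackOff" here, which an availability check would want to treat as unhealthy.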
faultEtcd := tests.SelectNode(conf.ETCDs)
fta.StopETCDOrDie(faultEtcd)
defer fta.StartETCDOrDie(faultEtcd)
oa.CheckOneEtcdDownOrDie(operatorCfg, allClusters, faultEtcd)
You should add other cases: stopping 2 etcds and stopping 3 etcds.
Sure, I will add these cases in the next PR.
// stop one etcd node and k8s/operator/tidbcluster is available
faultEtcd := tests.SelectNode(conf.ETCDs)
fta.StopETCDOrDie(faultEtcd)
defer fta.StartETCDOrDie(faultEtcd)
Before CheckOneEtcdDownOrDie you should sleep for about 30 minutes.
/run-e2e-tests
@weekface PTAL
/run-e2e-tests
This PR contains the following changes: