ctlv2: report unhealthy in cluster-health if any node is unavailable #8070

heyitsanthony · 2017-06-08T23:07:52Z

xiang90 · 2017-06-09T05:08:40Z

lgtm

xiang90 · 2017-06-09T05:09:14Z

etcdctl/ctlv2/command/cluster_health.go

 			fmt.Println("cluster is healthy")
-		} else {
+		case 0:
+			fmt.Println("cluster is unavailable")


Would "degraded" be more descriptive?
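
The three-way outcome under discussion here (healthy, unavailable, and the suggested "degraded") can be sketched as a switch on the number of healthy members. This is a minimal illustration, not the code merged in this PR: `clusterStatus` and its parameters are hypothetical names, and the "cluster is degraded" message assumes the reviewer's suggestion is adopted.

```go
package main

import "fmt"

// clusterStatus is a hypothetical helper (not an etcdctl function).
// It maps the count of healthy members to a summary line and a flag
// telling the caller whether the whole cluster counts as healthy.
func clusterStatus(healthy, total int) (msg string, ok bool) {
	// Assumption: an empty member list is treated as unavailable.
	if total == 0 {
		return "cluster is unavailable", false
	}
	switch healthy {
	case total:
		// Every member answered its health check.
		return "cluster is healthy", true
	case 0:
		// No member answered at all.
		return "cluster is unavailable", false
	default:
		// Some, but not all, members are healthy.
		return "cluster is degraded", false
	}
}

func main() {
	// Sample counts for a four-member cluster.
	for _, healthy := range []int{4, 3, 0} {
		msg, ok := clusterStatus(healthy, 4)
		fmt.Printf("%d/4 healthy: %s (ok=%v)\n", healthy, msg, ok)
	}
}
```

A caller could exit with a non-zero status whenever `ok` is false, so that scripts can rely on the exit code rather than on parsing the output.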

Fixes etcd-io#8061 and etcd-io#7032

xiang90 · 2017-06-09T18:30:51Z

lgtm

codecov-io · 2017-06-09T23:56:46Z

Codecov Report

Merging #8070 into master will decrease coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #8070      +/-   ##
==========================================
- Coverage   76.51%   76.49%   -0.03%     
==========================================
  Files         342      342              
  Lines       26548    26549       +1     
==========================================
- Hits        20314    20309       -5     
- Misses       4774     4785      +11     
+ Partials     1460     1455       -5

| Impacted Files | Coverage Δ | |
|---|---|---|
| etcdctl/ctlv2/command/cluster_health.go | `11.84% <0%> (-0.16%)` | ⬇️ |
| auth/simple_token.go | `86.79% <0%> (-6.61%)` | ⬇️ |
| clientv3/namespace/watch.go | `93.93% <0%> (-6.07%)` | ⬇️ |
| proxy/grpcproxy/watch.go | `89.58% <0%> (-3.48%)` | ⬇️ |
| etcdmain/etcd.go | `45.49% <0%> (-0.86%)` | ⬇️ |
| clientv3/watch.go | `95.5% <0%> (-0.5%)` | ⬇️ |
| etcdserver/server.go | `80.39% <0%> (-0.35%)` | ⬇️ |
| etcdserver/v3_server.go | `71.73% <0%> (-0.24%)` | ⬇️ |
| pkg/adt/interval_tree.go | `88.71% <0%> (+0.3%)` | ⬆️ |
| etcdserver/api/v3rpc/watch.go | `93.42% <0%> (+0.93%)` | ⬆️ |

... and 5 more

discordianfish · 2018-05-22T10:55:37Z

Was this somehow reverted in a recent change? I depend on this behavior here: https://github.com/itskoko/kubecfn/blob/master/kubernetes.yaml#L1049

And cluster-health currently returns 0 even if there are unhealthy/unreachable members:

May 22 10:21:08 ip-172-20-181-238 etcd-signal-health[1894]: member f96dcaf66e42e5 is healthy: got healthy result from https://ip-172-20-182-16.ec2.internal:2379
May 22 10:21:08 ip-172-20-181-238 etcd-signal-health[1894]: member 3117c78e122ac7cb is healthy: got healthy result from https://ip-172-20-184-45.ec2.internal:2379
May 22 10:21:09 ip-172-20-181-238 etcd-signal-health[1894]: failed to check the health of member 96a7254e2e1542e1 on https://ip-172-20-180-128.ec2.internal:2379: Get https://ip-172-20-180-128.ec2.internal:2379/health: dial tcp 172.20.180.128:2379: getsockopt: no r
May 22 10:21:09 ip-172-20-181-238 etcd-signal-health[1894]: member 96a7254e2e1542e1 is unreachable: [https://ip-172-20-180-128.ec2.internal:2379] are all unreachable
May 22 10:21:09 ip-172-20-181-238 etcd-signal-health[1894]: member a2d7a63532026c28 is healthy: got healthy result from https://ip-172-20-181-238.ec2.internal:2379
May 22 10:21:09 ip-172-20-181-238 etcd-signal-health[1894]: cluster is healthy

"Container Linux by CoreOS 1688.5.3 (Rhyolite)"
etcdctl version: 3.2.15
API version: 2

According to etcd-io/etcd#8070, cluster-health should check this but apparently this broke in some recent releases, so we're checking this explicitly in the script now.

gyuho · 2018-05-22T18:28:42Z

@discordianfish This change is only available starting with the v3.3.0 release: https://github.com/coreos/etcd/blob/v3.3.0/etcdctl/ctlv2/command/cluster_health.go. Can you try v3.3?

xiang90 reviewed Jun 9, 2017

ctlv2: report unhealthy in cluster-health if any node is unavailable

ad0b3cf

Fixes etcd-io#8061 and etcd-io#7032

heyitsanthony force-pushed the etcdctl-cluster-health branch from 79260a0 to ad0b3cf Compare June 9, 2017 17:57

e2e: update cluster-health test for new etcdctl output

3fcb833

heyitsanthony force-pushed the etcdctl-cluster-health branch from 36ec217 to 3fcb833 Compare June 9, 2017 20:55

heyitsanthony merged commit 933aa09 into etcd-io:master Jun 9, 2017

heyitsanthony deleted the etcdctl-cluster-health branch June 9, 2017 21:57

heyitsanthony mentioned this pull request Jun 9, 2017

cluster-health reports healthy with contradictory output #7032

Closed
