
Network-partition aware health service #8673

Closed
gyuho opened this issue Oct 10, 2017 · 7 comments

gyuho commented Oct 10, 2017

The clientv3 health balancer should be able to reason about network partitions using keepalive HTTP/2 pings.

#8660 makes the balancer aware of network partitions on time-out errors, but it only handles the case where the client specifies a context time-out.

We can do better:

  1. Client sends linearizable requests with context time-out x
    • or with no time-out, via context.Background
  2. Client configures a keepalive HTTP/2 ping time-out y, where y < x
  3. Balancer pins endpoint A in a 3-node cluster
  4. Member A becomes isolated
  5. A linearizable request to A blocks until time-out x
    • or blocks forever if issued with context.Background

When y < x, the keepalive pings should detect that member A cannot reach the other members, and trigger an endpoint switch before time-out x elapses.

@xiang90 xiang90 added this to the v3.4.0 milestone Oct 10, 2017
@gyuho gyuho changed the title clientv3 network-partiton-aware keepalive HTTP/2 ping clientv3 network-partition-aware keepalive HTTP/2 ping Oct 17, 2017

gyuho commented Oct 17, 2017

/cc @jpbetz


jpbetz commented Oct 17, 2017

/cc @wojtek-t

@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Mar 3, 2018
@gyuho gyuho changed the title clientv3 network-partition-aware keepalive HTTP/2 ping Network-partition aware health service May 2, 2018

gyuho commented May 2, 2018

We still plan to do this (maybe in v3.5).

Merging with #8022, since both server and client need to reason about network partitions. The current HTTP/2 ping client and server do not reason about network partitions.

There are cases where the client needs to know that the server is responding to requests. If the server is not responding (but possibly still accepting connections), the balancer should try another endpoint. For instance, a watch may be disconnected, but the connection will stay open and hang instead of connecting to another member.

This heartbeat mechanism is somewhat weaker in that it doesn't need a leader, but there still needs to be some kind of polling to avoid waiting indefinitely on a disconnected socket. The client implementations for heartbeat and lost-leader reconnect can both poll, or the lost-leader path can depend on the heartbeat to cover the wait case.

ref. #7321


gyuho commented Jun 15, 2018

Another use case kubernetes/kubernetes#59848 (comment)

This can happen on a singleton master using an HA etcd cluster:

  1. apiserver is talking to member 0 in the cluster
  2. apiserver crashes, and the watch cache performs its initial list at RV=10
  3. kubelet deletes a pod at RV=11
  4. a new pod is created at RV=12 on a different node
  5. kubelet crashes
  6. apiserver becomes partitioned from the etcd majority, but is able to reestablish a watch from the partitioned member at RV=10
  7. kubelet contacts the apiserver, sees RV=10, and launches the pod


gyuho commented Jun 20, 2018

And just to clarify: currently, the require-leader metadata must be passed to prevent this:

wch := cli.Watch(clientv3.WithRequireLeader(context.Background()), "foo")
// will be closed when the member loses its leader
wch = cli.Watch(context.Background(), "foo")
// blocks forever, even if the member loses its leader


stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2020
@stale stale bot closed this as completed Apr 28, 2020
@spzala spzala reopened this Jun 8, 2020
@stale stale bot removed the stale label Jun 8, 2020

stale bot commented Sep 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 6, 2020
@stale stale bot closed this as completed Sep 27, 2020