
clientv3: fix balancer and upgrade gRPC v1.7.x #8828

Closed · gyuho opened this issue Nov 7, 2017 · 16 comments

gyuho (Contributor) commented Nov 7, 2017

clientv3: change balancer for gRPC v1.7.x

v1.6.0

  1. cluster of endpoints A and B
  2. pin A
  3. updateNotifyLoop case upc == nil b.notifyCh <- A
  4. A becomes blackholed
  5. PUT request context times out on A
  6. b.notifyCh <- [] in retry.go to drain connection A
  7. grpc.tearDown(errConnDrain) on A
  8. down(errConnDrain) on A
  9. unpin A
  10. pin B
  11. updateNotifyLoop case upc == nil b.notifyCh <- B
  12. following PUT (retry) request succeeds

v1.7.x

No more grpc.tearDown(errConnDrain); there is only subConnection.down(errConnClosing). So the timing has changed: the custom balancer needs to wait until errConnClosing to unpin an endpoint.

  1. cluster of endpoints A and B
  2. pin A
  3. updateNotifyLoop case upc == nil b.notifyCh <- A
  4. A becomes blackholed
  5. PUT request context times out on A
  6. b.notifyCh <- [] in retry.go to drain connection A
  7. (ac *addrConn).tearDown(errConnDrain) in clientconn.go
  8. handleSubConnStateChange(A,SHUTDOWN)
  9. A connection state changes from READY to SHUTDOWN
  10. updateNotifyLoop case upc == nil b.notifyCh <- A <=== need fix!
  11. down(errConnClosing) on A
  12. unpin A
  13. pin B
  14. lbWatcher RemoveSubConn(B) from step 10 <=== need fix!
  15. updateNotifyLoop case upc == nil b.notifyCh <- B
  16. handleSubConnStateChange(B,SHUTDOWN)
  17. down(errConnClosing) on B
  18. unpin B

So with gRPC v1.7.x, we are notifying A while A is blackholed, after the context timeout but before A gets unpinned. While A is still pinned, if A is notified again, lbWatcher removes the B sub-connection, even though B is the next endpoint to be pinned.

The balancer needs a fix; options include:

  1. make the notify logic aware of endpoint health status (do not send an unhealthy pinned endpoint)
  2. fix the racy notify logic between gRPC and the balancer's Up/down functions
  3. make the drain operation synchronous (when sending [] to the notify channel)

This gRPC change has been making TestBalancerUnderBlackholeNoKeepAlive* fail.
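For context on the terms above (pin, unpin, notify, Up/down), here is a minimal sketch of a single-pinned-endpoint balancer built on the pre-v1.8 grpc.Balancer interface. The pinBalancer type, its fields, and its pin/unpin logic are illustrative only, not the actual clientv3 code:

```go
package sketch

import (
	"context"
	"sync"

	"google.golang.org/grpc"
)

// pinBalancer is a stripped-down illustration of a "pin one endpoint"
// balancer built on the pre-v1.8 grpc.Balancer interface. gRPC dials the
// addresses it reads from Notify(), calls Up when a connection becomes
// READY, and calls the returned down func when that connection goes away
// (tearDown(errConnDrain) in v1.6, SHUTDOWN/errConnClosing in v1.7.x).
type pinBalancer struct {
	mu       sync.Mutex
	eps      []grpc.Address      // all known endpoints (A, B, ...)
	pinned   string              // currently pinned endpoint, "" if none
	notifyCh chan []grpc.Address // address updates consumed by gRPC
}

func newPinBalancer(eps []string) *pinBalancer {
	addrs := make([]grpc.Address, len(eps))
	for i, ep := range eps {
		addrs[i] = grpc.Address{Addr: ep}
	}
	b := &pinBalancer{eps: addrs, notifyCh: make(chan []grpc.Address, 1)}
	b.notifyCh <- addrs // initial notification: let gRPC dial every endpoint
	return b
}

func (b *pinBalancer) Start(target string, cfg grpc.BalancerConfig) error { return nil }

// Up is invoked by gRPC once a connection to addr is READY.
func (b *pinBalancer) Up(addr grpc.Address) func(error) {
	b.mu.Lock()
	if b.pinned == "" {
		b.pinned = addr.Addr // pin the first endpoint that comes up
	}
	b.mu.Unlock()
	return func(err error) { // down: the connection to addr went away
		b.mu.Lock()
		if b.pinned == addr.Addr {
			b.pinned = "" // unpin; a later Up pins another endpoint
		}
		b.mu.Unlock()
	}
}

// Get returns the pinned endpoint for an RPC (the real balancer blocks here
// until an endpoint is pinned; this sketch does not).
func (b *pinBalancer) Get(ctx context.Context, opts grpc.BalancerGetOptions) (grpc.Address, func(), error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return grpc.Address{Addr: b.pinned}, func() {}, nil
}

func (b *pinBalancer) Notify() <-chan []grpc.Address { return b.notifyCh }
func (b *pinBalancer) Close() error                  { return nil }
```

gRPC dials whatever addresses Notify() yields, reports READY connections through Up, and reports lost connections through the returned down func; the v1.6 vs v1.7.x difference above is about when that down func fires.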

gyuho added this to the v3.3.0 milestone Nov 7, 2017
xiang90 (Contributor) commented Nov 7, 2017

It seems like a bug on the gRPC side, or our analysis has an issue.

Basically, after step 15 in 1.7.x, steps 16, 17, and 18 are a side effect of the previous notification of A. But B should eventually be up again, at least.

So what will happen after 18?

gyuho (Contributor, Author) commented Nov 7, 2017

@xiang90

So what will happen after 18?

It starts over by notifying A and B, and pins the blackholed endpoint A.

It seems like a bug on the gRPC side, or our analysis has an issue.

With gRPC v1.6.0, the balancer unpins on tearDown(errConnDrain), which drains the blackholed endpoint right after it is marked unhealthy from context errors.

With gRPC v1.7.2, the balancer waits until the state change is propagated to it and then calls down(errConnClosing). So there's a timing difference (grpc/grpc-go#1649 (comment)). The current balancer implementation does not unpin the blackholed endpoint until errConnClosing, so it's possible that:

  1. blackholed A is pinned
  2. select case upc == nil:
  3. PUT times out
  4. c.balancer.next()
  5. in select case upc == nil:, select case msg := <-b.updateAddrsC
  6. b.notifyAddrs(notifyNext)
  7. b.notifyCh <- []
  8. b.notifyCh <- A,B
  9. again, select case upc == nil: since A is still pinned
  10. b.notifyCh <- A
  11. lbWatcher RemoveSubConn(B) while A is blackholed, and B is the one we want to pin

I will investigate more.
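To make the b.notifyCh <- [] / b.notifyCh <- A,B notation above concrete, the drain boils down to two sends on the notify channel, roughly as below (extending the pinBalancer sketch from the issue description; not the actual retry.go/clientv3 code):

```go
// notifyAddrs is what "b.notifyCh <- []" followed by "b.notifyCh <- A,B"
// refers to above: the empty list makes gRPC tear down every current
// sub-connection, and the full list makes it dial all endpoints again.
func (b *pinBalancer) notifyAddrs() {
	b.notifyCh <- []grpc.Address{} // drain: remove all current sub-connections
	b.mu.Lock()
	addrs := make([]grpc.Address, len(b.eps))
	copy(addrs, b.eps)
	b.mu.Unlock()
	b.notifyCh <- addrs // re-add every endpoint so gRPC can re-dial A and B
}
```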

xiang90 (Contributor) commented Nov 7, 2017

Eventually A should be unpinned due to the drain, and then B should be pinned. We should figure out why this is not happening instead of "fixing" the race.

gyuho (Contributor, Author) commented Nov 7, 2017

eventually A should be unpinned due to the drain

Correct, A gets unpinned.

then B should be pinned

Correct, B gets pinned after A gets unpinned.

The problem is that, afterwards, B gets unpinned because we notify the blackholed A before A is unpinned, and that notification removes sub-connection B. The removal lands after B has been pinned, because there's no synchronization between gRPC's lbWatcher and our updateNotifyLoop. So, B gets pinned but unpinned right away.
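For context, gRPC treats each list it reads from Notify() as the complete set of addresses to keep, and removes any existing sub-connection that is missing from it. A simplified sketch of that reconciliation (not the actual grpc-go lbWatcher code) shows why a stale notification containing only A tears down the freshly pinned B:

```go
// diffNotified mimics how gRPC reconciles a notified address list with the
// sub-connections it already has: new addresses get dialed, and existing
// sub-connections missing from the list get removed. If the balancer sends
// [A] while B is the endpoint that just got pinned, B ends up in toRemove.
func diffNotified(existing map[string]bool, notified []string) (toAdd, toRemove []string) {
	keep := make(map[string]bool, len(notified))
	for _, addr := range notified {
		keep[addr] = true
		if !existing[addr] {
			toAdd = append(toAdd, addr) // dial a new sub-connection
		}
	}
	for addr := range existing {
		if !keep[addr] {
			toRemove = append(toRemove, addr) // RemoveSubConn / tearDown
		}
	}
	return toAdd, toRemove
}

// Example: existing = {A: true, B: true}, notified = [A]  =>  toRemove = [B].
```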

xiang90 (Contributor) commented Nov 7, 2017

So, B gets pinned but unpinned right away.

Will it be pinned again?

gyuho (Contributor, Author) commented Nov 7, 2017

So, B gets pinned but unpinned right away.

Will it be pinned again?

Yes, but not always. The balancer could pin A instead, while it's blackholed (because both A and B are marked unhealthy, so it picks whichever comes up first). And then the test times out.

xiang90 (Contributor) commented Nov 7, 2017

The balancer could pin A instead, while it's blackholed (because both A and B are marked unhealthy, so it picks whichever comes up first). And then the test times out.

But then B should be pinned again, and things should stabilize, right?

gyuho (Contributor, Author) commented Nov 7, 2017

but then B should be pinned again

It was pinning A instead, and TestBalancerUnderBlackholeNoKeepAlive* was failing because of that: the endpoint switch didn't happen within the timeout.

Since this is after B gets unpinned (so marked as unhealthy), there's no guarantee that B will be re-pinned within the timeout.

xiang90 (Contributor) commented Nov 7, 2017

Since this is after B gets unpinned (so marked as unhealthy), there's no guarantee that B will be re-pinned within the timeout.

I am more curious whether B will eventually be pinned regardless of the timeout. If that is the case, then all we need to fight is the timing issue.

gyuho (Contributor, Author) commented Nov 7, 2017

B will eventually be pinned regardless of the timeout

I believe we cannot guarantee that, because both A and B are unhealthy for the same reason (errConnClosing) from the balancer's viewpoint. We notify both A and B to lbWatcher in notifyAddrs. Since there's no prioritization between A and B, in theory it is possible that:

  1. notify A, B
  2. Up A
  3. pin A
  4. A times out
  5. notify []
  6. notify A, B
  7. notify A, since A is still pinned
  8. unpin A
  9. Up B
  10. pin B
  11. tearDown B from step 7
  12. unpin B
  13. repeat

xiang90 (Contributor) commented Nov 7, 2017

@gyuho OK. Let us stop notifying gRPC of the endpoints that are in the unhealthy list. It should solve this issue, and it is in line with what we are going to do in the future too. Sounds reasonable?
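Something along these lines, as a rough sketch extending the pinBalancer example from the issue description (the unhealthy map and the time import are assumptions for illustration, not the actual clientv3 code):

```go
// notifyHealthy sends only the endpoints that are not currently marked
// unhealthy, so gRPC never re-dials (or keeps) a blackholed endpoint.
// unhealthy is assumed to map an endpoint to the time it was marked.
func (b *pinBalancer) notifyHealthy(unhealthy map[string]time.Time) {
	b.mu.Lock()
	var healthy []grpc.Address
	for _, ep := range b.eps {
		if _, bad := unhealthy[ep.Addr]; !bad {
			healthy = append(healthy, ep)
		}
	}
	b.mu.Unlock()
	b.notifyCh <- healthy
}
```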

gyuho (Contributor, Author) commented Nov 9, 2017

@xiang90

Here's the problem when notifying only healthy addresses (even with gRPC v1.6.x):

  1. cluster of A, B, C, where A is the leader
  2. balancer with A, B, C
  3. notify A, B, C (pinned=="")
  4. balancer pins A
  5. stop B(follower)
  6. stop A(leader)
  7. balancer marks A as unhealthy
  8. balancer unpins A
  9. notify B, C (pinned=="") <=== problem!
  10. pin C
  11. linearizable get on C
  12. fails due to "etcdserver: request timed out" (A, B stopped)
  13. balancer marks C as unhealthy
  14. restart A (now A, C are live, B is stopped)
  15. notify [] to drain all current connections (from step 13)
  16. balancer marks C as unhealthy
  17. unpin C
  18. notify B (since A, C are marked as unhealthy) <=== problem!
  19. linearizable get times out

The problem with step 9 is that B was never pinned, so the balancer thinks it's healthy. Even if B had been marked unhealthy, it might have been removed from the unhealthy list after a few seconds.

The problem with step 18 is that it's still notifying B, rather than A and C (since A and C are still marked as unhealthy).
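The "removed from the unhealthy list after a few seconds" part refers to the expiry on unhealthy markings; a rough sketch of such a check (the map, names, and expiry parameter are illustrative, not the actual health_balancer.go code):

```go
// isHealthy treats an endpoint as healthy if it was never marked unhealthy,
// or if its unhealthy mark is older than the expiry window. This is why B,
// which was never pinned (so never marked), looks healthy in step 9, and why
// A and C drop out of the unhealthy set again after a few seconds.
func isHealthy(unhealthy map[string]time.Time, ep string, expiry time.Duration) bool {
	markedAt, bad := unhealthy[ep]
	if !bad {
		return true // never marked unhealthy (e.g. never pinned, like B)
	}
	return time.Since(markedAt) > expiry // old marks expire
}
```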

xiang90 (Contributor) commented Nov 9, 2017

@gyuho this is actually OK. Retrying the linearizable get, or enabling keepalive, should fix it.

gyuho (Contributor, Author) commented Nov 9, 2017

@xiang90

I believe even a retry wouldn't help. In the case above, the balancer is stuck at step 18 (when notifying B). Even after A and C are removed from the unhealthy list, it is still stuck waiting for a new endpoint to come up.

https://github.com/coreos/etcd/blob/05e5b3b62da78d9fddfb89bf852586049b4bae76/clientv3/balancer.go#L228-L236

And I'm not sure how keepalive would help when there's no endpoint to ping.

xiang90 (Contributor) commented Nov 9, 2017

@gyuho well, maybe you accidentally removed the notify loop at https://github.com/coreos/etcd/blob/master/clientv3/health_balancer.go#L153? It is very important: when A and C are back, it makes sure A and C are sent to gRPC again.
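The idea of that loop, as a rough sketch on top of the pinBalancer example from the issue description (the ticker interval and names are illustrative, not the actual health_balancer.go code): while nothing is pinned, keep re-sending the full endpoint list so that once A and C are reachable again, gRPC dials them and one of them gets pinned.

```go
// renotifyLoop keeps offering every endpoint to gRPC while no endpoint is
// pinned, so the client can recover once some endpoint becomes reachable.
func (b *pinBalancer) renotifyLoop(stopc <-chan struct{}) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stopc:
			return
		case <-ticker.C:
			b.mu.Lock()
			var addrs []grpc.Address
			if b.pinned == "" { // nothing pinned: offer every endpoint again
				addrs = append(addrs, b.eps...)
			}
			b.mu.Unlock()
			if addrs != nil {
				b.notifyCh <- addrs
			}
		}
	}
}
```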

gyuho (Contributor, Author) commented Nov 9, 2017

@xiang90 Good catch!

Indeed, my patch was handling the empty pinned address incorrectly in that code path. Now the test passes. I will run more tests and push a PR.
