Fix for etcd client oneshot cluster member cycling #16307

rrati · 2017-09-12T17:33:54Z

…er cycling

rrati · 2017-09-12T17:35:27Z

deads2k · 2017-09-12T18:12:50Z

@rrati the pick was clean?

rrati · 2017-09-12T18:20:29Z

@deads2k More or less. The changes to the _test file aren't included because this version of etcd didn't have the tests. The changes to the client code applied cleanly.

deads2k · 2017-09-12T18:22:00Z

lgtm

Do you have an issue or bug to tie to this?

rrati · 2017-09-12T18:23:13Z

A BZ, but no github issue

liggitt · 2017-09-12T18:23:35Z

remind me how we're managing picks for non-kube repos? also, an issue to track bumping to a release that includes this would be good (and an issue for kube, which is impacted the same way)

liggitt · 2017-09-14T14:08:46Z

/lgtm

openshift-merge-robot · 2017-09-14T14:08:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, rrati

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~OWNERS~~ [liggitt]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

openshift-merge-robot · 2017-09-15T11:43:26Z

Automatic merge from submit-queue

sttts · 2018-03-12T14:10:36Z

Do we have a backport of this into origin master? The upstream PR only merged into etcd 3.3+.

liggitt · 2018-03-12T14:16:23Z

it was backported into etcd 3.2.8 in etcd-io/etcd@15e9510 and picked up in 3.7+ in 6032f62#diff-1c6011bace39f9ad159c10fa9a674a3a

mfojtik · 2018-03-12T14:17:13Z

@liggitt how about the 3.2.16 we have in 3.9?

liggitt · 2018-03-12T14:22:07Z

yes, >= 3.2.8

verified this fix is in master:

origin/vendor/github.com/coreos/etcd/client/client.go

Lines 375 to 398 in 1e041c8

    
    } else if resp.StatusCode/100 == 5 {
    	switch resp.StatusCode {
    	case http.StatusInternalServerError, http.StatusServiceUnavailable:
    		// TODO: make sure this is a no leader response
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s has no leader", eps[k].String()))
    	default:
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s returns server error [%s]", eps[k].String(), http.StatusText(resp.StatusCode)))
    	}
    	err = cerr.Errors[0]
    }
    if err != nil {
    	if !isOneShot {
    		continue
    	}
    	c.Lock()
    	c.pinned = (k + 1) % leps
    	c.Unlock()
    	return nil, nil, err
    }
    if k != pinned {
    	c.Lock()
    	c.pinned = k
    	c.Unlock()
    }

3.9:

origin/vendor/github.com/coreos/etcd/client/client.go

Lines 375 to 398 in 6d21b7d

    
    } else if resp.StatusCode/100 == 5 {
    	switch resp.StatusCode {
    	case http.StatusInternalServerError, http.StatusServiceUnavailable:
    		// TODO: make sure this is a no leader response
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s has no leader", eps[k].String()))
    	default:
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s returns server error [%s]", eps[k].String(), http.StatusText(resp.StatusCode)))
    	}
    	err = cerr.Errors[0]
    }
    if err != nil {
    	if !isOneShot {
    		continue
    	}
    	c.Lock()
    	c.pinned = (k + 1) % leps
    	c.Unlock()
    	return nil, nil, err
    }
    if k != pinned {
    	c.Lock()
    	c.pinned = k
    	c.Unlock()
    }

3.8:

origin/vendor/github.com/coreos/etcd/client/client.go

Lines 375 to 398 in ff72b3f

    
    } else if resp.StatusCode/100 == 5 {
    	switch resp.StatusCode {
    	case http.StatusInternalServerError, http.StatusServiceUnavailable:
    		// TODO: make sure this is a no leader response
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s has no leader", eps[k].String()))
    	default:
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s returns server error [%s]", eps[k].String(), http.StatusText(resp.StatusCode)))
    	}
    	err = cerr.Errors[0]
    }
    if err != nil {
    	if !isOneShot {
    		continue
    	}
    	c.Lock()
    	c.pinned = (k + 1) % leps
    	c.Unlock()
    	return nil, nil, err
    }
    if k != pinned {
    	c.Lock()
    	c.pinned = k
    	c.Unlock()
    }

3.7:

origin/vendor/github.com/coreos/etcd/client/client.go

Lines 375 to 398 in a8deba5

    
    } else if resp.StatusCode/100 == 5 {
    	switch resp.StatusCode {
    	case http.StatusInternalServerError, http.StatusServiceUnavailable:
    		// TODO: make sure this is a no leader response
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s has no leader", eps[k].String()))
    	default:
    		cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s returns server error [%s]", eps[k].String(), http.StatusText(resp.StatusCode)))
    	}
    	err = cerr.Errors[0]
    }
    if err != nil {
    	if !isOneShot {
    		continue
    	}
    	c.Lock()
    	c.pinned = (k + 1) % leps
    	c.Unlock()
    	return nil, nil, err
    }
    if k != pinned {
    	c.Lock()
    	c.pinned = k
    	c.Unlock()
    }

sttts · 2018-03-12T14:24:31Z

@liggitt without any deeper knowledge of the client code: all your references point to the etcd2 client, not the clientv3 package.

liggitt · 2018-03-12T14:37:43Z

hmm, true. I wonder if there was a similar issue with the etcd v3 client

sttts · 2018-03-12T15:12:13Z

But even in v3, there is a 10 second context passed to the dial func. How can that block for 15 minutes? (@mfojtik noticed that)

sttts · 2018-03-12T15:25:19Z

With persistent grpc connections we are certainly not talking about a Dial call here.

UPSTREAM: coreos/etcd: 8519: Fix for etcd client oneshot cluster memb…

52d57ee

…er cycling

openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 12, 2017

openshift-merge-robot assigned deads2k and liggitt Sep 12, 2017

rrati changed the title ~~UPSTREAM: coreos/etcd: 8519: Fix for etcd client oneshot cluster member cycling~~ Fix for etcd client oneshot cluster member cycling Sep 12, 2017

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 14, 2017

openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 14, 2017

openshift-merge-robot merged commit b3b8ade into openshift:release-3.6 Sep 15, 2017
