Handle etcd connection failures in etcd v3 watch API. #9

Merged
merged 2 commits into from
Jun 15, 2017

adityadani
Member

Add a Session to the Watch API.
The session can be used to detect etcd connection failures.
With the new clientv3, the watch hangs if etcd connectivity is lost.
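
The idea described above can be sketched with plain channels: the watch loop selects on the watch channel and on a second channel that closes when the session ends, so a lost connection unblocks the loop instead of leaving it hung. This is a stdlib-only illustration, not the code in this PR. `watchLoop`, `errSessionLost`, and the `string` event type are hypothetical names, and the `done` channel only stands in for the session's `Done()` channel.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionLost is a hypothetical sentinel error used only in this sketch.
var errSessionLost = errors.New("session lost")

// watchLoop passes events to handle until the events channel is closed
// or the done channel is closed. done stands in for session.Done().
func watchLoop(events <-chan string, done <-chan struct{}, handle func(string)) error {
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return nil // watch channel closed normally
			}
			handle(ev)
		case <-done:
			// The session ended, so stop waiting on a watch
			// that may never deliver another event.
			return errSessionLost
		}
	}
}

func main() {
	events := make(chan string) // never written to: models a hung watch
	done := make(chan struct{})
	close(done) // models the session ending after connectivity is lost

	fmt.Println(watchLoop(events, done, func(string) {})) // prints "session lost"
}
```

Without the `done` case, the loop in `main` would block forever on `events`, which is the hang described above.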

@adityadani adityadani self-assigned this Jun 14, 2017
@@ -803,43 +807,63 @@ func (et *etcdKV) watchStart(
	if waitIndex != 0 {
		opts = append(opts, e.WithRev(int64(waitIndex+1)))
	}
	session, err := concurrency.NewSession(et.kvClient, concurrency.WithTTL(defaultSessionTimeout))
Contributor

Per the following change, it may be better to create a cancel context?
etcd-io/etcd#6699
Also, when it retries, does the connection go to the next available etcd? (Does kvClient need to be refreshed?)
Otherwise looks good!

Member Author

Yes, I had that cancel context, but cancelling it had no effect on the goroutine that handles the watch responses (watchChan). I will add it back.

Should we refresh the kvClient in the watch API?

The v2 docs say that the client goes to the next etcd: https://github.com/coreos/etcd/tree/master/client#caveat
But I could not find any doc or mention of this for clientv3.

Member Author

The client is probably fixed in v3.2.0.
I am following up with etcd here: etcd-io/etcd#7941

If that works, we might not need this change at all.

Contributor

Great. Let me know if you want me to follow up.

Member Author

With a single-node etcd server, if the node goes down, the watch will still hang and won't return. It looks like we will still need this change anyway. I am still running a test to see if v3.2.0 solves the reconnection issue.

Contributor

Just to clarify: with this change, in a single-node etcd the session should terminate, right? (Ignoring the reconnection issue.)

@adityadani
Member Author

adityadani commented Jun 15, 2017 via email

@sangleganesh
Contributor

Let's merge it then!

@adityadani adityadani merged commit e39da5b into master Jun 15, 2017
@adityadani adityadani deleted the etcd_v3_watch branch November 5, 2018 21:31