
clientv3: don't halt lease client if there is a lease error #7732

Merged: 8 commits, Apr 25, 2017


heyitsanthony
Contributor

Fixes #7488


// KeepAliveOnce renews the lease once. In most cases, KeepAlive
// should be used instead of KeepAliveOnce.
-KeepAliveOnce(ctx context.Context, id LeaseID) (*LeaseKeepAliveResponse, error)
+KeepAliveOnce(ctx context.Context, id LeaseID) LeaseKeepAliveResponse
Contributor

Do we really need to hide the error from KeepAliveOnce?

// KeepAlive keeps the given lease alive forever. If the keepalive response posted to
// the channel is not consumed immediately, the lease client will continue sending keepalive requests
// to the etcd server at least every second until the latest response is consumed.
KeepAlive(ctx context.Context, id LeaseID) LeaseKeepAliveChan
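The doc comment describes latest-value delivery: a slow consumer should see only the newest keepalive response, not a backlog. A minimal Go sketch of that pattern, assuming a hypothetical `resp` type, a channel of capacity 1, and a single producer goroutine; it illustrates the idea and is not the clientv3 implementation:

```go
package main

import "fmt"

// resp is a hypothetical stand-in for a keepalive response.
type resp struct {
	TTL int64
}

// publishLatest delivers r on ch (capacity 1) without blocking the producer.
// If an unconsumed response is already buffered, it is discarded so the
// consumer only ever sees the most recent one. Assumes a single producer.
func publishLatest(ch chan resp, r resp) {
	for {
		select {
		case ch <- r:
			return
		default:
			// Buffer full: drop the stale value, then retry the send.
			select {
			case <-ch:
			default:
			}
		}
	}
}

func main() {
	ch := make(chan resp, 1)
	publishLatest(ch, resp{TTL: 10})
	publishLatest(ch, resp{TTL: 9}) // replaces the unconsumed TTL 10
	fmt.Println((<-ch).TTL)         // 9
}
```

Because the producer never blocks on a slow reader, it is free to keep sending keepalive requests on its own schedule.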
Contributor

This breaks the API. One hacky option is to leave the error in the signature and always return nil.

Contributor Author

Well, how should this break code that relies on the client: keep the interface but silently change the behavior, or visibly change the interface to match the new behavior?

Contributor

xiang90 Apr 13, 2017

How do people handle the error today anyway? Probably with a simple retry, the same as what we do internally now, right? If we stop returning the error, nothing changes for them; they can remove the error-handling code at their own pace.

But yes, we have to communicate this change. It just means that people's code will still compile and probably still work as expected.

Contributor

Even if we decide to break the API in this minor way, it is probably not a huge deal either. But we need to communicate it very explicitly.

Contributor Author

My instinct here is that the most explicit and obvious way to communicate the change is to break compilation.

Contributor

xiang90 Apr 20, 2017

> My instinct here is that the most explicit and obvious way to communicate the change is to break compilation.

I am fine with this. But right now, client versioning is tied to etcd's overall versioning, so we cannot easily bump the client's major version to signal an API change.

Contributor Author

Are there docs on this? There's wire-compatibility breakage and there's source-compatibility breakage, and the first is far more serious than the second. If the policy is that code keeps compiling but the underlying behavior may change across minor releases, I don't think that is a meaningful policy whatsoever. The code will statically pass compilation, then suddenly break when run. Sure, the API is "preserved", but you're being wildly disingenuous about what's being guaranteed.

Contributor Author

If we are following semver, then this change violates minor-version guarantees no matter what, and will need a compat layer instead.

Contributor

> Are there docs on this?

We do not. And yes, we should.

> If the policy is to have it compile but the underlying behavior may change across minor revisions, I don't think that is a meaningful policy whatsoever.

I agree.

> If this is going by semver policy, then this change will violate minor bump behavior no matter what and will need a compat layer instead.

OK. A compat layer is probably the way to go for this release. We need to start thinking about decoupling client and server versioning in the next release.

Contributor Author

OK, so there's no official documentation or policy on this now, and that's why there needs to be a compat layer, which will gunk up the client even worse? The compat layer won't fix the original issue, because the client will still be using the old interface. There's API-level breakage in STM and Elections for 3.2 too.

We've already done far worse in terms of semver adherence because of k8s. Ugh.

heyitsanthony force-pushed the lease-err-ka branch 3 times, most recently from 1d45517 to 2c608e3, April 18, 2017 18:15
heyitsanthony force-pushed the lease-err-ka branch 2 times, most recently from 71a4525 to 83a11fe, April 24, 2017 17:12
for l.stopCtx.Err() == nil {
err := l.recvKeepAliveLoop()
if err == context.Canceled {
err = nil
Contributor

Why don't we propagate the cancel error to ka.Close?

Contributor Author

This simulates the old behavior, where a context cancellation closes the lease channel without sending another message. It seems better to return the cancel error here instead of hiding it, though. Updated.

}()

for l.stopCtx.Err() == nil {
err := l.recvKeepAliveLoop()
Contributor

Should we back off this retry?

Contributor Author

OK, using a time.After, unless something more sophisticated belongs here.

// KeepAlive keeps the given lease alive forever. If the keepalive response posted to
// the channel is not consumed immediately, the lease client will continue sending keepalive requests
// to the etcd server at least every second until the latest response is consumed.
KeepAlive(ctx context.Context, id LeaseID) LeaseKeepAliveChan
Contributor

Can we document when the channel is closed, and what users should do once the returned channel is closed?


// KeepAliveOnce renews the lease once. In most cases, KeepAlive
// should be used instead of KeepAliveOnce.
-KeepAliveOnce(ctx context.Context, id LeaseID) (*LeaseKeepAliveResponse, error)
+KeepAliveOnce(ctx context.Context, id LeaseID) LeaseKeepAliveResponse
Contributor

Same here: can we say a little about when the channel is closed?

Contributor

xiang90 commented Apr 24, 2017

LGTM

heyitsanthony force-pushed the lease-err-ka branch 4 times, most recently from b617cbf to 2fd6df9, April 25, 2017 06:47
heyitsanthony merged commit fbbc4a4 into etcd-io:master Apr 25, 2017
heyitsanthony deleted the lease-err-ka branch April 25, 2017 14:06
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this pull request May 2, 2017
heyitsanthony pushed a commit that referenced this pull request May 3, 2017
Revert "Merge pull request #7732 from heyitsanthony/lease-err-ka"