etcdserver: requested lease not found #9374

Closed
sh1ng opened this issue Feb 28, 2018 · 14 comments
@sh1ng

sh1ng commented Feb 28, 2018

Bug reporting

We use a 3-member etcd cluster, version
quay.io/coreos/etcd:v3.2.0
running in a k8s cluster. The load is not heavy, about 10k writes a day.

lease, err := client.Lease.Grant(myContext, int64(24 * time.Hour))
if err != nil {
	log.Error("Unable to get a lease from etcd", "error", err)
	return err
}
_, err = client.KV.Put(myContext, key, string(msg), clientv3.WithLease(lease.ID))
if err != nil {
	log.Error("Unable to put a record into etcd", "error", err) // <- it's logged here 
	return err
}

How is that even possible? We see the error a few times a day.
What is the best way to deal with it?

@gyuho
Contributor

gyuho commented Feb 28, 2018

Can you provide reproducible steps (locally)?

@sh1ng
Author

sh1ng commented Feb 28, 2018

Not really, it's just a simple service that accepts records over gRPC and stores them in etcd for 24 hours.

If you could give an idiomatic example for this scenario, that would be nice.

@sh1ng
Author

sh1ng commented Feb 28, 2018

And the main question: how is it possible that a successfully created lease can't be found?

@gyuho
Contributor

gyuho commented Feb 28, 2018

@sh1ng Do you have etcd server logs when this happened?

@heyitsanthony
Contributor

@sh1ng etcd:v3.2.0? There was a restore bug fixed by 4526284; try upgrading to the latest 3.2.

@sh1ng
Author

sh1ng commented Mar 5, 2018

Even after upgrading to quay.io/coreos/etcd:v3.2.16 we still see the same error.

Could it be that when the parent context has been canceled, etcd (or the etcd client) returns a revoked lease? Our service exposes a streaming API, and a client might cancel the stream from time to time.

@gyuho
Contributor

gyuho commented Mar 5, 2018

@sh1ng Do you have server logs from when this happened?

@yudai
Contributor

yudai commented Mar 6, 2018

int64(24 * time.Hour) is 86400000000000 and I think what you want here is 60 * 60 * 24 = 86400, because Grant() expects a TTL in seconds.

When a lease is created with a TTL of 86400000000000, etcd seems to lose the lease immediately. It's probably just an overflow: https://github.com/coreos/etcd/blob/master/lease/lessor.go#L599

We may want a maxLeaseTTL to avoid unexpected behavior caused by overflows.
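
A minimal sketch of the arithmetic described in this comment, assuming the server turns the TTL (in seconds) into a `time.Duration` (int64 nanoseconds). The helper names `ttlSeconds` and `overflowsAsDuration` are hypothetical and are not part of etcd or its client:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// ttlSeconds converts a duration to whole seconds, the unit Grant() expects
// according to this thread. (Hypothetical helper.)
func ttlSeconds(d time.Duration) int64 {
	return int64(d / time.Second)
}

// overflowsAsDuration reports whether a non-negative TTL in seconds is too
// large to be represented as a time.Duration, i.e. whether multiplying it
// by time.Second would wrap around int64. (Hypothetical helper.)
func overflowsAsDuration(ttl int64) bool {
	return ttl > math.MaxInt64/int64(time.Second)
}

func main() {
	wrong := int64(24 * time.Hour)      // nanoseconds, not seconds
	right := ttlSeconds(24 * time.Hour) // whole seconds

	fmt.Println(wrong, overflowsAsDuration(wrong)) // 86400000000000 true
	fmt.Println(right, overflowsAsDuration(right)) // 86400 false
}
```

With a conversion like this, the value passed as the TTL would be 86400 rather than the nanosecond count.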

@gyuho
Contributor

gyuho commented Mar 6, 2018

@yudai Got it right.

The TTL overflows and produces a negative expiry time, so the lease expires before the put request.

@yudai Do you want to send a fix?

@yudai
Contributor

yudai commented Mar 7, 2018

@gyuho What value would you suggest for the max TTL? 10 years?
Or something like (math.MaxInt64 / time.Second) - someBuffer?

@gyuho
Contributor

gyuho commented Mar 7, 2018

We usually promote leases with the election timeout (which is 1 second by default), so (math.MaxInt64 / time.Second) - time.Minute to be safe?

yudai pushed a commit to yudai/etcd that referenced this issue Mar 8, 2018
math.MaxInt64 / time.Second is 9,223,372,036. 9,000,000,000 is easier to
remember/document.

Closes etcd-io#9374.
@yudai
Contributor

yudai commented Mar 8, 2018

@gyuho thanks for the suggestion.
I chose 9,000,000,000 to make it very safe (223,372,036 seconds buffer) and easier to document/remember.
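
A rough sketch of the kind of upper-bound check discussed here, using the 9,000,000,000-second cap from this comment. The names `maxLeaseTTL`, `errTTLTooLarge`, and `checkTTL` are hypothetical, chosen for illustration, and are not taken from the actual commit:

```go
package main

import (
	"errors"
	"fmt"
)

// maxLeaseTTL is the cap quoted in this thread, in seconds.
// (Hypothetical name; the real fix may name and place it differently.)
const maxLeaseTTL int64 = 9000000000

// errTTLTooLarge is a hypothetical error for TTLs above the cap.
var errTTLTooLarge = errors.New("lease TTL too large")

// checkTTL rejects TTLs above maxLeaseTTL, so an oversized TTL fails
// loudly instead of silently overflowing later.
func checkTTL(ttl int64) error {
	if ttl > maxLeaseTTL {
		return errTTLTooLarge
	}
	return nil
}

func main() {
	fmt.Println(checkTTL(86400))          // <nil>
	fmt.Println(checkTTL(86400000000000)) // lease TTL too large
}
```

A check of this shape would have turned the original report into an immediate error on the grant rather than a "requested lease not found" on the later put.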

@gyuho
Contributor

gyuho commented Mar 8, 2018

The fix will be released in 3.2 and 3.3.

@sh1ng
Author

sh1ng commented Mar 12, 2018

Thanks guys!
Now it works like a charm.
