Fix lease expiration check #19092

Open
wants to merge 3 commits into
base: main

Conversation

upamanyus

Fixes #19091.

This PR adds and uses a client-side Unexpired() method that determines if a lease is valid by checking if its expiration time is after the current time. This helps fix a bug in the client leasing library, and provides a general API to check if a lease is valid, which was not previously exposed to users.
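As a rough illustration of the deadline-based check described above (not the code in this PR), the sketch below keeps a local expiration deadline per lease and compares it against the clock. Only `LeaseID` and `Unexpired` are names from the PR; `leaseDeadlines`, `renew`, the injectable `now` clock, and `demoUnexpired` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LeaseID mirrors the type name used in the PR.
type LeaseID int64

// leaseDeadlines is a hypothetical stand-in for the client's keep-alive
// bookkeeping: it remembers when each lease is due to expire.
type leaseDeadlines struct {
	mu        sync.Mutex
	deadlines map[LeaseID]time.Time
	now       func() time.Time // injectable clock (assumption, for testing)
}

// renew records a new deadline, as a keep-alive response might.
func (l *leaseDeadlines) renew(id LeaseID, ttl time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.deadlines[id] = l.now().Add(ttl)
}

// Unexpired reports whether the recorded deadline is still in the future.
// Unknown leases are treated as expired.
func (l *leaseDeadlines) Unexpired(id LeaseID) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	deadline, ok := l.deadlines[id]
	return ok && l.now().Before(deadline)
}

// demoUnexpired drives the check with a fake clock so the result is deterministic.
func demoUnexpired() (beforeDeadline, afterDeadline, unknown bool) {
	current := time.Unix(0, 0)
	l := &leaseDeadlines{
		deadlines: map[LeaseID]time.Time{},
		now:       func() time.Time { return current },
	}
	l.renew(1, 10*time.Second)

	current = current.Add(5 * time.Second) // 5s in: deadline not reached
	beforeDeadline = l.Unexpired(1)

	current = current.Add(6 * time.Second) // 11s in: deadline passed
	afterDeadline = l.Unexpired(1)

	unknown = l.Unexpired(2) // never renewed
	return
}

func main() {
	before, after, unknown := demoUnexpired()
	fmt.Println(before, after, unknown)
}
```

The point of the sketch is that the answer comes directly from comparing a deadline with the current time, so it does not depend on when any background goroutine next runs.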

This PR also adds a leasing test that performs a few Puts and Gets with delays while one client is network partitioned. The test often fails against the old leasing implementation. The bug within leasing was here:

    select {
    case <-lkv.session.Done():
    default:
        return true
    }

This code tries to determine whether a session is "ready" by checking that a non-blocking receive on lkv.session.Done() fails. However, failing to receive immediately on a channel guarantees nothing: the other side's send or close may simply have been delayed.

Fortunately for reproducing the problem, there's a delay already present in the code: the Done() channel is closed by a background loop that sleeps and periodically checks whether the lease is expired. If the lease expires while this loop is sleeping, there is a delay between the true expiration time and when the Done() channel is closed, which seems to make the test fail relatively reliably (it still sometimes takes a few tries to get a failure).
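The stale window described above can be sketched deterministically. This is a simplified model, not the etcd session code: `session`, `poll`, `readyByChannel`, `readyByDeadline`, and `staleWindow` are hypothetical names, and the background loop is simulated by calling `poll` by hand at chosen instants instead of sleeping.

```go
package main

import (
	"fmt"
	"time"
)

// session is a hypothetical model of a lease-backed session: a deadline plus
// a done channel that a periodic background check closes after expiry.
type session struct {
	deadline time.Time
	done     chan struct{}
}

// poll is what the background loop would run once per tick: close done if
// the lease has expired by the time of the tick.
func (s *session) poll(now time.Time) {
	if now.Before(s.deadline) {
		return
	}
	select {
	case <-s.done: // already closed, nothing to do
	default:
		close(s.done)
	}
}

// readyByChannel reproduces the problematic pattern: treat the session as
// ready whenever a non-blocking receive on done does not succeed.
func (s *session) readyByChannel() bool {
	select {
	case <-s.done:
		return false
	default:
		return true
	}
}

// readyByDeadline compares the deadline with the clock directly.
func (s *session) readyByDeadline(now time.Time) bool {
	return now.Before(s.deadline)
}

// staleWindow walks through one expiry with ticks at 900ms and 1400ms and a
// lease deadline at 1000ms.
func staleWindow() (byChannel, byDeadline, byChannelAfterPoll bool) {
	start := time.Unix(0, 0)
	s := &session{deadline: start.Add(time.Second), done: make(chan struct{})}

	s.poll(start.Add(900 * time.Millisecond)) // last tick before expiry

	now := start.Add(1200 * time.Millisecond) // expired 200ms ago, next tick pending
	byChannel = s.readyByChannel()
	byDeadline = s.readyByDeadline(now)

	s.poll(start.Add(1400 * time.Millisecond)) // the tick that notices the expiry
	byChannelAfterPoll = s.readyByChannel()
	return
}

func main() {
	byChannel, byDeadline, afterPoll := staleWindow()
	fmt.Println("channel check says ready:", byChannel)
	fmt.Println("deadline check says ready:", byDeadline)
	fmt.Println("channel check after next tick:", afterPoll)
}
```

Between the deadline and the next tick, the channel-based check still reports the session as ready while the deadline-based check does not; the two only agree again once the tick has run.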

Using Unexpired(), the test always appears to pass.

I couldn't find other places using etcd leases with the problematic pattern from the leasing library, but I'm admittedly not sure how the Lease library is used by others.

The new test verifies that `leasingKV.Get()` checks for lease
expiration: it fails if Get returns stale values while the client is
network partitioned from the rest of the system.

Signed-off-by: Upamanyu Sharma <upamanyu@mit.edu>

This helps fix the failing test TestLeasingGetChecksForExpiration.
Previously, the `leasing` library relied on *not* receiving something
over the `session.Done()` channel in `readySession()`. Failing to
immediately receive over the channel does not guarantee that the lease
is actually still valid.

Signed-off-by: Upamanyu Sharma <upamanyu@mit.edu>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: upamanyus
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @upamanyus. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -136,6 +136,10 @@ type Lease interface {
    // (see https://github.com/etcd-io/etcd/pull/7866)
    KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error)

    // Unexpired returns true iff the lease is unexpired (more precisely: iff the
    // lease was unexpired during the execution of the Unexpired() call).
    Unexpired(id LeaseID) bool
Member

Let's avoid negative logic in the interface. !Expired(id) should be simpler.
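For illustration only, the positively named variant could look roughly like the sketch below, with callers negating at the call site. `Expired` is the name suggested in the comment above; `expiryChecker`, `fixedDeadlines`, `canServeFromCache`, and `demoExpired` are hypothetical, and treating an unknown lease as expired is an assumption of this sketch rather than something the PR specifies.

```go
package main

import (
	"fmt"
	"time"
)

// LeaseID mirrors the type name used in the PR.
type LeaseID int64

// expiryChecker is a hypothetical one-method interface using positive logic.
type expiryChecker interface {
	Expired(id LeaseID) bool
}

// fixedDeadlines is a toy implementation with a frozen clock.
type fixedDeadlines struct {
	now       time.Time
	deadlines map[LeaseID]time.Time
}

// Expired reports true once the deadline is reached, and also for leases
// that were never recorded (assumption of this sketch).
func (f fixedDeadlines) Expired(id LeaseID) bool {
	deadline, ok := f.deadlines[id]
	return !ok || !f.now.Before(deadline)
}

// canServeFromCache shows the call site: the negation lives with the caller.
func canServeFromCache(c expiryChecker, id LeaseID) bool {
	return !c.Expired(id)
}

// demoExpired checks a live lease, a lapsed lease, and an unknown lease.
func demoExpired() (live, lapsed, unknown bool) {
	now := time.Unix(100, 0)
	c := fixedDeadlines{
		now: now,
		deadlines: map[LeaseID]time.Time{
			1: now.Add(5 * time.Second),  // still valid
			2: now.Add(-5 * time.Second), // already past
		},
	}
	return canServeFromCache(c, 1), canServeFromCache(c, 2), canServeFromCache(c, 3)
}

func main() {
	fmt.Println(demoExpired())
}
```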

Signed-off-by: Upamanyu Sharma <upamanyu@mit.edu>
Development

Successfully merging this pull request may close these issues.

Client lease library does not check for expiration
3 participants