Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make leases linearizable (excluding TTL) #13915

Open
serathius opened this issue Apr 9, 2022 · 9 comments
Open

Make leases linearizable (excluding TTL) #13915

serathius opened this issue Apr 9, 2022 · 9 comments

Comments

@serathius
Copy link
Member

As etcd leases depend on real time, they cannot provide same guarantees as rest of the API. In distributed systems we cannot depend on local time of single member, so etcd mitigates this by relying on cluster leader to decide what it lease TTL and whether it expired.

There is however there are two major issues related to current implementation of leases in etcd:

  • Members proxy requests to leader, which cannot be reliably done outside of raft. Raft provides guarantee about replication of the log, however is not expected to provide accurate information about current leader.
  • API returning lease doesn't exist based on real time (TTL <= 0), before it's removal is negotiated by raft. This leads to situation where leader returns that lease doesn't exist even though it's in a middle refreshing it. for example concurrent.Session closed immediately after creation #13675.

First issue means that methods that read/update lease TTL, are not guaranteed to provide accurate results/succeed. However, fixing them will require larger redesign of leases. We should revisit this in future releases.

Second issue means that leases might be reported as deleted, even though keys using them will still exist and they can be stil renewed. For v3.5 we should look into fixing this issue.

Proposal: Whether lease exist should depend on raft, making it linearizable.
With this change we should be able to address #13675, and improve reliability of lease API.

Changes:

  • Leases should not be treated as deleted when TTL hits 0.
  • Listing leases (LeaseLeases) method should be linealizable.

This change brings one issue that we should be discussed. Clients might not be correctly handling case where lease has ttl=0. For example client could be using ttl as a delay to wait for lease to be revoked. We should double check client code to verify if this change will have an impact and consider having some minimal TTL that is returned (for example 1 second).

cc @ptabor @ahrtr @endocrimes @spzala

@ahrtr
Copy link
Member

ahrtr commented Apr 9, 2022

I think we may have a short-term plan and a long-term plan.

Short-term plan:
The PR pull/13690 follows the same way as previous one , it should can resolve the issue, at least greatly mitigate it. Regarding to ListLease, current implementation is serializable request, we can keep it as it's for now and address it in 3.6, because it's an enhancement.

So in short the short-term plan is merge pull/13690 into main and backport to 3.5.

Long-term plan:
We need some refactor on the current design/implementation. Followers do not need to forward requests to leader any more. Instead, all communication should be through the raft. @ptabor Do you know the background why previously the lease request (such as renewal) do not go through raft?

The long-term plan should only be performed on 3.6 or even 3.7.

@serathius
Copy link
Member Author

I don't agree that #13690 fixes the issue. It adds a wait time for apply on leader, however this makes the situation described in this issue even worse. It increases the window/time after leader check, meaning that there is a greater chance that leader has changed. Best short term solution would be to stop treating ttl=0 as lease not existing.

@ahrtr
Copy link
Member

ahrtr commented Apr 11, 2022

Best short term solution would be to stop treating ttl=0 as lease not existing.

I might be wrong, but "stop treating ttl=0 as lease not existing" seems not correct to me. Currently Etcdserver intentionally returns a response with ttl=0 when LEASE NOT FOUND; and the client side closes the stream session when it receives a response with ttl=0. The workflow/logic looks good for now until we have a better idea on how to refactor the overall design.

I don't agree that #13690 fixes the issue. It adds a wait time for apply on leader, however this makes the situation described in this issue even worse. It increases the window/time after leader check, meaning that there is a greater chance that leader has changed

I think the above comment is only partially valid.

Firstly, usually waitAppliedIndex returns immediately (which means there is no wait time at all) because normally the leader finishes each applying workflow before it receives the following/next gRPC request, i.e. LeaseRenew.

Secondly, even in some extreme scenarios when Etcdserver hasn't finished the previous applying workflow before it receives the following gRPC request, the leader may change during the waitAppliedIndex . But it shouldn't be a problem, because Etcdserver will forward the request to the new leader. See lessor.go#L396 and v3_server.go#L321-L335.

Please note that all above discussion are based on the assumption there is no big refactor for now. So it's for the short-term plan.

With regard to the long-term plan, I am thinking my previous comment "Followers do not need to forward requests to leader any more. Instead, all communication should be through the raft" might not be correct. I guess the reason why previously the LeaseRenew and LeaseTimeToLive do not go through raft is that the raft is based on a global logic clock while lease is based on the wall clock, there is a gap between them. It would be better if more maintainers can feedback here. Since it's just a long-term plan, so we are no rush to make a decision right now.

@ahrtr
Copy link
Member

ahrtr commented May 9, 2022

FYI discussion_r868499893

@ahrtr
Copy link
Member

ahrtr commented May 13, 2022

@serathius Are you working on this? If not, I'd like to work on it. I can start to work on it about 1+ week later.

We need to get all lease related issue & enhancement sorted out, and provide a clear plan.

@endocrimes
Copy link
Contributor

I have a WIP branch for this but didn't get much time in the last couple of weeks (I've been mostly out with a shoulder injury for the last week or so).

The hardest bit to figure out conceptually has been figuring out how to handle time and clock drift - specifically assuming we want to switch away from having checkpoint decrement the TTL in Raft and towards having some kind of expiresAt field and computing TTL in the API.

@serathius
Copy link
Member Author

specifically assuming we want to switch away from having checkpoint decrement the TTL in Raft and towards having some kind of expiresAt field and computing TTL in the API.

Not sure we want this. Please let's discuss the design and agree on who should implement it to avoid duplicated and wasted work.

@ahrtr
Copy link
Member

ahrtr commented May 13, 2022

Thanks @serathius and @endocrimes for the feedback.

Let's discuss the design & plan firstly, and balance the effort afterwards, I will try to drive this. Feel free to jump in if anyone has any thoughts or ideas.

@stale
Copy link

stale bot commented Oct 15, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants