
[Lease] Refactor lease renew request via raft #14094

Draft
ahrtr wants to merge 3 commits into main from lease_renew_refactor_20220607

Conversation

ahrtr (Member) commented Jun 7, 2022

Previously, the lease renew request could only be processed by the leader. When a follower received a renew request, it simply forwarded it to the leader via an internal HTTP channel. This isn't reliable, because the leader may change while the request is in flight.

When the leader receives a renew request, the previous implementation follows a three-stage workflow: pre-raft, raft, and post-raft. This is too complicated and error-prone, and raft acts more like a network transport channel than a consensus mechanism in this case. Please also see issuecomment-975817268.

So in this PR, we process the renew request via raft directly, which greatly simplifies the code.

The client-facing API is unchanged, so there is no impact on client applications, including Kubernetes.
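To make the intended shape of the change concrete, here is a minimal Go sketch of the propose-then-wait-for-apply pattern described above. Everything here (keepAliveRequest, proposeC, the fake apply loop) is a hypothetical stand-in, not etcd's actual types or plumbing; it only illustrates renewing through consensus instead of a leader-only code path.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type LeaseID int64

type keepAliveRequest struct{ ID LeaseID }

type keepAliveResponse struct {
	ID  LeaseID
	TTL int64 // remaining TTL granted by the apply step
}

type server struct {
	proposeC chan keepAliveRequest  // stand-in for a raft proposal channel
	applyC   chan keepAliveResponse // stand-in for the applied-entry result
}

// LeaseRenew proposes the renew through raft and blocks until the entry is
// applied (or the context expires), so the result reflects consensus rather
// than the view of whichever node happened to be leader when the RPC arrived.
func (s *server) LeaseRenew(ctx context.Context, id LeaseID) (*keepAliveResponse, error) {
	select {
	case s.proposeC <- keepAliveRequest{ID: id}:
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	select {
	case resp := <-s.applyC:
		return &resp, nil
	case <-ctx.Done():
		return nil, errors.New("renew not applied before deadline")
	}
}

func main() {
	s := &server{proposeC: make(chan keepAliveRequest, 1), applyC: make(chan keepAliveResponse, 1)}
	// Fake apply loop standing in for the raft state machine.
	go func() {
		req := <-s.proposeC
		s.applyC <- keepAliveResponse{ID: req.ID, TTL: 60}
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	resp, err := s.LeaseRenew(ctx, 1)
	fmt.Println(resp, err)
}
```

Because the response is produced by the apply step, any member can serve the RPC and the result stays valid even if leadership changes mid-flight.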

ahrtr (Member, Author) commented Jun 7, 2022

cc @serathius @ptabor @spzala @mitake

codecov-commenter commented Jun 7, 2022

Codecov Report

Merging #14094 (cb249cc) into main (4ce7a85) will decrease coverage by 0.04%.
The diff coverage is 90.19%.

@@            Coverage Diff             @@
##             main   #14094      +/-   ##
==========================================
- Coverage   75.08%   75.04%   -0.05%     
==========================================
  Files         452      452              
  Lines       36781    36826      +45     
==========================================
+ Hits        27618    27636      +18     
- Misses       7424     7447      +23     
- Partials     1739     1743       +4     
Flag   Coverage Δ
all    75.04% <90.19%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/config/config.go 79.76% <ø> (ø)
server/embed/config.go 73.57% <ø> (ø)
server/etcdserver/apply/corrupt.go 17.64% <0.00%> (-2.36%) ⬇️
server/lease/lessor.go 87.81% <84.21%> (-0.99%) ⬇️
server/embed/etcd.go 75.09% <100.00%> (+0.19%) ⬆️
server/etcdmain/config.go 86.17% <100.00%> (+0.05%) ⬆️
server/etcdserver/api/v3rpc/lease.go 82.27% <100.00%> (ø)
server/etcdserver/apply/apply.go 82.64% <100.00%> (+0.24%) ⬆️
server/etcdserver/apply/uber_applier.go 91.48% <100.00%> (+0.18%) ⬆️
server/etcdserver/v3_server.go 75.76% <100.00%> (-2.25%) ⬇️
... and 35 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4ce7a85...cb249cc.

serathius (Member) commented

The discussion of lease issues and proposed fixes is so spread out that I find myself unable to comprehend it all.

@ahrtr can you help me understand how this change relates to #13915 and what our overall plan is? It would be great to have one umbrella issue that lays out the high-level plan, so we can make sure we are going in the right direction.

I don't remember us ever discussing moving Grant to raft, as there were some performance concerns. With a lot of leases, grant requests can happen very frequently, and they are latency sensitive, making them much more vulnerable to network hiccups. Before we proceed with this PR, it would be good to dig up the original discussion and do some load testing. I haven't looked too deeply into this PR, so I might be wrong about the performance concerns.

ahrtr (Member, Author) commented Jun 8, 2022

This is a standalone refactor of Renew, not Grant.

It took me some time to go through some historical PRs (9924, 9526, 9699, 13508) before delivering this PR.

I thought about three solutions (see below) for refactoring leases, but eventually realized that none of them is feasible.

  1. Persist expiryTime instead of remainingTTL into the db, similar to what 9526 did. Then we wouldn't need the Checkpoint functionality at all: each member, leader and followers alike, calculates the expiryTime itself. Since different nodes may have inconsistent wall time, only the leader can check the expiry of each lease. The reason each follower also calculates the expiryTime is to prepare for the leader-change case: once a follower becomes the leader, it just uses its local expiryTime to continue serving the leases.

This solution would greatly simplify the overall design and implementation, because we wouldn't need the checkpoint functionality anymore. If a member's time jumps forward or backward drastically, we will run into issues no matter which solution we follow, but this solution would be even worse in that case: the existing implementation is only affected when the leader's time jumps, while this solution might run into issues if any member's time jumps. Of course, each revoke and renew can fix and reset the time jumps.

Another downside of this solution is that when the cluster is down for some time (such as during a maintenance window), all leases might be expired when the cluster starts again. The existing implementation does not have this issue, because it persists the remainingTTL instead of the expiryTime. The customer/client shouldn't pay for a server-side issue, so persisting remainingTTL makes more sense than persisting expiryTime from this perspective (see the sketch right after this list).

  2. Introduce a clusterClock concept, so that all members, leader and followers alike, depend on the clusterClock instead of their local wall time. In that case we wouldn't need the checkpoint either, which would also greatly simplify the overall design and implementation. But it isn't that easy to implement a clusterClock. :( Please refer to logcabin, and please let me know if you have any proposals. :)

  3. Introduce NTP (Network Time Protocol), but that would complicate the overall design, and it still couldn't guarantee consistent wall time between members.
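To make the trade-off in option 1 concrete, here is a small Go sketch (hypothetical types, not etcd code) of the maintenance-window scenario: a persisted expiryTime is already in the past when the cluster restarts, while a persisted remainingTTL is simply re-based on the new leader's clock.

```go
package main

import (
	"fmt"
	"time"
)

type persistedLease struct {
	RemainingTTL time.Duration // what the current design persists (via checkpoints)
	ExpiryTime   time.Time     // what the rejected alternative would persist
}

func main() {
	// The lease had 60s left when the cluster went down; the cluster comes
	// back after a 10-minute maintenance window.
	downAt := time.Now().Add(-10 * time.Minute)
	l := persistedLease{
		RemainingTTL: 60 * time.Second,
		ExpiryTime:   downAt.Add(60 * time.Second),
	}

	// remainingTTL: the expiry is recomputed from the current clock, so the
	// lease survives the outage.
	fromTTL := time.Now().Add(l.RemainingTTL)
	fmt.Println("remainingTTL-based expiry in future:", fromTTL.After(time.Now()))
	// expiryTime: the stored instant is already in the past, so the lease is dead.
	fmt.Println("expiryTime-based lease already expired:", l.ExpiryTime.Before(time.Now()))
}
```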

As for the overall plan, my thoughts are:

  1. Keep the current implementations of Grant, Revoke, and Checkpoint as they are for now. All of them go through raft, which should be OK. In particular, the checkpoint syncs remainingTTL to followers via raft, which is good to me. Please see the comment (remainingTTL vs expiryTime) above on the first infeasible solution.
  2. Refactor the lease renew request to go via raft. This is exactly what this PR does; please read the description of this PR.
  3. Modify LeaseLease/ListLease to make sure the member has applied the latest commit. We just need to use linearizableReadNotify (see the sketch after this comment). I think PR 13882 is the right direction.
  4. Regarding LeaseTimeToLive, let's keep it as it is for now, because we can only depend on the leader's local wall time; the followers' local time may be inconsistent.

So in summary, we only need to refactor renew (this PR) and modify ListLease (13882). They are two separate and independent changes.
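For item 3, the idea is a read barrier before serving the local lease list. A minimal Go sketch, assuming a hypothetical server type; only the linearizableReadNotify name comes from the discussion above, the rest is illustrative:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type server struct {
	readNotify chan struct{} // closed once the apply index has caught up
}

// linearizableReadNotify blocks until the member has applied everything
// committed up to the read's registration (or the context expires).
func (s *server) linearizableReadNotify(ctx context.Context) error {
	select {
	case <-s.readNotify:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (s *server) ListLeases(ctx context.Context) ([]int64, error) {
	// Barrier first, then read the (now up-to-date) local lessor state.
	if err := s.linearizableReadNotify(ctx); err != nil {
		return nil, err
	}
	return []int64{1, 2, 3}, nil // stand-in for the local lease list
}

func main() {
	s := &server{readNotify: make(chan struct{})}
	go func() { time.Sleep(10 * time.Millisecond); close(s.readNotify) }()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(s.ListLeases(ctx))
}
```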

ahrtr force-pushed the lease_renew_refactor_20220607 branch from 49dcdc9 to 96bf559 on June 10, 2022 02:49
ahrtr marked this pull request as draft on June 10, 2022 02:56
ahrtr force-pushed the lease_renew_refactor_20220607 branch 2 times, most recently from eb441c1 to da0353a on June 10, 2022 08:11
ahrtr marked this pull request as ready for review on June 10, 2022 08:14
ahrtr force-pushed the lease_renew_refactor_20220607 branch 4 times, most recently from 62e541c to 5bc4390 on June 14, 2022 11:29
ahrtr added this to the etcd-v3.6 milestone on Jun 15, 2022
stale bot commented Sep 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

ptabor self-assigned this on Oct 5, 2022
Executed:
1. ./scripts/genproto.sh
2. ./scripts/update_proto_annotations.sh

Signed-off-by: Benjamin Wang <wachao@vmware.com>
ahrtr force-pushed the lease_renew_refactor_20220607 branch 2 times, most recently from ac7d4b1 to 899a60a on March 7, 2023 07:29
ahrtr marked this pull request as draft on March 7, 2023 07:44
ahrtr force-pushed the lease_renew_refactor_20220607 branch 2 times, most recently from eb63556 to c16bc84 on March 7, 2023 08:24
ahrtr marked this pull request as ready for review on March 7, 2023 08:28
ahrtr force-pushed the lease_renew_refactor_20220607 branch from c16bc84 to ba9d731 on March 7, 2023 08:52
ahrtr marked this pull request as draft on March 7, 2023 09:11
Previously, the renew request could only be processed by the leader.
If a follower received the renew request, it just forwarded the
request to the leader via an internal http channel. This isn't
reliable because the leader may change during the process.

When a leader receives the renew request, the previous implementation
follows a three-stage workflow: pre-raft, raft and post-raft. It's
too complicated and error-prone, and raft is more like just a
network transport channel instead of a consensus mechanism in this
case.

So we process the renew request via raft directly, which greatly
simplifies the code.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
…>= 3.6

Signed-off-by: Benjamin Wang <wachao@vmware.com>
ahrtr force-pushed the lease_renew_refactor_20220607 branch from ba9d731 to bafc656 on March 7, 2023 10:38
@@ -207,6 +208,11 @@ func (a *applierV3backend) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevo
return &pb.LeaseRevokeResponse{Header: a.newHeader()}, err
}

func (a *applierV3backend) LeaseRenew(lc *pb.LeaseKeepAliveRequest) (*pb.LeaseKeepAliveResponse, error) {
mitake (Contributor) commented May 10, 2023

I think it's still possible to apply LeaseRenew entries issued by a stale leader unless there is a mechanism like comparing terms (#15247 (comment))?
I think all raft messages issued by etcd itself might potentially have similar problems.

mitake (Contributor) commented

I think a simpler approach is to not use MsgProp and to issue lease-related requests as MsgApp (related discussion: #15944 (comment)). With this approach we might be able to solve the issue without changing the WAL format. Its implementation will be tricky, though. I'll try this idea this weekend if I have time.
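For reference, a hedged Go sketch of the term-comparison guard mentioned above: stamp each proposed renew with the proposer's term, and drop it at apply time if a newer term has since been committed. All names here are hypothetical; this only illustrates the idea, not etcd's actual WAL or apply code.

```go
package main

import "fmt"

type renewEntry struct {
	LeaseID      int64
	ProposerTerm uint64 // term observed by the node that issued the proposal
}

// applyRenew ignores renew entries proposed under an older term, so a renew
// issued by a since-deposed leader cannot extend a lease it no longer owns.
func applyRenew(e renewEntry, commitTerm uint64) bool {
	if e.ProposerTerm < commitTerm {
		return false // stale-leader proposal: skip it
	}
	// ... extend the lease here ...
	return true
}

func main() {
	fmt.Println(applyRenew(renewEntry{LeaseID: 1, ProposerTerm: 5}, 5)) // applied
	fmt.Println(applyRenew(renewEntry{LeaseID: 1, ProposerTerm: 4}, 5)) // dropped
}
```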

k8s-ci-robot commented

@ahrtr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit   Required  Rerun command
pull-etcd-verify           bafc656  true      /test pull-etcd-verify
pull-etcd-unit-test-amd64  bafc656  true      /test pull-etcd-unit-test-amd64


k8s-ci-robot commented

PR needs rebase.

