
WIP, DO NOT MERGE: Disable proposal forwarding for lease revoke requests #16285

Draft
mitake wants to merge 3 commits into main from disable-proposal-forwarding-for-lease

Conversation

@mitake (Contributor) commented Jul 24, 2023


This PR tries to drop lease revoke requests triggered by a stale leader node. Some requests triggered by etcd itself (not issued by clients) can be applied by the cluster even if they are issued by a stale leader, which can cause problems like #15247. The behavior is quite subtle and I'm not 100% sure that this PR fixes the problem; robustness test cases will likely be needed to verify it.

Note that this PR depends on a companion PR on the Raft library side: etcd-io/raft#88. It cannot be built without that Raft-side change. If reviewers are interested, I'll update the go.mod files to point to the branch in my repository for easy testing.

@mitake mitake marked this pull request as draft July 24, 2023 12:23
@mitake mitake force-pushed the disable-proposal-forwarding-for-lease branch from dcbbb5c to f9a33a3 Compare July 24, 2023 12:23
@mitake mitake force-pushed the disable-proposal-forwarding-for-lease branch 2 times, most recently from a0cc43b to 137e94a Compare October 23, 2023 12:57
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
…l, will be removed later)

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
@mitake mitake force-pushed the disable-proposal-forwarding-for-lease branch from 137e94a to b58dfdb Compare October 23, 2023 12:57
Comment on lines +525 to +539
DisableProposalForwardingCallback: func(m raftpb.Message) bool {
	for _, entry := range m.Entries {
		var raftReq etcdserverpb.InternalRaftRequest
		if !pbutil.MaybeUnmarshal(&raftReq, entry.Data) {
			continue
		}

		if raftReq.LeaseRevoke != nil {
			// If at least one of the entries has LeaseRevoke, the entire message m is discarded.
			// TODO: should LeaseCheckpoint entries be discarded as well?
			return true
		}
	}
	return false
},
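
For context, the companion change in etcd-io/raft#88 presumably wires this callback into the follower's proposal path. Below is a minimal sketch of what that hook might look like, with names inferred from the log messages quoted later in this thread; this is an assumption, and the actual PR may differ.

package sketch

import (
	"errors"

	pb "go.etcd.io/raft/v3/raftpb"
)

var ErrProposalDropped = errors.New("raft proposal dropped")

type raftNode struct {
	lead                              uint64
	disableProposalForwarding         bool
	disableProposalForwardingCallback func(pb.Message) bool
	send                              func(pb.Message)
}

// stepFollowerProp mirrors the MsgProp branch of raft's stepFollower: before
// forwarding a proposal to the leader, the follower consults the callback.
// (Hypothetical sketch; field names are assumptions, not the merged API.)
func (r *raftNode) stepFollowerProp(m pb.Message) error {
	if r.disableProposalForwarding {
		return ErrProposalDropped // forwarding disabled wholesale
	}
	if r.disableProposalForwardingCallback != nil && r.disableProposalForwardingCallback(m) {
		// The callback inspected the entries (here: found a LeaseRevoke)
		// and asked raft to drop the proposal instead of forwarding it.
		return ErrProposalDropped
	}
	m.To = r.lead
	r.send(m)
	return nil
}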
@chaochn47 (Member) commented Oct 23, 2023

If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?

If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to start proposing the lease revoke, only for us to stop the proposal forwarding at the raft layer? We could have implemented this at the etcd server layer, correct?

On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to deserialize each entry in the message, which might have a negative performance impact as well.

@chaochn47 (Member) commented:

I would propose to

  1. have a fault injection e2e test in place to reproduce the issue first.
  2. supply different failure scenarios and define the expected behavior with this fix.

@chaochn47 (Member) commented:

Please let me know how I can help :)

@mitake (Contributor, Author) commented:

Thanks a lot for checking this @chaochn47

> If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?

Actually, a proposal (MsgProp) itself doesn't contain term information, so I think it isn't rejected by the new leader. I summarized this behavior in etcd-io/raft#88 (comment) and the succeeding comments, so it would be great if you could cross-check.
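
For readers cross-checking: this hinges on how raft classifies incoming messages. The following is a paraphrased, abridged sketch of raft.Step in etcd-io/raft (structure only, not verbatim source); a zero-term message is treated as local, and MsgProp is created with Term == 0, so it never hits the term-mismatch branches.

// Paraphrased, abridged sketch of raft.Step in etcd-io/raft (not verbatim).
func (r *raft) Step(m pb.Message) error {
	switch {
	case m.Term == 0:
		// local message (e.g. MsgProp): skips the term comparison entirely
	case m.Term > r.Term:
		// higher-term message: may cause this node to become a follower
	case m.Term < r.Term:
		// stale-term message: ignored or answered, never blindly applied
	}
	// ... the message is then dispatched to stepLeader/stepFollower/stepCandidate
	return nil
}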

> If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to start proposing the lease revoke, only for us to stop the proposal forwarding at the raft layer? We could have implemented this at the etcd server layer, correct?

I think a condition check on the etcd side is subject to a schedule like: 1. the etcd logic detects that the node itself is the leader, but 2. by the time the proposal happens, its raft layer has already become a follower. It's described in etcd-io/raft#73, so I'd like you to cross-check. A minimal illustration of that interleaving is sketched below.
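
// Illustrative check-then-propose race (method names hypothetical): the
// leadership check and the proposal are not atomic w.r.t. raft state.
if s.isLeader() { // 1. etcd-level logic observes "I am the leader"
	// ... lease loop / goroutine scheduling delay ...
	// 2. meanwhile the raft layer steps down and becomes a follower
	s.raftNode.Propose(ctx, revokeData) // 3. the proposal is now forwarded
	// to the new leader instead of being proposed locally as the leader
}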

> On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to deserialize each entry in the message, which might have a negative performance impact as well.

This is true. We should have a mechanism for prioritizing the leader if we really use this approach. The performance overhead is also introduced, as you point out.

@mitake (Contributor, Author) commented:

I tested this PR (and the corresponding raft-side PR: etcd-io/raft#88) with the failpoint-based approach: #15247 (comment)

(The commands in that comment need an additional step to attach a lease to k1, e.g. etcdctl put k1 v1 --lease 32698b80fed16704.)
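
For concreteness, the preparation might look like the following (lease ID illustrative, not the exact one from the test run):

$ bin/etcdctl lease grant 5
lease 32698b80fed16704 granted with TTL(5s)
$ bin/etcdctl put k1 v1 --lease 32698b80fed16704
OK
$ bin/etcdctl lease keep-alive 32698b80fed16704
lease 32698b80fed16704 keepalived with TTL(5)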

With this PR, I could confirm that the lease revoke was dropped as expected: k1 still exists even after the node into which the sleep was injected came back. The log messages look like this:

23:32:57 etcd2 | {"level":"info","ts":"2023-10-30T23:32:57.347423+0900","logger":"raft","caller":"v3@v3.0.0-20221201111702-eaa6808e1f7a/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:32:57 etcd2 | {"level":"warn","ts":"2023-10-30T23:32:57.347554+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}
23:33:04 etcd2 | {"level":"info","ts":"2023-10-30T23:33:04.347552+0900","logger":"raft","caller":"v3@v3.0.0-20221201111702-eaa6808e1f7a/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:33:04 etcd2 | {"level":"warn","ts":"2023-10-30T23:33:04.347695+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}

(Note that this error repeating itself should be avoided by fixing server.go.)

On the main branch, k1 is removed by the lease revoke, and the keep-alive CLI behaved like this:

~/g/s/g/etcd [main]× » bin/etcdctl lease keep-alive 32698b8108524912                                                                   23:43:31
lease 32698b8108524912 keepalived with TTL(5)
...
lease 32698b8108524912 keepalived with TTL(5)
lease 32698b8108524912 expired or revoked.

Server-side logs were like this:

23:44:15 etcd1 | {"level":"error","ts":"2023-10-30T23:44:15.844122+0900","caller":"mvcc/kvstore_txn.go:313","msg":"failed to detach old lease from a key","error":"lease not found","stacktrace":"go.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).delete\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:313\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).deleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:276\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:171\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*metricsTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/metrics_txn.go:46\ngo.etcd.io/etcd/server/v3/lease.(*lessor).Revoke\n\tgo.etcd.io/etcd/server/v3/lease/lessor.go:345\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:205\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:121\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).dispatch\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:174\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:156\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:61\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:117\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1929\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1826\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1131\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:919\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:187"}

It's a little bit odd that revoking happens twice, so I'm still checking. Also, we need to tune the lease TTL and the injected sleep time so that they don't cause a rafthttp timeout. I used 5 seconds for the TTL and 7 seconds for the sleep and got the above result. I also updated DefaultConnReadTimeout and DefaultConnWriteTimeout (https://github.com/etcd-io/etcd/blob/main/server/etcdserver/api/rafthttp/peer.go#L40-L41) to 10 seconds.
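
For reference, that last tweak amounts to bumping two constants in rafthttp; this is an experiment-only sketch, not part of this PR (the upstream defaults are 5 seconds):

// server/etcdserver/api/rafthttp/peer.go (local experiment, not part of this PR)
const (
	DefaultConnReadTimeout  = 10 * time.Second // upstream default: 5 * time.Second
	DefaultConnWriteTimeout = 10 * time.Second // upstream default: 5 * time.Second
)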

@mitake (Contributor, Author) commented:

Anyway, the related behavior of etcd + the raft library is quite subtle, so it would be really nice if you could verify the approach when you have time :)

@k8s-ci-robot commented:
@mitake: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name          Commit    Details   Required   Rerun command
pull-etcd-verify   b58dfdb   link      true       /test pull-etcd-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@k8s-ci-robot commented:
PR needs rebase.

