
Lease checkpoints fix #13491

Closed

Conversation

michaljasionowski (Contributor)

This PR does 2 things:

  • fixes a bug that stops checkpoint scheduling after the first leader change
  • extends checkpointing to server restarts (the original implementation targeted only leader changes)

Lease checkpointing is a mechanism (currently turned off by default) that prevents lease TTLs from being reset to their initial values after each leader change.
Currently, checkpoints are scheduled only until the first leader change; after any subsequent leader change, TTLs are reset to their values from before the first leader change. To fix this, checkpoint scheduling is now forced for all leases after a leader change.
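
For context, a minimal sketch of where the first fix lands (the added call is the one visible in the diff reviewed further down; the surrounding Promote loop is paraphrased and abbreviated):

    // Promote runs on the member that becomes leader. Sketch: besides
    // refreshing expiries, force checkpoint scheduling for every lease so
    // checkpointing keeps working after the second and later leader changes.
    func (le *lessor) Promote(extend time.Duration) {
        le.mu.Lock()
        defer le.mu.Unlock()

        // ... (demote channel setup omitted)

        // refresh the expiry of all leases
        for _, l := range le.leaseMap {
            l.refresh(extend)
            item := &LeaseWithTime{id: l.ID, time: l.expiry}
            le.leaseExpiredNotifier.RegisterOrUpdate(item)
            le.scheduleCheckpointIfNeeded(l) // the added call
        }
        // ... (expiry rate-limiting for large numbers of leases is unchanged)
    }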

After fixing the first bug, lease checkpointing still stops working after a cluster restart, or in any other situation where server state has to be restored from the raft log. This is fixed by forcing the etcd server to apply all LeaseCheckpoint requests it processes, which, in the case of a server restart, includes LeaseCheckpoint requests that were already applied before the restart.
Another possible solution to this problem would be to store scheduled checkpoints in the KV store.
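
Roughly, the restart fix amounts to overriding the "already applied" check for LeaseCheckpoint entries in the apply path (a sketch based on the diff discussed further down and on the reviewer's mention of membership.ApplyBoth; the exact wiring is an assumption):

    // In applyEntryNormal (sketch): LeaseCheckpoint only touches in-memory
    // remaining TTLs, so re-applying it after a restart is harmless and is
    // the only way to recover checkpointed TTLs from the replayed raft log.
    if !shouldApplyV3 && raftReq.LeaseCheckpoint != nil {
        shouldApplyV3 = membership.ApplyBoth
    }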

Integration tests have been improved to cover both issues.

Current checkpointing mechanism is buggy: new checkpoints for any lease are scheduled only until the first leader change. Added a fix for that and a test that checks it.

Extend the lease checkpointing mechanism to cases when the whole etcd cluster is restarted. If an etcd server has to restore its state from the raft log, all LeaseCheckpoint requests will be applied to the server, regardless of their index value. This sets remaining TTLs to the values from before the restart; otherwise, remaining TTLs would be reset to the initial TTLs after each cluster restart.
Added an integration test to cover this case.
@serathius (Member)

cc @ptabor @jpbetz

@serathius self-requested a review November 22, 2021 09:47
@serathius (Member)

cc @hexfusion

@serathius (Member)

Your PR includes 3 merge commits that unnecessarily complicate the git commit history; can you clean them up? (Happy to help if you don't know how.)

@serathius (Member) commented Nov 22, 2021

After fixing the first bug, lease checkpointing still stops working after a cluster restart, or any situation when server state has to be restored from the raft log. This is fixed by forcing etcd server to apply all LeaseCheckpoint requests it processes, which in the case of server restart means also LeaseCheckpoint requests that were already applied before the restart.
Another possible solution to this problem would be to store scheduled checkpoints in the KV store.

One problem I see is inconsistent behavior depending on how much of the raft log is replayed. Checkpoints only affect state stored in memory and are not persisted. This means that the resulting TTL will depend on how much of the log is replayed. The etcd raft log is replayed from the last snapshot, which is triggered every 10,000 entries (by default; it can be changed), so whether any checkpoint (done every 5 minutes) falls within the replayed log depends on how many proposals per second the cluster handles.
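
As a rough illustration of that dependence: with the defaults above, a cluster handling ~100 proposals/s snapshots about every 100 s, so the replayed window is shorter than the 5-minute checkpoint interval and may contain no LeaseCheckpoint at all; at ~10 proposals/s the window is ~1,000 s and usually contains one. Whether remaining TTLs survive a restart therefore varies with write load.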

The change forcing V3 apply for Checkpoint doesn't really solve the problem. ApplyBoth here only makes etcd consider raft logs since the last snapshot (every 10,000 entries) instead of since the last commit (every 5 seconds). I bring this up because we are planning to remove the v2 store, and then we will no longer need to replay raft from the last snapshot.

Can you describe the scenarios where replaying from the raft log is needed? Which failure scenarios do we want to handle, and in which of them is replaying the raft log needed?

@@ -446,6 +446,7 @@ func (le *lessor) Promote(extend time.Duration) {
l.refresh(extend)
item := &LeaseWithTime{id: l.ID, time: l.expiry}
le.leaseExpiredNotifier.RegisterOrUpdate(item)
le.scheduleCheckpointIfNeeded(l)
Member:
Found that commenting out this line doesn't break the tests. Can you add a test for this?

Contributor:
Yes. I wonder why `le.scheduleCheckpointIfNeeded(l)` in line 487 is not handling this case.
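
For illustration, a rough sketch of the kind of unit test being asked for, written as if it lived in the lease package's own test file (the newLessor constructor arguments and the backend test helper are assumptions and may differ between branches):

    // Sketch: grant a lease whose TTL exceeds the checkpoint interval, then
    // call Promote (as on leader change). The test passes only if the lessor
    // asks the (fake) checkpointer to checkpoint the lease.
    func TestLessorCheckpointScheduledOnPromote(t *testing.T) {
        lg := zap.NewNop()
        be, _ := betesting.NewDefaultTmpBackend(t) // assumed test backend helper
        defer be.Close()

        le := newLessor(lg, be, LessorConfig{
            MinLeaseTTL:        1,
            CheckpointInterval: 1 * time.Second,
        })
        defer le.Stop()

        checkpointed := make(chan struct{}, 1)
        le.SetCheckpointer(func(ctx context.Context, cp *pb.LeaseCheckpointRequest) {
            select {
            case checkpointed <- struct{}{}:
            default:
            }
        })

        if _, err := le.Grant(1, 120); err != nil {
            t.Fatal(err)
        }
        le.Promote(0)

        select {
        case <-checkpointed:
        case <-time.After(3 * time.Second):
            t.Fatal("expected a checkpoint to be scheduled after Promote")
        }
    }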

@@ -1901,6 +1901,11 @@ func (s *EtcdServer) applyEntryNormal(e *raftpb.Entry) {
s.w.Trigger(r.ID, s.applyV2Request((*RequestV2)(rp), shouldApplyV3))
return
}
if !shouldApplyV3 && raftReq.LeaseCheckpoint != nil {
@serathius (Member) Nov 22, 2021:
This makes checkpoints depend on whether there was a checkpoint since the last snapshot. I think we should consider persisting remainingTTL from checkpoints into the backend.
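
A rough sketch of what persisting remainingTTL could look like in the lessor's Checkpoint path (the persistTo helper and the exact call site are assumptions; the leasepb.Lease record with its RemainingTTL field is the one quoted in the next comment):

    // Sketch: when a LeaseCheckpoint entry is applied, write the remaining TTL
    // into the lease's backend record instead of only updating memory, so a
    // restarted member recovers it without depending on raft log replay.
    func (le *lessor) Checkpoint(id LeaseID, remainingTTL int64) error {
        le.mu.Lock()
        defer le.mu.Unlock()

        if l, ok := le.leaseMap[id]; ok {
            // Promote remains responsible for turning remainingTTL into an expiry.
            l.remainingTTL = remainingTTL
            l.persistTo(le.b) // assumed helper: rewrites leasepb.Lease{ID, TTL, RemainingTTL}
        }
        return nil
    }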

@ptabor (Contributor) commented Nov 22, 2021

I find the current layout scary. A single object (the lessor) has parts of its state:

  1. updated pre-raft

  2. updated post-raft but in-memory (Expiration TTL). Raft is really a 'networking mechanism', not part of the consensus.

  3. updated post-raft in the V3 backend (the Lease object written by Grant:

    lpb := leasepb.Lease{ID: int64(l.ID), TTL: l.ttl, RemainingTTL: l.remainingTTL}
    )

And there is no protection ensuring that the post-raft code does not depend on the pre-raft code, which risks introducing non-determinism.

IMHO we should split the lessor into 2-3 explicit objects, where 2. & 3. can only be mutated post-raft and do not depend on 1. in any way. In the same way, 3. cannot depend on 2. (as 2. is currently non-deterministic).
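
Purely as an illustration of that split, a Go sketch (all type and method names hypothetical):

    // 1. Pre-raft view: consulted before proposing; allowed to be approximate
    // and non-deterministic across members.
    type leaseProposerView interface {
        RemainingTTL(id LeaseID) (ttl int64, ok bool)
    }

    // 2. Post-raft, in-memory expiration state: mutated only by the apply path;
    // must not depend on 1.
    type leaseExpiryState interface {
        Refresh(id LeaseID, extend time.Duration)
        Checkpoint(id LeaseID, remainingTTL int64)
    }

    // 3. Post-raft, persisted state: the leasepb.Lease records in the backend;
    // must not depend on 1. or 2.
    type leaseStore interface {
        Put(l leasepb.Lease)
        Delete(id LeaseID)
    }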


Looking at @jpbetz's goal (#9924), the intent was to minimize the number of raft proposals and writes performed during renewal. Without persisting the expiration TTL as part of the state, we end up with a fuzzy definition of 2.,
i.e. either the expiration TTL is correct or it's 0 (meaning unknown, assume the full TTL), and this state can differ between consensus members.


If I understand Marek's proposal, it would actually merge 2. & 3.:

  • the additional cost of updating the expiration TTL in bbolt should actually not be that significant

  • we would not persist leases in the storeV2 snapshot (as it does not contain leases at all currently)

  • there would be no guarantee (as currently) that the expiration TTL for a given lease is set... The guarantee would be that if it is set to a non-zero value, it is greater than the remaining time.

  • This seems good to me, but we should check @jpbetz's perspective.

  • I think we should split the lessor into 2 objects (pre- & post-raft)

  • It might be a good exercise to write a document (public markdown) that describes how leasing works with respect to raft and what the guarantees are.

@serathius added this to the etcd-v3.6 milestone Nov 25, 2021
@serathius (Member)

Closing for #13508
