
Remove clock drift checks from connectivity monitor #16432

Open
serathius opened this issue Aug 17, 2023 · 17 comments

Comments

@serathius
Member

serathius commented Aug 17, 2023

What would you like to be added?

I would like to propose removing the "prober found high clock drift" log, as it incorrectly implies that etcd is impacted by clock drift.

To my knowledge, there is no impact of clock difference on etcd version 3.

Raft itself doesn't depend on time in any way. It measures time passage for things like health probes, but doesn't compare time between members.
The only part of etcd that could be impacted is Leases, see #9768 (comment).

The connectivity monitor that reports the time drift was introduced for v2 etcd (#3210).
In v3 etcd, leases were rewritten to depend on time differences rather than absolute timestamps, so they should not be affected (#3834).
Leases also use monotonic time (#6888, #8507), meaning wall-clock changes should not impact TTLs.
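
To illustrate the monotonic-time point, here is a minimal sketch (not etcd code) of how Go's time package behaves: time.Now() carries a monotonic clock reading, and time.Since subtracts monotonic readings, so a stepped wall clock doesn't change measured durations.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now() // carries a monotonic clock reading

	// Even if NTP or an operator steps the wall clock backwards or
	// forwards here, the monotonic reading keeps advancing normally.
	time.Sleep(100 * time.Millisecond)

	// time.Since subtracts monotonic readings when both times carry
	// one, so the result reflects real elapsed time (~100ms), not
	// wall-clock changes.
	fmt.Println(time.Since(start))
}
```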

I expect the connectivity monitor stayed because etcd v3.5 still officially supports the v2 API.
The next release, v3.6, removes the v2 API, so we can remove the clock drift detection too.

Please let me know if you are aware of any place where etcd could be impacted by clock drift.

Why is this needed?

Prevents user confusion about the impact of clock drift on etcd.

@Aditya-Sood

hi @serathius
would you be working on this issue, or is it open for contribution?

@serathius
Member Author

Opening for discussion. @ptabor @ahrtr @jmhbnz

cc'ing everyone involved with the lease work: @xiang90 @gyuho @hexfusion @yichengq @jonboulle @heyitsanthony

@ahrtr
Member

ahrtr commented Aug 21, 2023

Leases depend on wall time; if a member's local clock drifts, then leases will be affected. Simply put, any usage of time.Now() and time.After/time.Before may return invalid results when the clock drifts. So I suggest keeping the warning ("prober found high clock drift") as it is for now.
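
As a contrived sketch of this failure mode (illustrative only, not etcd code): code that stores an absolute wall-clock deadline, e.g. as a Unix timestamp, loses Go's monotonic-clock protection and can misjudge expiry when the wall clock is stepped.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ttl := 10 * time.Second

	// Storing the deadline as wall-clock seconds discards Go's
	// monotonic reading entirely.
	deadlineUnix := time.Now().Add(ttl).Unix()

	// If NTP or an operator steps the wall clock forward by a minute
	// here, the check below reports expiry even though almost no
	// real time has passed.
	expired := time.Now().Unix() > deadlineUnix
	fmt.Println("expired:", expired)
}
```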

The existing lease implementation has a big design issue; we need to think about how to refactor it, which is already tracked on the roadmap.

@serathius
Member Author

Leases depend on wall time; if a member's local clock drifts, then leases will be affected.

I don't think the impact of clock drift on wall time matters. By impact on leases I mean a consistency issue: situations where clock drift could cause one member to consider a lease expired while another member doesn't.

To clarify, by clock drift I mean a small error in the physical clock that accumulates over time. It might be an error of 1 second per month that accumulates to a couple of minutes over a year. This is negligible for lease wall-time calculation, as leases are meant to be short-lived; the Kubernetes lease for events is unusual because it lasts 2 hours. Even for a 2-hour lease, the wall-time impact of clock drift should be negligible (below 1 second), which is acceptable.

Clock drift between members would matter if the TTL were calculated as an exact deadline rather than as a time difference. For example, with a lease whose TTL is 1 hour, it doesn't matter if one member counts from 18:00:00 to 19:00:00 while another member is 10 seconds ahead and counts from 18:00:10 to 19:00:10.
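
A minimal sketch of this point (illustrative only, not etcd code): with duration-based TTLs, a constant offset between two members' clocks cancels out, because each member measures the TTL from its own local "now". Only a shared absolute deadline would be sensitive to the offset.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ttl := time.Hour
	offset := 10 * time.Second // member B's clock runs 10s ahead of A's

	aStart := time.Date(2023, time.September, 8, 18, 0, 0, 0, time.UTC)
	bStart := aStart.Add(offset) // reads 18:00:10 on member B's clock

	// Duration-based expiry: each member counts the TTL from its own
	// local start, so the constant offset cancels out and both
	// observe exactly one hour.
	fmt.Println(aStart.Add(ttl).Sub(aStart)) // 1h0m0s
	fmt.Println(bStart.Add(ttl).Sub(bStart)) // 1h0m0s

	// Deadline-based expiry: if members instead shared the absolute
	// deadline 19:00:00, member B (10s ahead) would consider the
	// lease expired 10 seconds before member A does.
	deadline := aStart.Add(ttl)
	fmt.Println(deadline.Sub(bStart)) // 59m50s remaining from B's view
}
```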

@xiang90
Contributor

xiang90 commented Sep 7, 2023

@serathius

Two things:

  1. Yes. Clock drift between different nodes should not affect etcd v3. We designed it that way, and that is why all leases get renewed/extended automatically once there is a leader switch. We could do some handshake and renew with a shorter period to make leases more accurate even under a voluntary leader switch.

  2. Keeping clocks in sync is still good... It makes debugging 100 times simpler. In any environment with NTP enabled, we should expect less than 500ms of clock difference, so warning on that is probably still OK... I would not suggest removing it, but it's really up to the current maintainers since it is not a correctness issue.

@serathius
Member Author

serathius commented Sep 8, 2023

Thanks @xiang90 for confirming there is no correctness issue. I can agree that having clock drift in a cluster is not great; however, this is a problem external to etcd. Many assumptions break with clock drift. Take centralized logging: it becomes useless for debugging if logs are collected from nodes with drifting clocks.

I don't think this is inherently an etcd problem. I have had multiple scared users asking me what the impact on etcd is. Will it not perform well? Is the whole cluster's consistency at risk? Overall, I think it's not a good idea to try to solve issues that users don't have. etcd is not a monitoring system, so it should not monitor or alert on clock drift. It just confuses users about why etcd cares about clock drift.

We can warn users that clock drift makes etcd debugging hard. We can recommend that users run NTP and monitor their clock drift, and even provide a link to external tools, but etcd should not take this responsibility on itself.

@Aditya-Sood

hi @serathius, @xiang90
since the log is already of WARN type, I suppose there is no contribution expected for this issue?

@tjungblu
Contributor

tjungblu commented Sep 8, 2023

I'm also +1 for removing it. WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.

@serathius
Member Author

@Aditya-Sood We haven't made a decision on how to proceed yet.

@serathius
Member Author

serathius commented Sep 8, 2023

WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.

I don't think it will help. Clock drift at the moment of the request can be totally unrelated to clock drift present at the moment the logs were written.

@serathius
Member Author

serathius commented Sep 8, 2023

As confirmed in #16432 (comment), I would like to repeat the proposal to remove the "prober found high clock drift" log. I think it causes confusion and doesn't help with debugging (more detail in #16432 (comment)).

ping @ahrtr @jmhbnz @wenjiaswe

@jmhbnz
Member

jmhbnz commented Sep 8, 2023

Hey team - I've been following this thread and held off commenting, as I'm still not fully familiar with the underlying code in question. However, to answer the ping above: my vote would be to continue informing users when clock drift over a certain threshold exists, but to ensure this is done in such a way that it is clear there is no impact on etcd consistency.

@ahrtr
Member

ahrtr commented Sep 8, 2023

Situations where clock drift could cause one member to consider a lease expired while another member doesn't.

Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.

Overall, I don't think this ticket deserves much discussion time before the issue I mentioned in #16432 (comment) is resolved, especially #15247.

@serathius
Member Author

Situations where clock drift could cause one member to consider a lease expired while another member doesn't.

Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.

No, because:

  • The decision that a lease has expired is made by the leader and executed via quorum.
  • Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

@ahrtr
Member

ahrtr commented Sep 8, 2023

  • The decision that a lease has expired is made by the leader and executed via quorum.

FYI, it's executed by quorum rather than agreed on by quorum.

Also, per your logic, issue #15247 should NOT happen, because the out-of-date leader (which gets stuck on writes for a long time) would never get consensus.

  • Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

Please do not assume any use cases.

@serathius
Member Author

serathius commented Sep 8, 2023

FYI, it's executed by quorum rather than agreed on by quorum.

What I meant here is that the decision that a lease should be invalidated is made by the leader and then proposed to raft.

Also, per your logic, issue #15247 should NOT happen, because the out-of-date leader (which gets stuck on writes for a long time) would never get consensus.

No, that's not true. My understanding is as follows (please correct me if any of these points is incorrect); a rough sketch in Go follows the list:

  1. A lease grant request is sent; creation is negotiated via quorum and applied to every member's backend.
    1a. If the member is the leader, it also starts a clock to count down the lease TTL.
    1b. Read requests about the lease TTL (ListLeases, Lease) are forwarded to the leader, which returns the time from the countdown clock it started.
  2. If the leader changes,
    2a. the old leader stops the lease countdown clock;
    2b. the new leader starts the clock again with the original TTL. Notice that the TTL is reset (assuming the optional lease checkpointing feature is disabled).
  3. The lease TTL passes, and the clock started by the leader prompts it to send a LeaseRevoke request.
    3a. Read requests about the lease TTL return 1 second until the lease is properly revoked.
  4. The LeaseRevoke is executed by quorum and applied to every member's backend.
    4a. Read requests about the lease return that the lease doesn't exist.
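
Here is that rough sketch (a simplification under my own assumptions, not etcd's actual lessor implementation): only the leader runs countdown deadlines, promotion resets every lease to its full original TTL (step 2b), and demotion stops the countdowns (step 2a).

```go
package lessor

import "time"

type lease struct {
	id  int64
	ttl time.Duration // original granted TTL
}

type lessor struct {
	expiry map[int64]time.Time // leader-local countdown deadlines
}

// promote is called when this member becomes leader: every lease's
// countdown restarts from the full original TTL (step 2b, with
// checkpointing disabled).
func (l *lessor) promote(leases []lease) {
	l.expiry = make(map[int64]time.Time)
	for _, ls := range leases {
		l.expiry[ls.id] = time.Now().Add(ls.ttl)
	}
}

// demote is called when leadership is lost: the countdowns stop
// (step 2a). Issue #15247 is, roughly, this step not happening.
func (l *lessor) demote() { l.expiry = nil }

// expiredLeases returns leases whose local countdown has elapsed; the
// leader then proposes LeaseRevoke through raft (steps 3 and 4).
func (l *lessor) expiredLeases() []int64 {
	var out []int64
	for id, deadline := range l.expiry {
		if time.Now().After(deadline) { // monotonic comparison
			out = append(out, id)
		}
	}
	return out
}
```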

Issue #15247 is caused by point 2a not being executed properly. The old leader doesn't know that it should step down, and its countdown clock keeps ticking. This results in the old leader executing point 3 even though it shouldn't. I pointed out these issues in #15944 a long time ago.

Clock drift doesn't influence any of these steps, as the cluster only depends on the countdown clock on the leader. If the leader changes, the TTL is reset. If the leader's clock is 10 seconds behind the other members', it doesn't matter: the duration it counts down will still be the TTL.

Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

Pls do not assume any user cases.

We need to make assumptions, especially about leases, which were designed for short-lived leader-election tokens. 2-hour leases are already a big problem for Kubernetes due to the lack of checkpointing.

Imagine that you generate 1 GB of Kubernetes events within an hour and you have a one-hour TTL. Every time there is a leader change (which can easily happen multiple times in an hour), the TTL is reset. After one leader change you have 2 GB, after two leader changes 3 GB, and so on.
