
Remove clock drift checks from connectivity monitor #16432

Open
serathius opened this issue Aug 17, 2023 · 17 comments

Comments

@serathius
Member

serathius commented Aug 17, 2023

What would you like to be added?

I would like to propose removing the "prober found high clock drift" log, as it incorrectly implies that etcd is impacted by clock drift.

To my knowledge, there is no impact of clock difference on etcd version 3.

Raft itself doesn't depend on time in any way. It measures time passage for things like health probes, but doesn't compare time between members.
The only part of etcd that could be impacted is Leases, see #9768 (comment).

The connectivity monitor that reports the time drift was introduced for v2 etcd (#3210).
In v3 etcd, leases were rewritten to depend on time differences rather than absolute timestamps, so they should not be affected (#3834).
Leases also use monotonic time (#6888, #8507), meaning wall-clock changes should not impact TTLs.
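
To illustrate the monotonic-time point, here is a minimal sketch (not etcd code) of how Go's time package behaves: time.Now() carries a monotonic clock reading, and time.Since subtracts monotonic readings, so a stepped wall clock doesn't change measured durations.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now() // carries a monotonic clock reading

	// Even if NTP or an operator steps the wall clock backwards or
	// forwards here, the monotonic reading keeps advancing normally.
	time.Sleep(100 * time.Millisecond)

	// time.Since subtracts monotonic readings when both times carry
	// one, so the result reflects real elapsed time (~100ms), not
	// wall-clock changes.
	fmt.Println(time.Since(start))
}
```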

I expect the connectivity monitor stayed because etcd v3.5 still officially supports the v2 API.
The next release, v3.6, removes the v2 API, so we can remove the clock drift detection too.

Please let me know if you are aware of any place where etcd could be impacted by clock drift.

Why is this needed?

Prevents user confusion about the impact of clock drift on etcd.

@Aditya-Sood

hi @serathius
would you be working on this issue, or is it open for contribution?

@serathius
Member Author

Opening for discussion. @ptabor @ahrtr @jmhbnz

cc'ing everyone involved with the lease work: @xiang90 @gyuho @hexfusion @yichengq @jonboulle @heyitsanthony

@ahrtr
Member

ahrtr commented Aug 21, 2023

Leases depend on wall time; if a member's local clock drifts, then leases will be affected. Simply put, any usage of time.Now() and time.After/time.Before may return invalid results when the clock drifts. So I suggest keeping the warning ("prober found high clock drift") as it is for now.
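
As a contrived sketch of this failure mode (illustrative only, not etcd code): code that stores an absolute wall-clock deadline, e.g. as a Unix timestamp, loses Go's monotonic-clock protection and can misjudge expiry when the wall clock is stepped.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ttl := 10 * time.Second

	// Storing the deadline as wall-clock seconds discards Go's
	// monotonic reading entirely.
	deadlineUnix := time.Now().Add(ttl).Unix()

	// If NTP or an operator steps the wall clock forward by a minute
	// here, the check below reports expiry even though almost no
	// real time has passed.
	expired := time.Now().Unix() > deadlineUnix
	fmt.Println("expired:", expired)
}
```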

The existing lease implementation has a big design issue; we need to think about how to refactor it, which is already tracked on the roadmap.

@serathius
Member Author

Leases depend on wall time; if a member's local clock drifts, then leases will be affected.

I don't think the impact of clock drift on wall time matters. By impact on leases I mean a consistency issue: situations where clock drift could cause one member to consider a lease expired while another member doesn't.

To clarify, by clock drift I mean a small error in the physical clock that accumulates over time. It might be an error of 1 second per month that accumulates to a couple of minutes over a year. This is negligible for lease wall-time calculation, as leases are meant to be short-lived; the Kubernetes lease for events is unusual because it lasts 2 hours. Even for a 2-hour lease, the wall-time impact of clock drift should be negligible (below 1 second), which is acceptable.

Clock drift between members would matter if the TTL were calculated as an exact deadline rather than as a time difference. For example, with a lease whose TTL is 1 hour, it doesn't matter if one member counts from 18:00:00 to 19:00:00 while another member is 10 seconds ahead and counts from 18:00:10 to 19:00:10.
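
A minimal sketch of this point (illustrative only, not etcd code): with duration-based TTLs, a constant offset between two members' clocks cancels out, because each member measures the TTL from its own local "now". Only a shared absolute deadline would be sensitive to the offset.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ttl := time.Hour
	offset := 10 * time.Second // member B's clock runs 10s ahead of A's

	aStart := time.Date(2023, time.September, 8, 18, 0, 0, 0, time.UTC)
	bStart := aStart.Add(offset) // reads 18:00:10 on member B's clock

	// Duration-based expiry: each member counts the TTL from its own
	// local start, so the constant offset cancels out and both
	// observe exactly one hour.
	fmt.Println(aStart.Add(ttl).Sub(aStart)) // 1h0m0s
	fmt.Println(bStart.Add(ttl).Sub(bStart)) // 1h0m0s

	// Deadline-based expiry: if members instead shared the absolute
	// deadline 19:00:00, member B (10s ahead) would consider the
	// lease expired 10 seconds before member A does.
	deadline := aStart.Add(ttl)
	fmt.Println(deadline.Sub(bStart)) // 59m50s remaining from B's view
}
```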

@xiang90
Contributor

xiang90 commented Sep 7, 2023

@serathius

Two things:

  1. Yes. Clock drift between different nodes should not affect etcd v3. We designed it that way, and that is why all leases get renewed/extended automatically once there is a leader switch. We could do some handshake and renew with a shorter period to make leases more accurate even under a voluntary leader switch.

  2. Keeping clocks in sync is still good... It makes debugging 100 times simpler. In any environment with NTP enabled, we should expect less than 500ms of clock difference, so warning on that is probably still OK... I would not suggest removing it, but it's really up to the current maintainers since it is not a correctness issue.

@serathius
Member Author

serathius commented Sep 8, 2023

Thanks @xiang90 for confirming there is no correctness issue. I can agree that having clock drift in a cluster is not great; however, this is a problem external to etcd. Many assumptions break with clock drift. Take centralized logging: it becomes useless for debugging if logs are collected from nodes with drifting clocks.

I don't think this is inherently an etcd problem. I have had multiple scared users asking me what the impact on etcd is. Will it not perform well? Is the whole cluster's consistency at risk? Overall, I think it's not a good idea to try to solve issues that users don't have. etcd is not a monitoring system, so it should not monitor or alert on clock drift. It just confuses users about why etcd cares about clock drift.

We can warn users that clock drift makes etcd debugging hard. We can recommend that users run NTP and monitor their clock drift, and even provide a link to external tools, but etcd should not take this responsibility on itself.

@Aditya-Sood

hi @serathius, @xiang90
since the log is already of WARN type, I suppose there is no contribution expected for this issue?

@tjungblu
Contributor

tjungblu commented Sep 8, 2023

I'm also +1 for removing it. WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.

@serathius
Member Author

@Aditya-Sood We haven't made a decision on how to proceed yet.

@serathius
Member Author

serathius commented Sep 8, 2023

WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.

I don't think it will help. Clock drift at the moment of the request can be totally unrelated to clock drift present at the moment the logs were written.

@serathius
Member Author

serathius commented Sep 8, 2023

As confirmed in #16432 (comment), I would like to repeat the proposal to remove the "prober found high clock drift" log. I think it causes confusion and doesn't help with debugging (more detail in #16432 (comment)).

ping @ahrtr @jmhbnz @wenjiaswe

@jmhbnz
Member

jmhbnz commented Sep 8, 2023

Hey team - I've been following this thread and held off commenting, as I'm still not fully familiar with the underlying code in question. However, to answer the ping above: my vote would be to continue informing users when clock drift over a certain threshold exists, but to ensure this is done in such a way that it is clear there is no impact on etcd consistency.

@ahrtr
Member

ahrtr commented Sep 8, 2023

Situations where clock drift could cause one member to consider a lease expired while another member doesn't.

Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.

Overall, I don't think this ticket deserves much discussion time before the issue I mentioned in #16432 (comment) is resolved, especially #15247.

@serathius
Member Author

Situations where clock drift could cause one member to consider a lease expired while another member doesn't.

Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.

No, because:

  • The decision that a lease has expired is made by the leader and executed via quorum.
  • Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

@ahrtr
Member

ahrtr commented Sep 8, 2023

  • The decision that a lease has expired is made by the leader and executed via quorum.

FYI, it's executed by quorum rather than agreed on by quorum.

Also, per your logic, issue #15247 should NOT happen, because the out-of-date leader (which gets stuck on writes for a long time) would never get consensus.

  • Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

Please do not assume any use cases.

@serathius
Member Author

serathius commented Sep 8, 2023

FYI, it's executed by quorum rather than agreed on by quorum.

What I meant here is that the decision that a lease should be invalidated is made by the leader and then proposed to raft.

Also, per your logic, issue #15247 should NOT happen, because the out-of-date leader (which gets stuck on writes for a long time) would never get consensus.

No, that's not true. My understanding is as follows (please correct me if any of these points is incorrect); a rough sketch in Go follows the list:

  1. A lease grant request is sent; creation is negotiated via quorum and applied to every member's backend.
    1a. If the member is the leader, it also starts a clock to count down the lease TTL.
    1b. Read requests about the lease TTL (ListLeases, Lease) are forwarded to the leader, which returns the time from the countdown clock it started.
  2. If the leader changes,
    2a. the old leader stops the lease countdown clock;
    2b. the new leader starts the clock again with the original TTL. Notice that the TTL is reset (assuming the optional lease checkpointing feature is disabled).
  3. The lease TTL passes, and the clock started by the leader prompts it to send a LeaseRevoke request.
    3a. Read requests about the lease TTL return 1 second until the lease is properly revoked.
  4. The LeaseRevoke is executed by quorum and applied to every member's backend.
    4a. Read requests about the lease return that the lease doesn't exist.
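
Here is that rough sketch (a simplification under my own assumptions, not etcd's actual lessor implementation): only the leader runs countdown deadlines, promotion resets every lease to its full original TTL (step 2b), and demotion stops the countdowns (step 2a).

```go
package lessor

import "time"

type lease struct {
	id  int64
	ttl time.Duration // original granted TTL
}

type lessor struct {
	expiry map[int64]time.Time // leader-local countdown deadlines
}

// promote is called when this member becomes leader: every lease's
// countdown restarts from the full original TTL (step 2b, with
// checkpointing disabled).
func (l *lessor) promote(leases []lease) {
	l.expiry = make(map[int64]time.Time)
	for _, ls := range leases {
		l.expiry[ls.id] = time.Now().Add(ls.ttl)
	}
}

// demote is called when leadership is lost: the countdowns stop
// (step 2a). Issue #15247 is, roughly, this step not happening.
func (l *lessor) demote() { l.expiry = nil }

// expiredLeases returns leases whose local countdown has elapsed; the
// leader then proposes LeaseRevoke through raft (steps 3 and 4).
func (l *lessor) expiredLeases() []int64 {
	var out []int64
	for id, deadline := range l.expiry {
		if time.Now().After(deadline) { // monotonic comparison
			out = append(out, id)
		}
	}
	return out
}
```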

Issue #15247 is caused by point 2a not being executed properly. The old leader doesn't know that it should step down, and its countdown clock keeps ticking. This results in the old leader executing point 3 even though it shouldn't. I pointed out these issues in #15944 a long time ago.

Clock drift doesn't influence any of these steps, as the cluster only depends on the countdown clock on the leader. If the leader changes, the TTL is reset. If the leader's clock is 10 seconds behind the other members', it doesn't matter: the duration it counts down will still be the TTL.

Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be affected by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. That is not enough time for clock drift to matter.

Pls do not assume any user cases.

We need to make assumptions, especially about leases, which were designed for short-lived leader-election tokens. 2-hour leases are already a big problem for Kubernetes due to the lack of checkpointing.

Imagine that you generate 1 GB of Kubernetes events within an hour and you have a one-hour TTL. Every time there is a leader change (which can easily happen multiple times in an hour), the TTL is reset. After one leader change you have 2 GB, after two leader changes 3 GB, and so on.
