Livez/Readyz #16007
Comments
Please check this #13340 (comment) |
Yeah I don't buy it. No one is going to dig up an obscure github issue in order to properly configure their etcd configurations for Kubernetes. |
Yeah, that makes sense. We should rethink this and document it properly, for example which etcd versions it applies to. |
As long as you don't touch the existing health endpoint, it's completely backwards compatible and therefore can even be backported. |
for ref: #16008 |
Thanks @logicalhan for raising this request. I am supportive of it.
While |
cc @neolit123 |
+1 |
I don't want to rush into adding livez/readyz probes. The main problem with the existing health probe is that we added it just to have one, without proper consideration. I want livez to properly reflect the fact that etcd needs a restart, for example when etcd is stuck on stalled storage https://docs.google.com/document/d/1U9hAcZQp3Y36q_JFiw2VBJXVAo2dK2a-8Rsbqv3GgDo/edit?usp=sharing. Readyz should properly reflect the fact that etcd is ready to serve traffic. I don't think alarms matter here: an alarm is a degradation, but it doesn't mean we shouldn't serve reads. TL;DR: I would like to have a design written that does a proper analysis of etcd failure modes and proposes matching probes to detect them. Example: kubernetes-sigs/metrics-server#542 |
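To make the distinction in the comment above concrete, here is a minimal Go sketch that maps failure modes to probe verdicts. It is only an illustration, not etcd's implementation: the names `failureMode`, `verdict`, and `evaluate` are hypothetical, and treating a missing leader as "not ready" is an assumption added for the example. Only the stalled-storage and alarm cases come from the comment itself.

```go
package main

import "fmt"

// failureMode is a hypothetical enumeration of conditions a probe design might analyze.
type failureMode int

const (
	stalledStorage failureMode = iota // from the comment: etcd is stuck and needs a restart
	alarmRaised                       // from the comment: a degradation, reads can still be served
	noLeader                          // assumption for illustration: member cannot serve traffic yet
)

// verdict is what the two probes would report for a given set of failure modes.
type verdict struct {
	Live  bool // false means "restart me"
	Ready bool // false means "do not send me traffic"
}

// evaluate starts from a healthy verdict and downgrades it per failure mode.
func evaluate(modes ...failureMode) verdict {
	v := verdict{Live: true, Ready: true}
	for _, m := range modes {
		switch m {
		case stalledStorage:
			// A member that needs a restart is neither live nor ready.
			v.Live = false
			v.Ready = false
		case noLeader:
			// Assumed here to affect readiness only; a restart would not help.
			v.Ready = false
		case alarmRaised:
			// Alarms alone change neither verdict in this sketch.
		}
	}
	return v
}

func main() {
	fmt.Printf("alarm only:      %+v\n", evaluate(alarmRaised))
	fmt.Printf("stalled storage: %+v\n", evaluate(stalledStorage))
	fmt.Printf("no leader:       %+v\n", evaluate(noLeader))
}
```

A table like this is roughly what a design document would need to fill in for every real failure mode before deciding which checks belong to which probe.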
Thanks for bringing this up @logicalhan. I will continue work on this. |
Link to #15440 |
Reached out to @wenjiaswe to collaborate on the latest version of the design doc for etcd livez and readyz. The updates resolve the comments and feedback mentioned in this issue and in the PoC #16008. /cc @dims |
Thank you @chaochn47, could you please use a google doc so we could comment? Thanks! |
cc @marukozh who is also working on this. |
Done. Anyone in |
Various discussions are scattered across different places, so I am raising my comment under this ticket.
liveness probe
A node is live when both below are satisfied:
readiness probe
Basically it shares the same logic as the existing health check (see below), and
etcd/server/etcdserver/api/etcdhttp/health.go Lines 47 to 57 in 0a3dc1a
Compatibility
Do not break the existing |
Please leave your comments on the document https://docs.google.com/document/d/1PaUAp76j1X92h3jZF47m32oVlR8Y-p-arB5XOB7Nb6U/edit?usp=sharing |
Created a k/k issue to track this kubernetes/kubernetes#120970 |
Tracking work
|
is there any plan to add a |
etcdctl uses gRPC only, we would need to make equivalent of |
@scuzhanglei I think we have that filed under #16276, just needs a cmdline in etcdctl since we're bumping that thread again already, is there anything left to pick up? I've just "saved" one PR #16959 from @siyuanfoundation from being stale reaped, I think #16858 is also going to fall prey to the evil bot soon. |
I have tried to add the commands before 293f087#diff-ab6fb0684315e16355f6ebe0f4b3cf860b9b2ff5a0fe1b4e4308a680b19f1b0c. Currently I don't have time to rebase it to the most recent implementation of livez/readyz. Hope someone can pick it up. |
Hey @siyuanfoundation, I will pick this issue up! |
What would you like to be added?
We currently have a single health endpoint for etcd, /health, which is used in Kubernetes distros for both liveness and readiness checking. In order to be fully API-compliant, we should have both a liveness check (i.e. /livez), which checks that this individual etcd member is "alive" and does not need to be restarted, and a readiness check (i.e. /readyz), which signals that the etcd member is ready to accept traffic.
Why is this needed?
There is a difference between "please restart me I'm that unhealthy" vs "please send me all sorts of traffic, I'm ready for it".
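As a rough illustration of the request, the following Go sketch serves /livez and /readyz next to an untouched /health route. Everything in it is an assumption made for the example: `memberState`, `newProbeMux`, `statusHandler`, and `probe` are hypothetical names, the two boolean signals stand in for whatever checks a real design would choose, and the /health logic shown is a placeholder rather than etcd's actual handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// memberState is a hypothetical stand-in for the signals a real member would inspect.
type memberState struct {
	storageStalled atomic.Bool // assumed signal: the member needs a restart
	hasLeader      atomic.Bool // assumed signal: the member can serve traffic
}

// statusHandler answers 200 when ok() is true and 503 otherwise.
func statusHandler(ok func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if ok() {
			w.WriteHeader(http.StatusOK)
			fmt.Fprintln(w, "ok")
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprintln(w, "not ok")
	}
}

// newProbeMux registers the two new probes alongside the existing /health path.
func newProbeMux(s *memberState) *http.ServeMux {
	live := func() bool { return !s.storageStalled.Load() }
	ready := func() bool { return live() && s.hasLeader.Load() }

	mux := http.NewServeMux()
	mux.Handle("/health", statusHandler(ready)) // placeholder logic; the route itself is unchanged
	mux.Handle("/livez", statusHandler(live))   // "do I need a restart?"
	mux.Handle("/readyz", statusHandler(ready)) // "can I take traffic?"
	return mux
}

// probe issues an in-memory GET against the handler and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	s := &memberState{}
	mux := newProbeMux(s)
	// A freshly started member in this sketch is alive but not yet ready.
	fmt.Println("livez:", probe(mux, "/livez"), "readyz:", probe(mux, "/readyz"))

	s.hasLeader.Store(true)
	fmt.Println("livez:", probe(mux, "/livez"), "readyz:", probe(mux, "/readyz"))
}
```

Keeping the two predicates separate lets an orchestrator stop routing traffic to a member that is not ready without restarting it, and restart only a member that reports it is not live.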