-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[concept] add livez/readyz for etcd #16008
Conversation
Change-Id: Ia440a82b2bf3d275b7cd7d88b5a6e86fe9fe1c28 Signed-off-by: Han Kang <hankang@google.com> Change-Id: Ief9475a92429be58eb7b1f96246bbdb00e996e75
Change-Id: Ie699ae11d0ecc315b91365f85f0ac0b2d339c28d Signed-off-by: Han Kang <hankang@google.com> Change-Id: Iee6f469f63cb1fbcc22a4d633a621b7915a1a799
Signed-off-by: Han Kang <hankang@google.com> Change-Id: I7e95be58cff6b7bc47fa3114249074a9f69a1620
Co-authored-by: Benjamin Wang <wachao@vmware.com> Signed-off-by: Han Kang <hankang@google.com> Change-Id: I45fda0a8ee7d80638af96fee4efb3bfdf2aebaf8
Change-Id: Ie5f02bba1a63f7592c6f3500db9070e6f1022df0 Signed-off-by: Han Kang <hankang@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to rush into adding livez/readyz probe. Main problem with existing health probe we just added it to have it without proper consideration.
I want livez to properly reflect fact that etcd needs restart, for example etcd is stuck on stalled storage https://docs.google.com/document/d/1U9hAcZQp3Y36q_JFiw2VBJXVAo2dK2a-8Rsbqv3GgDo/edit?usp=sharing.
Readyz should properly reflect fact that etcd is ready to serve traffic. Don't think alarms matter here. It's a degradation, however it doesn't mean we shouldn't serve reads.
TLDR; I would like to have a design written that will do a proper analysis etcd failure modes and propose matching probes to detect them. Example kubernetes-sigs/metrics-server#542
@@ -141,7 +169,7 @@ func getSerializableFlag(r *http.Request) bool { | |||
|
|||
// TODO: etcdserver.ErrNoLeader in health API | |||
|
|||
func checkAlarms(lg *zap.Logger, srv ServerHealth, excludedAlarms AlarmSet) Health { | |||
func checkAlarms(lg *zap.Logger, srv ServerHealth, excludedAlarms AlarmSet, healthType string) Health { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The healthType string
isn't used at all, can we remove it?
func checkAlarms(lg *zap.Logger, srv ServerHealth, excludedAlarms AlarmSet, healthType string) Health { | |
func checkAlarms(lg *zap.Logger, srv ServerHealth, excludedAlarms AlarmSet) Health { |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Discussed during sig-etcd triage meeting. This original concept has now been superceeded by the work from @siyuanfoundation 🎉 |
the work is tracked in #16007 |
This is a prototype for adding livez/readyz support to etcd. Currently I've configured the
NO_SPACE
alarm to only count towards\readyz
since it means etcd is degraded. Whether quorum should be included in liveness is an open question.