Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks #79
To make it easier to root-cause when a /health check fails. For example, we use a load balancer to health check each etcd instance, and when one etcd node gets terminated, it is hard to tell whether the etcd server was really failing or the client (or load balancer) failed to reach the etcd cluster, which also shows up as a failure in the load balancer health check.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
@hexfusion: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:
Comment In response to this:
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hexfusion, lilic The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes.
/hold for perf review
CPU and RSS are not affected; I had a slight concern about creating the logger config. /hold cancel
We are a fork, so the invalid BZ status is not accurate, and GL approval is not required at this point, so I am manually checking the boxes.
@hexfusion: All pull requests linked via external trackers have merged: Bugzilla bug 1958405 has been moved to the MODIFIED state. In response to this:
manual cherry-pick of etcd-io#11704 into openshift-4.8
Health check reporting: [before and after output omitted]
This is of course too verbose, which is why we need #80.
The end goal is to provide a machine-consumable etcd log that is persisted to the host and rotated in the same way that kube-apiserver audit logs are today. By taking health checks at 1s granularity and reporting failures, we can determine etcd service availability, or the lack of it, with great precision.