Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks #79

hexfusion · 2021-05-11T23:47:02Z

Manual cherry-pick of etcd-io#11704 into openshift-4.8.

Health check reporting before:

2021-05-11 20:55:27.862866 I | etcdserver/api/etcdhttp: /health OK (status code 200)

After:

{"level":"info","ts":"2021-05-12T00:21:55.349Z","caller":"etcdhttp/metrics.go:127","msg":"serving /health true"}
{"level":"info","ts":"2021-05-12T00:21:55.350Z","caller":"etcdhttp/metrics.go:65","msg":"/health OK","status-code":200}

This is, of course, too verbose, which is why we need #80.

The end goal is to provide a machine-consumable etcd log that is persisted to the host and rotated in the same way that kube-apiserver audit logs are today. By taking health checks at 1s granularity and reporting failures, we can determine with great precision whether the etcd service was available at any given time.

This makes it easier to root-cause /health check failures. For example, we use a load balancer to health-check each etcd instance, and when one etcd node is terminated, it is hard to tell whether the etcd "server" was really failing or the client (or load balancer) failed to reach the etcd cluster, which also shows up as a failed load balancer health check.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>

openshift-ci · 2021-05-11T23:47:03Z

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

UPSTREAM: <carry>: *: log server-side /health checks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2021-05-11T23:56:41Z

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

hexfusion · 2021-05-12T10:22:57Z

/retest

lilic

/lgtm

openshift-ci · 2021-05-12T12:02:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, lilic

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~OWNERS~~ [hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2021-05-12T13:09:45Z

/retest

Please review the full test history for this PR and help us cut down flakes.

hexfusion · 2021-05-12T13:34:24Z

/hold for perf review

openshift-ci · 2021-05-12T13:44:49Z

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

openshift-ci · 2021-05-12T13:46:13Z

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

hexfusion · 2021-05-12T14:02:22Z

CPU and RSS are not affected. I had a slight concern about creating the logger config.

/hold cancel

hexfusion · 2021-05-12T14:03:19Z

We are a fork, so the invalid BZ status is not accurate, and GL approval is not required at this point, so I am manually checking the boxes.

openshift-bot · 2021-05-12T14:22:43Z

/retest

openshift-ci · 2021-05-12T14:42:27Z

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

openshift-bot · 2021-05-12T15:34:44Z

/retest

openshift-ci · 2021-05-12T18:06:52Z

@hexfusion: All pull requests linked via external trackers have merged:

openshift/etcd#78

Bugzilla bug 1958405 has been moved to the MODIFIED state.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

openshift-ci bot requested review from crawford and deads2k May 11, 2021 23:47

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2021

hexfusion changed the title ~~UPSTREAM: <carry>: *: log server-side /health checks~~ Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks May 11, 2021

openshift-ci bot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label May 11, 2021

openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 11, 2021

hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 11, 2021

hexfusion mentioned this pull request May 12, 2021

Bug 1958405: UPSTREAM: <carry>: etcdserver/api/etcdhttp: log successful etcd server side health check in debug level #80

Merged

lilic approved these changes May 12, 2021

openshift-ci bot assigned lilic May 12, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2021

hexfusion added staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead). and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 12, 2021

hexfusion removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2021

openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 12, 2021

hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 12, 2021

openshift-merge-robot merged commit 2de7094 into openshift:openshift-4.8 May 12, 2021

hexfusion deleted the cp-11704 branch May 12, 2021 18:09
