Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks #79

Merged
merged 1 commit into from
May 12, 2021

Conversation

hexfusion

@hexfusion hexfusion commented May 11, 2021

manual cherry-pick of etcd-io#11704 into openshift-4.8

Health check reporting

Before

2021-05-11 20:55:27.862866 I | etcdserver/api/etcdhttp: /health OK (status code 200)

After

{"level":"info","ts":"2021-05-12T00:21:55.349Z","caller":"etcdhttp/metrics.go:127","msg":"serving /health true"}
{"level":"info","ts":"2021-05-12T00:21:55.350Z","caller":"etcdhttp/metrics.go:65","msg":"/health OK","status-code":200}

This is of course too verbose, which is why we need #80.

The end goal is a machine-consumable etcd log that is persisted to the host and rotated in the same way that kube-apiserver audit logs are today. By ensuring health checks are taken at 1s granularity and failures are reported, we can determine etcd service availability, or the lack thereof, with great precision.
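As a rough illustration of what "machine consumable" could mean here, the sketch below counts healthy and unhealthy /health results from structured log lines like the "After" sample. It is not part of this PR: the `healthEntry` type, the `summarize` function, and the 503 failure line are hypothetical, and only the field names `ts`, `msg`, and `status-code` come from the sample above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// sample holds the two "After" lines from this PR plus one failure line.
// The failure line is hypothetical and only exercises the unhealthy path.
const sample = `{"level":"info","ts":"2021-05-12T00:21:55.349Z","caller":"etcdhttp/metrics.go:127","msg":"serving /health true"}
{"level":"info","ts":"2021-05-12T00:21:55.350Z","caller":"etcdhttp/metrics.go:65","msg":"/health OK","status-code":200}
{"level":"warn","ts":"2021-05-12T00:21:56.350Z","caller":"hypothetical","msg":"/health error","status-code":503}`

// healthEntry (hypothetical) picks out the fields needed for an availability count.
type healthEntry struct {
	TS         string `json:"ts"`
	Msg        string `json:"msg"`
	StatusCode int    `json:"status-code"`
}

// summarize counts /health results in a log stream, one JSON object per line.
// Lines that are not JSON or carry no status-code field are skipped.
func summarize(logs string) (ok, failed int) {
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e healthEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // not a structured line
		}
		if e.StatusCode == 0 {
			continue // no result recorded, e.g. the "serving /health" line
		}
		if e.StatusCode == 200 {
			ok++
		} else {
			failed++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := summarize(sample)
	fmt.Printf("healthy=%d unhealthy=%d\n", ok, failed)
}
```

With checks logged once per second, the ratio of healthy to total entries over a window gives a simple availability estimate, and gaps in `ts` show where no check reached the server at all.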

To make it easier to root-cause /health check failures.
For example, we use a load balancer to health check
each etcd instance, and when one etcd node gets terminated,
it's hard to tell whether the etcd "server" was really failing
or the client (or load balancer) failed to reach the etcd cluster,
which also shows up as a failed load balancer health check.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
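The reasoning in the commit message can be sketched with a small handler that logs every check it serves. This is a minimal stdlib sketch, not the etcd implementation (which lives in etcdhttp and uses its own logger): `newHealthHandler`, `probe`, the `check` callback, the log format, and the remote address are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// newHealthHandler (hypothetical) returns a /health handler that logs each
// check on the server side. check stands in for the real health logic.
func newHealthHandler(logger *log.Logger, check func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		healthy := check()
		status := http.StatusOK
		if !healthy {
			status = http.StatusServiceUnavailable
		}
		// A server-side entry proves the request arrived; a missing entry
		// points at the client or the network path instead.
		logger.Printf("/health served remote=%s status-code=%d", r.RemoteAddr, status)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		fmt.Fprintf(w, `{"health":%q}`, strconv.FormatBool(healthy))
	})
}

// probe sends one in-process request and returns the status the client saw
// together with what the server logged for it.
func probe(healthy bool) (int, string) {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no timestamp prefix, to keep output stable
	h := newHealthHandler(logger, func() bool { return healthy })
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	req.RemoteAddr = "192.0.2.10:4321" // illustrative documentation-range address
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, buf.String()
}

func main() {
	for _, healthy := range []bool{true, false} {
		code, logged := probe(healthy)
		fmt.Printf("client saw %d; server logged: %s", code, logged)
	}
}
```

When a load balancer reports a failed check, comparing its view with the server log separates the two cases: a logged 503 means the server itself was unhealthy, while no log entry for that interval suggests the request never reached the server.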
@openshift-ci

openshift-ci bot commented May 11, 2021

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

UPSTREAM: <carry>: *: log server-side /health checks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from crawford and deads2k May 11, 2021 23:47
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2021
@hexfusion hexfusion changed the title UPSTREAM: <carry>: *: log server-side /health checks Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks May 11, 2021
@openshift-ci openshift-ci bot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label May 11, 2021
@openshift-ci

openshift-ci bot commented May 11, 2021

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

  • expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 11, 2021
@hexfusion hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 11, 2021
@hexfusion
Author

/retest

@lilic lilic left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2021
@openshift-ci

openshift-ci bot commented May 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, lilic

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Author

/hold for perf review

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels May 12, 2021
@openshift-ci

openshift-ci bot commented May 12, 2021

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

  • expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

@hexfusion
Author

hexfusion commented May 12, 2021

CPU and RSS are not affected. I had a slight concern about creating the logger config.

/hold cancel

@hexfusion hexfusion added staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead). and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 12, 2021
@hexfusion
Author

We are a fork, so the invalid BZ label is not accurate, and GL approval is not required at this point, so I am manually checking the boxes.

@hexfusion hexfusion removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 12, 2021
@openshift-ci

openshift-ci bot commented May 12, 2021

@hexfusion: This pull request references Bugzilla bug 1958405, which is invalid:

  • expected Bugzilla bug 1958405 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 12, 2021
@openshift-merge-robot openshift-merge-robot merged commit 2de7094 into openshift:openshift-4.8 May 12, 2021
@openshift-ci

openshift-ci bot commented May 12, 2021

@hexfusion: All pull requests linked via external trackers have merged:

Bugzilla bug 1958405 has been moved to the MODIFIED state.

In response to this:

Bug 1958405: UPSTREAM: <carry>: *: log server-side /health checks

@hexfusion hexfusion deleted the cp-11704 branch May 12, 2021 18:09
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
staff-eng-approved: Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead).
5 participants