Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add healthz deep subpathes for all controllers #201

Merged
merged 1 commit into from
Apr 25, 2023
Merged

Conversation

haouc
Copy link
Contributor

@haouc haouc commented Apr 12, 2023

Issue #, if available:

Description of changes:
We want to add deep health check probes into individual controllers.

The current healthz ping is simple and basically just checking if the VPC RC manager goroutine (main routine) is not failing. Reported customers’ cases told us the main check may not be sufficient in some cases. What we want

  • if one goroutine blocks resources, the healthz need to restart the entire controller
  • the healthz probes should cover all sub controllers
  • the heatlhz probe should have its own timeout instead of rely on kubelet timeout
  • the health status monitored by kubelet should be aggregated (one failure should fails all)
  • simple ping or complicate probing real data should depend on how critical and vulnerable the targeted controller may be.

Tests have been done

[+]health-annotation-validating-webhook ok
[+]health-branch-provider ok
[+]health-cm-controller ok
[+]health-custom-pod-controller ok
[+]health-deploy-controller ok
[+]health-interface-cleaner ok
[+]health-introspect-controller ok
[+]health-ipv4-provider ok
[+]health-node-controller ok
[+]health-node-manager ok
[+]health-node-validating-webhook ok
[+]health-pod-controller ok
[+]health-pod-mutating-webhook ok
[+]health-resource-manager ok
[+]health-root-manager-ping ok
healthz check passed

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@haouc haouc requested a review from a team as a code owner April 12, 2023 00:44
Copy link
Contributor

@sushrk sushrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed changes to the node controller in detail and high-level review of other changes. Looks good overall, had a few comments and one general question. Please see.

main.go Outdated Show resolved Hide resolved
config/webhook/manifests.yaml Outdated Show resolved Hide resolved
controllers/apps/deployment_controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@sushrk sushrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, please take a look.

pkg/healthz/healthz.go Outdated Show resolved Hide resolved
pkg/healthz/healthz.go Outdated Show resolved Hide resolved
pkg/worker/worker.go Show resolved Hide resolved
@haouc
Copy link
Contributor Author

haouc commented Apr 25, 2023

Seems controller-gen has some flakiness. Rerunning the failed job succeeded. I will take some look at this tool as working on #199

Copy link
Contributor

@sushrk sushrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@haouc haouc merged commit ecbd696 into aws:master Apr 25, 2023
haouc added a commit that referenced this pull request May 29, 2023
* add healthz subpathes for all controllers (#201)

* support arch arg in dockerfile (#207)

* updated vpc limits to include fields for hypervisor type and bare metal status (#217)

* enable node events when instance type is not supported (#218)

* Associate primary network interface SG with the trunk ENI when SG is not specified in ENIConfig (#221)

* Associate primary network interface SG with the trunk ENI when SG is not specified in ENIConfig

* add a new CRD to delegate vpc resource requests (#210)

* upgrade controller runtime version (#227)

* rebased onto master branch

* fixed merge conflict

---------

Co-authored-by: Hao Zhou <haouc@users.noreply.github.com>
Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants