Add default readiness check for webhook server? #723

Closed
ncdc opened this issue Dec 9, 2019 · 15 comments · Fixed by #1588
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Comments

@ncdc
Contributor

ncdc commented Dec 9, 2019

In the Cluster API project, we're trying to automate the following in our CI testing:

  1. Create a kind cluster
  2. Deploy cert-manager
  3. Wait for cert-manager to be ready
  4. Deploy Cluster API (CRDs, RBAC, Deployments, etc)
  5. Wait for Cluster API pods to be ready (so our webhooks are ready)
  6. Create a Cluster, Machine, etc.

When we're waiting for the Cluster API pods to be ready, we've noticed some test flakes because it appears the pods are ready but their webhook servers are not fully online and ready to respond to requests yet.
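One way a CI script can narrow this gap is to poll the webhook endpoint itself rather than trusting pod readiness alone. The following is a minimal sketch under stated assumptions: `waitForHTTP` and `demoWait` are hypothetical helpers, not part of Cluster API or controller-runtime, and the demo uses a plain HTTP test server that reports "starting" twice before it answers, whereas a real webhook server serves TLS.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitForHTTP polls url until it answers 200 OK or ctx expires.
// Hypothetical helper for illustration only.
func waitForHTTP(ctx context.Context, client *http.Client, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Not ready yet: wait for the next tick or give up on timeout.
		select {
		case <-ctx.Done():
			return fmt.Errorf("endpoint %s not ready: %w", url, ctx.Err())
		case <-ticker.C:
		}
	}
}

// demoWait simulates a server that is "up" but not yet serving:
// the first two requests get 503, the third gets 200.
func demoWait() (int32, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			http.Error(w, "starting", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err := waitForHTTP(ctx, srv.Client(), srv.URL, 10*time.Millisecond)
	return atomic.LoadInt32(&calls), err
}

func main() {
	attempts, err := demoWait()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Polling from outside works, but every consumer has to reimplement it, which is part of the motivation for a built-in check.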

I've thought about adding a readiness probe to our Manager, but I believe there is still a chance that it could return a false positive; i.e., the check returns ready, but because the webhook server is technically a different server than the readiness server, the webhook server might not be ready yet.

Would it make sense to add a default readiness check for the webhook server?
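To make the false-positive concern concrete, here is a minimal, standard-library-only sketch of a readiness check that is tied to the webhook listener rather than to the health endpoint's own server. `dialCheck` and `demo` are hypothetical names. The `func(*http.Request) error` shape is chosen because it is, as far as I know, the shape controller-runtime's health checkers use; treat that as an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// dialCheck returns a readiness check that passes only when a TCP
// connection to addr (the webhook server's address) can be opened.
// Hypothetical helper for illustration only.
func dialCheck(addr string) func(*http.Request) error {
	return func(_ *http.Request) error {
		conn, err := net.DialTimeout("tcp", addr, time.Second)
		if err != nil {
			return fmt.Errorf("webhook server not reachable at %s: %w", addr, err)
		}
		return conn.Close()
	}
}

// demo runs the same check before and after a listener exists on the
// address, standing in for "webhook server not started" vs "started".
func demo() (before, after, setupErr error) {
	// Reserve a free local port, then release it so nothing listens there.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	addr := ln.Addr().String()
	if err := ln.Close(); err != nil {
		return nil, nil, err
	}

	check := dialCheck(addr)
	before = check(nil) // nothing is listening yet

	// "Start" the webhook server by listening on the same address.
	ln, err = net.Listen("tcp", addr)
	if err != nil {
		return before, nil, err
	}
	defer ln.Close()

	after = check(nil)
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("before start:", before)
	fmt.Println("after start:", after)
}
```

A successful TCP dial only proves the listener is bound; it does not prove that TLS certificates are loaded or that handlers are registered, so a real check may need to go further.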

@DirectXMan12
Contributor

yeah, that makes sense

@DirectXMan12
Contributor

/kind feature
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Dec 10, 2019
@ncdc
Contributor Author

ncdc commented Dec 10, 2019

@DirectXMan12 I can work on this. How much flexibility would you like for the check? Should we add AddReadyzCheck() to webhook.Server? Or start simple and hard-code a simple ping response?
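The two options can be compared with a small sketch. Everything below is hypothetical and uses only the standard library: the `server` type, its `AddReadyzCheck` method, and the `ping` and `notStarted` checkers illustrate the design question and are not controller-runtime's actual `webhook.Server` API.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checker reports nil when the component it guards is ready.
type checker func(*http.Request) error

// ping is the "start simple" option: it always reports ready.
func ping(_ *http.Request) error { return nil }

// notStarted stands in for a check that is still failing.
func notStarted(_ *http.Request) error {
	return errors.New("webhook server has not started")
}

// server is a hypothetical stand-in for a webhook server that lets
// callers register named readiness checks.
type server struct {
	checks map[string]checker
}

// AddReadyzCheck registers a named check; duplicate names are rejected.
func (s *server) AddReadyzCheck(name string, c checker) error {
	if s.checks == nil {
		s.checks = map[string]checker{}
	}
	if _, dup := s.checks[name]; dup {
		return fmt.Errorf("readyz check %q already registered", name)
	}
	s.checks[name] = c
	return nil
}

// readyz answers 200 only when every registered check passes.
func (s *server) readyz(w http.ResponseWriter, r *http.Request) {
	for name, c := range s.checks {
		if err := c(r); err != nil {
			http.Error(w, fmt.Sprintf("%s: %v", name, err), http.StatusServiceUnavailable)
			return
		}
	}
	fmt.Fprintln(w, "ok")
}

// readyzStatus invokes the handler in-process and returns the status code.
func readyzStatus(s *server) int {
	rec := httptest.NewRecorder()
	s.readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	s := &server{}
	_ = s.AddReadyzCheck("ping", ping)
	fmt.Println("ping only:", readyzStatus(s))

	_ = s.AddReadyzCheck("webhook", notStarted)
	fmt.Println("with failing check:", readyzStatus(s))
}
```

The hard-coded ping is the registry with a single fixed entry, so starting with it does not rule out a configurable `AddReadyzCheck`-style hook later.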

@vincepri
Member

/help

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 21, 2020
@ncdc
Contributor Author

ncdc commented Feb 21, 2020

@vincepri I'm still interested in working on this. Do you have thoughts on my question above?

@vincepri
Member

I'd start simple and hardcode the Readyz checks for now; we can provide a flag later. This should also be backward compatible.

@ncdc
Contributor Author

ncdc commented Feb 21, 2020

Can you define backward compatibility here?

@vincepri
Member

As in, adding new endpoints for readiness is additive and doesn't change the current behavior of the webhook server.

@phoracek

Just curious, what is this supposed to add on top of #419? Is it just to use the probe by default? It seems fairly simple to add a readiness probe in the current state of controller-runtime.

@ncdc
Contributor Author

ncdc commented Feb 21, 2020

This would just expose checks that people could configure in their pods. I'm not sure there are any compatibility issues or impacts.

@phoracek please see the description above for the rationale.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2020
@vincepri
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 21, 2020
@alenkacz
Contributor

alenkacz commented Aug 4, 2020

@ncdc are you still working on this? I've just spent many hours trying to figure out a nice way to create such a readiness check from outside, and I am starting to think it would be easier to implement it within controller-runtime.

@ncdc
Contributor Author

ncdc commented Aug 4, 2020

@alenkacz not at the moment, no


7 participants