Fix probes when installing on a fresh cluster #160
See issue kubernetes-sigs#158. If the Secret with the certs doesn't exist, starting the webhook server will fail and HNC will exit. Before we configured the probes, we were careful not to start the webhook server until the certs were ready, but as a side effect of creating the probe functions, we accidentally started the webhook server before the certs were generated. This worked fine on a cluster that already had the certs (like the one I was testing on) but failed for everyone else (oops). The fix is simply to use a non-default checker that knows whether the certs have been generated and avoids accidentally starting the webhook server if the certs don't exist yet.

Tested: manually on a cluster without the Secret. Without this change, I can see the error from the webhook server complaining that the certs don't exist. With this change, HNC's first invocation logs some "healthz check failed" messages, but the cert controller runs, generates the certs, and restarts HNC. The second invocation works just fine.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: adrianludwin
While this solution seems OK, I think the canonical solution would be to start with a dummy certificate that is later replaced by the certificate provider, just to solve the bootstrapping problem. That is how I would solve this using cert-manager. But I also think certificates should be provided to the application, not by the application. /lgtm
Oooops, I didn't think my lgtm would merge this. Sorry! I should have added a hold to allow some discussion of my comment.
I wonder if it may be worthwhile to reconsider removing the cert creation logic from startup altogether? We could just have a dependency that certs are provided: we take in a Secret, and k8s will prevent us from starting until the Secret exists. Then we can have a separate (optional) startup job that does any and all cert creation, or leave it to users not to run the job if they have their own solution (cert-manager). I know we discussed this in the past, but I don't remember why we didn't like that approach. To handle cert rotation, the job could be a cron job for self-signed certs, and cert-manager would handle it for everyone else. We just need a watcher on the mounted Secret file to reload HNC on change. This would guarantee we have a cert at startup.
I really like having a one-step install to make it easy to try out, not to mention to write tests for, or even just to test manually on a new cluster. Plus, cert-controller already existed, so we didn't need to write anything ourselves.

I'd rather go with the approach in #159, where we simply build multiple manifests to make it easy for cert-manager users as well. wdyt?
(In reply to Ryan Bezdicek's comment of Thu, Mar 24, 2022 at 10:10 AM.)
I agree with @rjbez17. The current approach will complicate the code more over time and move HNC further away from controller-runtime and kubebuilder. I'd vote for cert-manager as the default, as suggested by kubebuilder in the scaffolding process, and eventually an opt-in for cert-rotator that could be used in tests, with some test adjustments. Or just use a self-signed cert in the tests and drop the cert-rotator.
We did start with cert-manager as the default (and only) method, but it made development and experimentation difficult.

Can you or Ryan maybe create a setup script to show what a standalone installer would look like, without either cert-controller or cert-manager?
(In reply to Erik Godding Boye's comment of Thu, Mar 24, 2022 at 10:49 AM.)