Fix probes when installing on a fresh cluster #160

adrianludwin · 2022-03-24T03:58:52Z

Fixes #158

If the Secret with the certs doesn't exist, starting the
webhook server will fail and HNC will exit. Before we configured the
probes, we carefully didn't start the webhook server until after the
certs were ready, but as a side effect of creating the probe functions,
we accidentally started the webhook server before the certs were
generated. This worked fine on a cluster that already had the certs
(like the one I was testing on) but failed for everyone else (oops).

The fix is simply to use a non-default checker that knows whether the
certs have been generated, and avoids accidentally starting the webhook
server if the certs don't exist yet.

Tested: manually on a cluster without the Secret. Without this change, I
can see the error from the webhook server complaining that the certs
don't exist; with this change, I can see that in HNC's first
invocation, we get some "healthz check failed" messages, but the cert
controller runs, generates the certs, and restarts HNC. The second
invocation works just fine.
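
A minimal sketch of the kind of checker described above, using only the Go standard library. The name `certsReadyChecker` and the `certsReady` channel are hypothetical and are not taken from the HNC source; the returned function merely has the same shape as controller-runtime's `healthz.Checker` (`func(*http.Request) error`). The idea is that the probe reports failure until something signals that the certs exist, without ever touching the webhook server itself.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// certsReadyChecker returns a probe function that fails until the
// certsReady channel is closed. The channel is a hypothetical signal
// that the cert controller has finished generating the certs.
func certsReadyChecker(certsReady <-chan struct{}) func(*http.Request) error {
	return func(_ *http.Request) error {
		select {
		case <-certsReady:
			// Certs exist, so the probe may report healthy.
			return nil
		default:
			// Certs are not there yet: fail the probe without
			// starting (or even referencing) the webhook server.
			return errors.New("certs not yet generated")
		}
	}
}

func main() {
	certsReady := make(chan struct{})
	check := certsReadyChecker(certsReady)

	fmt.Println("before certs:", check(nil))
	close(certsReady) // simulate the cert controller finishing
	fmt.Println("after certs:", check(nil))
}
```

With controller-runtime, a function of this shape could presumably be registered through the manager's health and readiness check hooks in place of a default checker.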

k8s-ci-robot · 2022-03-24T03:59:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianludwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [adrianludwin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

adrianludwin · 2022-03-24T03:59:35Z

/assign @rjbez17
/cc @erikgb

erikgb · 2022-03-24T09:23:30Z

While this solution seems ok, I think the canonical solution would be to start with a dummy certificate that later will be replaced by the certificate-provider. Just to solve the bootstrapping problem. That is how I would solve this using cert-manager. But I also think certificates should be provided to the application, and not provided by the application.

/lgtm
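
One way the dummy-certificate idea could look, sketched with the Go standard library only. Everything here (`selfSigned`, `certHolder`, `currentCN`, the host names) is hypothetical and illustrates the bootstrapping pattern, not HNC's or cert-manager's actual behaviour: the server starts with a throwaway self-signed cert and the real one is swapped in later.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"sync"
	"time"
)

// selfSigned generates a throwaway self-signed certificate.
func selfSigned(commonName string) (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		DNSNames:     []string{commonName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// certHolder holds the certificate currently being served and lets it
// be replaced at runtime.
type certHolder struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Set replaces the served certificate.
func (h *certHolder) Set(c tls.Certificate) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.cert = &c
}

// GetCertificate has the signature expected by tls.Config.GetCertificate.
func (h *certHolder) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	if h.cert == nil {
		return nil, errors.New("no certificate loaded")
	}
	return h.cert, nil
}

// currentCN reports the common name of the certificate being served.
func currentCN(h *certHolder) (string, error) {
	cert, err := h.GetCertificate(nil)
	if err != nil {
		return "", err
	}
	leaf, err := x509.ParseCertificate(cert.Certificate[0])
	if err != nil {
		return "", err
	}
	return leaf.Subject.CommonName, nil
}

func main() {
	h := &certHolder{}
	dummy, err := selfSigned("dummy.invalid")
	if err != nil {
		panic(err)
	}
	h.Set(dummy) // the server can start right away with the dummy cert
	cn, _ := currentCN(h)
	fmt.Println("serving:", cn)

	// Stand-in for the cert provider delivering the real certificate.
	real, err := selfSigned("webhook.example")
	if err != nil {
		panic(err)
	}
	h.Set(real)
	cn, _ = currentCN(h)
	fmt.Println("serving:", cn)
}
```

A server would pick this up via `tls.Config{GetCertificate: h.GetCertificate}`, so new connections see the replacement cert without a restart.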

erikgb · 2022-03-24T09:27:42Z

Oooops, I didn't think my lgtm would merge this. Sorry! I should have added a hold to get some discussions around my comment.

rjbez17 · 2022-03-24T14:10:47Z

I wonder if it may be worthwhile to reconsider removing the cert creation logic altogether from startup? Just have a dependency that certs are provided. We take in a secret and k8s will prevent us from starting until the secret exists. Then we can have a separate (optional) startup job that does any and all cert creation logic or leave it to users to not run the job if they have their own solution (cert manager). I know we discussed this in the past but I don't remember why we didn't like that approach? To handle cert rotations the job could be a cron job for self signed and cert manager handles for others. We just need a watcher on the mounted secret file to reload HNC on change.

This would guarantee we have a cert at startup.
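
A rough sketch of what such a watcher on the mounted secret file might look like, using only the Go standard library. The type `certWatcher`, the function `demo`, and the file name `tls.crt` are hypothetical. It polls and compares content digests rather than relying on file-system notifications, which is an assumption made to keep the example dependency-free; a real implementation would call `Changed` on a timer and trigger a reload or restart when it returns true.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// certWatcher remembers the digest of a cert file so that a caller can
// poll for content changes.
type certWatcher struct {
	path string
	last [sha256.Size]byte
}

// newCertWatcher records the current contents of the file at path.
func newCertWatcher(path string) (*certWatcher, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return &certWatcher{path: path, last: sha256.Sum256(data)}, nil
}

// Changed re-reads the file and reports whether its contents differ
// from what was last observed.
func (w *certWatcher) Changed() (bool, error) {
	data, err := os.ReadFile(w.path)
	if err != nil {
		return false, err
	}
	sum := sha256.Sum256(data)
	if sum == w.last {
		return false, nil
	}
	w.last = sum
	return true, nil
}

// demo polls once before and once after rewriting a temporary file that
// stands in for the mounted secret.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "certwatch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "tls.crt")
	if err := os.WriteFile(path, []byte("old cert"), 0o600); err != nil {
		return false, false, err
	}
	w, err := newCertWatcher(path)
	if err != nil {
		return false, false, err
	}
	before, err := w.Changed()
	if err != nil {
		return false, false, err
	}
	// Simulate a cert rotation.
	if err := os.WriteFile(path, []byte("new cert"), 0o600); err != nil {
		return false, false, err
	}
	after, err := w.Changed()
	return before, after, err
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("changed before rewrite:", before)
	fmt.Println("changed after rewrite:", after)
}
```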

adrianludwin · 2022-03-24T14:45:29Z

I really like having a one-step install to make it easy to try out, not to mention write tests for or even just test manually on a new cluster. Plus cert-controller already existed so we didn't need to write anything ourselves. I'd rather go with the approach in #159 where we simply build multiple manifests to make it easy for cert-manager users as well. wdyt?

erikgb · 2022-03-24T14:49:34Z

I agree with @rjbez17. The current approach will complicate the code more over time, and put HNC further away from controller-runtime and kubebuilder. I'll vote for cert-manager as the default - as suggested by kubebuilder in the scaffolding process. And eventually an opt-in for cert-rotator, that could be used in tests - with some test adjustments. Or just use a self-signed cert in the tests and drop the cert-rotator.

adrianludwin · 2022-03-24T14:55:10Z

We did start with cert-manager as the default (and only) method but it made development and experimentation difficult. Can you or Ryan maybe create a setup script to show what a standalone installer would look like, without either cert-controller or cert-manager?

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 24, 2022

adrianludwin added this to the release-v1.0 milestone Mar 24, 2022

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 24, 2022

k8s-ci-robot requested review from srampal and tashimi March 24, 2022 03:59

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 24, 2022

k8s-ci-robot assigned rjbez17 Mar 24, 2022

k8s-ci-robot requested a review from erikgb March 24, 2022 03:59

This was referenced Mar 24, 2022

Replace unstructured query with structured query #154

Merged

Copy labels and annotations from SubnamespaceAnchor to child namespace #149

Merged

Support for Openshift #141

Merged

Remove last callers of webhooks.Deny function and delete it #155

Merged

k8s-ci-robot assigned erikgb Mar 24, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 24, 2022

k8s-ci-robot merged commit 921b678 into kubernetes-sigs:master Mar 24, 2022

adrianludwin deleted the fix-probes branch March 24, 2022 14:25
