Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only create per-object webhooks for configured types #285

Merged
merged 15 commits into from
Jun 10, 2024

Conversation

erikgb
Copy link
Contributor

@erikgb erikgb commented May 28, 2023

Fixes #152

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 28, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 28, 2023
@erikgb
Copy link
Contributor Author

erikgb commented May 28, 2023

/test all

@erikgb
Copy link
Contributor Author

erikgb commented May 28, 2023

/test pull-hnc-test

1 similar comment
@erikgb
Copy link
Contributor Author

erikgb commented May 28, 2023

/test pull-hnc-test

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 28, 2023
@erikgb
Copy link
Contributor Author

erikgb commented May 28, 2023

/test pull-hnc-test

@erikgb
Copy link
Contributor Author

erikgb commented Jun 11, 2023

/test pull-hnc-test

@erikgb
Copy link
Contributor Author

erikgb commented Jun 11, 2023

@adrianludwin Existing tests are running fine now. Before writing some tests for this new feature, I would like to get your opinion on the approach. Because of the cert-rotator (as always 😉), I cannot move the now dynamic objects-webhook to a dedicated validating webhook configuration resource. IMO we should not mess with resources provisioned as part of HNC. That will never work with modern GitOps-tools like FluxCD/ArgoCD - as their reconcilers will compete with our internal reconciler. It "might" work if it was possible to create a webhook without rules (for the initial configuration), but it seems like this is not allowed by the API server. Update: This was not possible when defining the webhook with the kubebuilder markers, but I worked around this by extending the patch to create the complete webhook (without rules).

I would love to get your thoughts anyway, but I will continue writing tests now.

@adrianludwin
Copy link
Contributor

@adrianludwin Existing tests are running fine now. Before writing some tests for this new feature, I would like to get your opinion on the approach. Because of the cert-rotator (as always 😉), I cannot move the now dynamic objects-webhook to a dedicated validating webhook configuration resource. IMO we should not mess with resources provisioned as part of HNC. That will never work with modern GitOps-tools like FluxCD/ArgoCD - as their reconcilers will compete with our internal reconciler.

Is this only a problem with cert-rotator? Or would cert-manager have the same problem? I would have thought that any controller that tried to inject certs into the webhook would have the same problem?

How does cert-manager work with gitops tools today? Is there no way to say "please ignore these fields?"

This was not possible when defining the webhook with the kubebuilder markers, but I worked around this by extending the patch to create the complete webhook (without rules).

Does this have something to do with the cert-rotator problem? I thought the caBundle needs to be injected to every new webhook (e.g. there isn't one caBundle per config) so why does moving these to a new config help? Just so Flux/Argo don't try to reconcile it away?

Instead of hand-writing a patch, why not create the entire webhook config in code? There's not a lot in a webhook config other than the list of webhooks.

@erikgb
Copy link
Contributor Author

erikgb commented Jun 12, 2023

@adrianludwin Existing tests are running fine now. Before writing some tests for this new feature, I would like to get your opinion on the approach. Because of the cert-rotator (as always 😉), I cannot move the now dynamic objects-webhook to a dedicated validating webhook configuration resource. IMO we should not mess with resources provisioned as part of HNC. That will never work with modern GitOps-tools like FluxCD/ArgoCD - as their reconcilers will compete with our internal reconciler.

Is this only a problem with cert-rotator? Or would cert-manager have the same problem? I would have thought that any controller that tried to inject certs into the webhook would have the same problem?

I think my formulation was bad, sorry. The real problem is not with cert-rotator (or cert-manager). But with cert-manager, we could create a dedicated dynamic ValidatingWebhookConfiguration from code, and not mess with the VWC statically provisioned as part of the HNC deployment. Having multiple controllers working on the same resource is error-prone, but as long as the controllers do not touch the same fields, it can work. So without specifying the object-webhook rules in the static HNC configuration, this seems to work ok.

This was not possible when defining the webhook with the kubebuilder markers, but I worked around this by extending the patch to create the complete webhook (without rules).

Does this have something to do with the cert-rotator problem? I thought the caBundle needs to be injected to every new webhook (e.g. there isn't one caBundle per config) so why does moving these to a new config help? Just so Flux/Argo don't try to reconcile it away?

Instead of hand-writing a patch, why not create the entire webhook config in code? There's not a lot in a webhook config other than the list of webhooks.

WDYM? The kubebuilder markers does not allow a user to configure a webhook without rules. That is why I suggest to add the complete object-webhook using a patch. All other webhooks are sourced from kubebuilder-markers, so the webhook/manifests.yaml file is generated.

@adrianludwin
Copy link
Contributor

I think my formulation was bad, sorry. The real problem is not with cert-rotator (or cert-manager). But with cert-manager, we could create a dedicated dynamic ValidatingWebhookConfiguration from code, and not mess with the VWC statically provisioned as part of the HNC deployment.

Why doesn't this work with cert-rotator as well?

Having multiple controllers working on the same resource is error-prone, but as long as the controllers do not touch the same fields, it can work. So without specifying the object-webhook rules in the static HNC configuration, this seems to work ok.

Yeah, this is what I'd expect as well. As long as the new controller doesn't respond to changes to caBundle, and cert-rotator doesn't respond to anything except changes to caBundle, we should be fine.

WDYM? The kubebuilder markers does not allow a user to configure a webhook without rules. That is why I suggest to add the complete object-webhook using a patch. All other webhooks are sourced from kubebuilder-markers, so the webhook/manifests.yaml file is generated.

Yup that sgtm. My only suggestion was that instead of hand-creating the object webhook in a patch, the HNC controller just has the entire object built-in. That is, if (say) hnc-object-controller-vwc is missing, HNC just creates it - it doesn't rely on anything being deployed by kubectl apply, Flux, or anyone else. This is what we do with HNCConfiguration - there's no HNC Config in the manifests, we just create one when HNC starts up if it doesn't already exist.

@adrianludwin
Copy link
Contributor

... I guess the only problem with creating it automatically is that if users want to disable it somehow, they can't. Maybe that's a good reason to have HNC rely on the VWC config existing:

  • Users can delete all object webhooks simply by deleting the config, and HNC won't recreate it
  • We could let users add annotations to the VWC (e.g. "don't update" for temporary investigations). We don't need this on day 1, but it's a nice future option.

Given this, a hand-written, empty VWC sgtm!

@erikgb erikgb force-pushed the dynamic-objects-webhook branch from c93ea91 to 3004313 Compare June 16, 2023 20:02
@erikgb
Copy link
Contributor Author

erikgb commented Jun 16, 2023

/test pull-hnc-test

@erikgb
Copy link
Contributor Author

erikgb commented Jun 16, 2023

@adrianludwin I think the code is almost done and ready for review. All existing tests are passing, and I think this indirectly verifies that this feature works. The e2e-tests would never succeed if the objects-webhook was not reconciled correctly.
But I wonder if we should test some more corner cases in the integration tests? Writing unit tests for the new code seems almost impossible if I am supposed to follow the existing code style - which I think you prefer. That would eventually require more types to be exported as tests of reconcilers are currently located in a dedicated test-package. WDYT?

@erikgb erikgb changed the title WIP: Only create per-object webhooks for configured types Only create per-object webhooks for configured types Jun 20, 2023
@erikgb erikgb marked this pull request as ready for review June 21, 2023 06:46
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 21, 2023
@k8s-ci-robot k8s-ci-robot requested a review from tashimi June 21, 2023 06:46
@mzain
Copy link
Contributor

mzain commented Oct 12, 2023

Hey @adrianludwin @erikgb when do we think this will be merged and available for use?
it is an important feature for our usage of HNC.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2024
This was referenced Mar 10, 2024
@erikgb
Copy link
Contributor Author

erikgb commented Mar 10, 2024

@adrianludwin I have now reverted all suggested changes to the integration test setup in ac97676 - as you had a lot of comments/questions to the proposed changes. I am pretty sure CI should fail now. Feel free to look into it. Since we are no longer using HNC I am not prepared to work more on this PR.

@erikgb
Copy link
Contributor Author

erikgb commented Mar 10, 2024

As expected, it fails now: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_hierarchical-namespaces/285/pull-hnc-test/1766783682422509568. If I remember this correctly, a webhook is required in the reconciler integration tests because the reconciler is now managing a webhook. And since the envTest represents a real control plane - we also need to mock/fake the webhook endpoint. I am going to revert the revert now, and hope you can look at this PR again - with the understanding that the new webhook setup IS required in the reconciler integration test.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 21, 2024
@erikgb erikgb force-pushed the dynamic-objects-webhook branch from d3ae6d7 to 237fdba Compare June 9, 2024 20:33
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 9, 2024
@erikgb erikgb force-pushed the dynamic-objects-webhook branch from 237fdba to 36139e3 Compare June 9, 2024 20:42
@rjbez17
Copy link
Contributor

rjbez17 commented Jun 10, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: erikgb, rjbez17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 977e2c5 into kubernetes-sigs:master Jun 10, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only create per-object webhooks for configured types
6 participants