Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding cert-manager into CAPI for smaller footprint/less upgrade deps #7443

Open
jayunit100 opened this issue Oct 23, 2022 · 14 comments
Open
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jayunit100
Copy link
Contributor

jayunit100 commented Oct 23, 2022

I recently realized that the 3 cert-manager related processes we use in CAPI are actually not required when a cluster is running, and only required, i believe:

  • on upgrades
  • on creation of a new cluster

User Story

As a CAPI User I'd like to minimize the number of containers I need to run persistently in my cluster, and thus i'd like a like a default "batteries included" certificate management solution running in one my essential capi pods... so that i dont need to run certmanager after cluster creation.

i.e. the dumbed down version of this would be something like... "lets just vendor certmanager and use it when needed".

Detailed Description

I think these 3 services:

  • ca-injector
  • webhook
  • certmanager

Can

  • all be run as one off processes on startup
  • invoked as needed during upgrades
  • be non-existent at other times ?

Anything else you would like to add:

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2022
@sbueringer
Copy link
Member

sbueringer commented Oct 24, 2022

I think it's a fair ask to consider getting rid of cert-manager considering what we are using it for.

Today we use cert-manager only for generation of the webhook certificates. This roughly works like this:

  • We deploy Certificate and Issuer CRs (one each per-provider)
  • cert-manger generates a Secret with the serving certificate + key
  • The secret is
    • mounted into the provider Pod so it can be used to serve the webhook
    • injected into the CRDs of a provider (caBundle) so the apiserver can use it when calling the conversion webhook
    • injected into the MutatingWebhookConfigurations / ValidaingWebhookConfigurations so the apiserver can use it when calling the webhooks

I think we could take a look if there are libraries nowadays we can embed which provide this functionality without having to deploy & manage cert-manager.

Please see this discussion on Twitter: https://twitter.com/ahmetb/status/1549498418174644224?s=20&t=yuvVxSzP1reWQEzTfKNKEg

P.S. Istio was doing something like this ca. 3-4 years ago. Not sure if they still do.

I'm not a fan of trying to run cert-manager at the right times where it is needd. This is hard to get right and maintain in my opinion. It's also a problem if other components in the mgmt cluster depend on cert-manager.

@fabriziopandini
Copy link
Member

fabriziopandini commented Nov 30, 2022

/triage accepted

It will be interesting to have a security expert taking a look at this; what I would like to avoid is to support both the custom certificate generation process and the cert-manager way of doing stuff due to some corner case like e.g. bring your own certificate issuer or something else that will not be present in our custom solution

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 30, 2022
@fabriziopandini
Copy link
Member

/help

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 30, 2022
@kfox1111
Copy link

I for one like having all the certificate generation code centralized into a well known, well reviewed product like cert-manager. Having each and every project build or at least embed its own is rather scary.

I'd say, at very minimum, please keep the option.

@sbueringer
Copy link
Member

I think we could still keep the option that folks can use cert manager. But I would like to get rid of the responsibility of us having to manage cert manager

@dlipovetsky
Copy link
Contributor

dlipovetsky commented Jan 7, 2023

I think it's a fair ask to consider getting rid of cert-manager considering what we are using it for.

I think we could take a look if there are libraries nowadays we can embed which provide this functionality without having to deploy & manage cert-manager.

Strongly agree.

As I write this (latest release is v1.3.1), there is little flexibility for the clusterctl CLI user: they cannot install cert-manager themselves, although they can specify a version and/or YAML manifest. Oddly, they can skip upgrading cert-manager. Sorry, I misread the code. In fact, clusterctl does use cert-manager that is already installed. This is also documented in https://cluster-api.sigs.k8s.io/clusterctl/configuration.html#cert-manager-configuration.

@dlipovetsky
Copy link
Contributor

I for one like having all the certificate generation code centralized into a well known, well reviewed product like cert-manager. Having each and every project build or at least embed its own is rather scary.

@kfox1111 While I see your point, let's remember that, by default, the KubeadmControlPlane controller creates all workload cluster certificates.

@dlipovetsky
Copy link
Contributor

Let's remember that external Cluster API providers (infrastructure, bootstrap, etc) also deploy webhooks. They also depend on cert-manager.

If we replace cert-manager with a library, the library must provide certificates for external providers' webhooks.

One library I know of is https://github.com/open-policy-agent/cert-controller. Design to create, and periodically rotate, certificates for one set of webhooks, a recent PR (open-policy-agent/cert-controller#52) may allow it to support multiple sets of webhooks. We would probably run this controller in the "core" manager.

@rikatz
Copy link

rikatz commented Jun 27, 2023

@jcmoraisjr as we discussed

@rikatz
Copy link

rikatz commented Jun 28, 2023

On ingress-nginx we use https://github.com/kubernetes/ingress-nginx/tree/main/images/kube-webhook-certgen which is a forked version of https://github.com/jet/kube-webhook-certgen

I'm interested in getting this used more broadly, or having other projects to use the same logic. If this is a path (and turning this certgen not only a CLI but also a library to be consumed by "webhook cases") we should probably turn this project a k-sigs independent subproject.

My only concern currently with this project is the "certificate expiration/rotation" that IIRC is not implemented today.

@kfox1111
Copy link

As someone that has clusters that are ~5 years old, certificate expiration/rotation is starting to feel more and more important. :)

@sbueringer
Copy link
Member

The webhook certificates we use in CAPI today use the cert-manager defaults. 90 days validity and renewal after 60 days.

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

7 participants