Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update webhook for kubebuilder v2 #1769

Closed
anyasabo opened this issue Sep 20, 2019 · 6 comments · Fixed by #2072
Closed

Update webhook for kubebuilder v2 #1769

anyasabo opened this issue Sep 20, 2019 · 6 comments · Fixed by #2072
Assignees

Comments

@anyasabo
Copy link
Contributor

anyasabo commented Sep 20, 2019

Parent issue #1188

As part of moving to kubebuilder v2 in #1723, the validating webhook was removed. Kubebuilder v2 changes how webhooks are built and deployed. We should update the webhook to work with kubebuilder v2.

Removing the webhook validation also disabled the validation in the trial license in pkg/controller/license/trial/trial_controller.go, which should also be re-enabled as part of this issue or have a new issue created for it.

@anyasabo
Copy link
Contributor Author

Want to discuss options for deployment. Previously, at runtime the operator would deploy the webhook configuration and generate the required certs. Kubebuilder v2 does not do this. See here: https://kubebuilder.io/cronjob-tutorial/running-webhook.html

Options

1. Hard dependency on cert-manager

This is the recommended path from kubebuilder. In this case, we would

  • add an additional yaml file for the certificate resource so cert-manager provisions the a certificate for the operator
  • provide a webhook configuration annotated so that cert-manager injects the CA for said certificate

I think we also need to provide the current, webhookless yaml. Some people may think it is too much trouble to install for too little benefit. We already have two, so this means we produce four (potentially three if we decide to tell people they need to be on k8s 1.13+ for webhooks).

Pros

  • Secure, and the recommended path
  • Cert-manager seems to be relatively common in more mature clusters
  • Not much additional runtime code on our side

Cons

  • Testing the operator is harder. We would need to install cert-manager as part of our e2e tests

  • Cert-manager is only able to be installed cluster wide. We would need to decide what versions of cert-manager to support. We could not for example, request that people install a version of cert-manager just for ECK.

  • Cert-manager is still not stable (currently v0.11). The changes between versions right now can be significant, for example removing the v1alpha1 CRD:
    https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.10-0.11.html

  • Deployment if people are not already using cert-manager is more trouble.

  • Troubleshooting user issues becomes harder as their environment becomes more complex. Before we started failing the webhook open, the webhook was a significant source of user issues. We can still fail it open, but I bring it up just to point out that the webhook deployment in general was a significant source of issues.

2. Re-implement the webhook installer

We could re-implement the removed kubebuilder functionality that automatically installs the webhook and generates certificates.

Pros

  • Operator deployment is simpler than cert-manager
  • Cert-manager not required for testing either

Cons

  • We need to actually implement webhook creates and updates.
  • Tests are also challenging, but in a different way. Instead of needing to install cert-manager for e2e tests, we need to write tests for our certificate/webhook installer code
  • We also need to handle cert generation/rotation for our webhook.
  • Deleting the operator is harder. Users cannot just delete the deployment yaml and have it deleted since it was not created by the deployment yaml
  • Our operator needs to have more permissions to manage the webhook configuration and the service

3. Do not implement webhook validation

We could decide that webhooks do not provide enough value to be worth the trouble.
These are the current webhook only validations:

	masterRequiredMsg        = "Elasticsearch needs to have at least one master node"
	parseVersionErrMsg       = "Cannot parse Elasticsearch version"
	parseStoredVersionErrMsg = "Cannot parse current Elasticsearch version"
	invalidSanIPErrMsg       = "invalid SAN IP address"
	pvcImmutableMsg          = "Volume claim templates cannot be modified"
	invalidNamesErrMsg       = "Elasticsearch configuration would generate resources with invalid names"
	unsupportedVersionErrMsg = "Unsupported version"
	blacklistedConfigErrMsg  = "Configuration setting is not user-configurable"

Recommendation

Option 1. We provide all-in-one yamls that include webhooks and cert-manager certificates. We continue to provide the existing yaml. We also update the documentation to explain

  • how to install cert-manager
  • why it is useful
  • what validations are provided and why they are useful
  • how to modify our yamls to use your own CA and certificate

We may also want to provide instructions / a make target for generating your own CA and certificate. I'm not sold on that yet.

I think re-implementing the webhook auto-installer is a bit hacky. Cert-manager is already much more tested for managing certificates. We would likely not do as good of a job, but it would still take up a lot of our time. If we do decide to implement it, it might be helpful to maintain it as a separate package so other operators can use it easily. I would not be in favor of option 2. It seems like a lot of effort for not much reward.

I think the validations we provide in the webhook are useful enough to be worth the effort of maintaining and deploying them, so would not be in favor of option 3.

What do you all think?

@sebgl
Copy link
Contributor

sebgl commented Oct 21, 2019

That's a great summary @anyasabo 👍

I think I would still be in favour of option 2 though (implementing stuff ourselves). My arguments:

  • Requiring a strict version of cert-manager installed is quite hard to do. What if the user already has cert-manager installed, but in an old version which does not generate certs the way we want? What if the user has cert-manager installed in a very recent version with the same problem? Since it's a cluster-wide component, they can only install a single version.
  • What if the user only has RBAC permissions to a single namespace? They cannot install cert-manager hence cannot enable ECK validation.
  • We already have quite some code dealing with certificates and certificates rotation. Handling the webhook one may not be that much extra work?

Deleting the operator is harder. Users cannot just delete the deployment yaml and have it deleted since it was not created by the deployment yaml
Our operator needs to have more permissions to manage the webhook configuration and the service

Can we ship the webhook configuration as yaml in our operator manifest? So the operator itself does not need to create/update the webhook config, and that resource will be deleted if the user uninstall the operator through its manifest. The only responsibility of the operator would be to create/update the certificates secret.

@pebrc
Copy link
Collaborator

pebrc commented Oct 21, 2019

I can only second what @sebgl said: great summary of the options we have.

I think we kind of want to do both?

  • For users already adopting cert-manager we want to allow them to use it for ECK as well.

    • we can achieve that by either documenting how to create a Certificate resourcce for the operator (or shipping a sample)
    • there is probably no harm in annotating the webhook resource with the cert-manager annotations
    • we need a configuration option to turn off our own certificate generation code
  • For users not using cert-manager:

    • we should generate the certificates (we can as @sebgl suggested ship webhook + an empty secret as manifests to facilitate deinstallation) and inject them ourselves with the necessary certificates at ECK startup
    • I assume we can leverage some of the certificate handling code we already have in place for that (CA generation, cert rotation etc)
  • I would default to ECK generating these certificates and leave the off-switch for advanced users that insist on using cert-manager.

@anyasabo
Copy link
Contributor Author

Requiring a strict version of cert-manager installed is quite hard to do.

I was thinking that since the webhooks are more of a "nice to have" feature that wide compatibility is not as necessary. Also, the actual configuration of the certificates to utilize cert-manager is not complex, see the example here. So changing it for a specific version should not be too challenging.

What if the user only has RBAC permissions to a single namespace? They cannot install cert-manager hence cannot enable ECK validation.

They would likely have a similar problem with the validating webhook configuration but I take your point.

Can we ship the webhook configuration as yaml in our operator manifest?

This is clever, I'm for it. There will be a disruption while the operator bootstraps itself but since it only applies to Elastic resources that seems acceptable.

Note that as Peter mentioned we would need to ship an empty secret (that it later fills in with its certificate), and the operator would need permissions to update the webhook configuration since it contains the CA.

I'm good with the plan as described by @pebrc . Unless there's different suggestions I may try to merge in the validation refactor first since it is a pain to keep up to date, then merge in the automatic webhook cert generation later on.

@anyasabo
Copy link
Contributor Author

anyasabo commented Nov 1, 2019

Unassigning myself since I will be out for a while. I think an intermediate step might before working on the automatic certificate creation might be to migrate the operator to a deployment from kubebuilder v2 instead of an sset. That way we can take advantage of the kubebuilder kustomize examples rather than needing to roll our own, and we already talked about wanting to do it anyway.

We might end up wanting to provide webhook and non-webhook yamls since in the past they caused issues for people with restrictive networks, and kustomize would help make that easier. Definitely not a requirement but just something I was considering.

@barkbay
Copy link
Contributor

barkbay commented Dec 13, 2019

Documentation has been added in #2158.

@barkbay barkbay closed this as completed Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants