Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Certificate support for image registry #960

Merged
merged 7 commits into from
Jun 24, 2024

Conversation

tmshort
Copy link
Contributor

@tmshort tmshort commented Jun 20, 2024

Fixes: #921

Remove the InsecureSkipTLSVerify annotations.

  • Create a ClusterIssuer CA (via cert-manager) that is used by OLMv1 e2e
  • Update the operator controller to specify a cert directory, rather than a single file.
  • Use this directory for catalogd and image-registries
  • Update the deployment to reference CAs appropriately

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@tmshort tmshort requested a review from a team as a code owner June 20, 2024 17:23
Copy link

netlify bot commented Jun 20, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 3ca71ba
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/6679b799478a640008286750
😎 Deploy Preview https://deploy-preview-960--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Remove the InsecureSkipTLSVerify annotations.

* Create a ClusterIssuer CA (via openssl) that is used by OLMv1 e2e
* Update the operator controller to specify a cert directory, rather than a
  single file.
* Use this directory for catalogd and image-registries
* Update the deployment to reference CAs appropriately

Signed-off-by: Todd Short <tshort@redhat.com>
Copy link

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 79.28%. Comparing base (58b4363) to head (3ca71ba).
Report is 4 commits behind head on main.

Files Patch % Lines
internal/httputil/httputil.go 66.66% 5 Missing and 5 partials ⚠️
...nternal/controllers/clusterextension_controller.go 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #960      +/-   ##
==========================================
- Coverage   80.14%   79.28%   -0.86%     
==========================================
  Files          16       16              
  Lines        1103     1120      +17     
==========================================
+ Hits          884      888       +4     
- Misses        152      159       +7     
- Partials       67       73       +6     
Flag Coverage Δ
e2e 57.14% <65.00%> (-1.07%) ⬇️
unit 53.21% <20.00%> (-0.64%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tmshort
Copy link
Contributor Author

tmshort commented Jun 21, 2024

To consolidate catalogd's certificate would require modifying catalogd; it would also allow catalogd to be placed back into its own namespace.

config/overlays/e2e/kustomization.yaml Outdated Show resolved Hide resolved
internal/controllers/clusterextension_controller.go Outdated Show resolved Hide resolved
config/overlays/certs/kustomization.yaml Outdated Show resolved Hide resolved
cmd/manager/main.go Outdated Show resolved Hide resolved
Signed-off-by: Todd Short <tshort@redhat.com>
Signed-off-by: Todd Short <tshort@redhat.com>
@tmshort
Copy link
Contributor Author

tmshort commented Jun 21, 2024

I can't get kustomize to use a different directory, so I'm applying the yaml for the certificates directly from a file, which also means I can apply them for Tilt from the same source.

RootCAs: caCertPool,
MinVersion: tls.VersionTLS12,
if info.IsDir() {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to recurse through all subdirectories? I'm tempted to suggest that we stick to finding files in a single named directory without recursing, but I'm curious if you have a specific reason.

To avoid recursing, we can return filepath.SkipDir here.

Suggested change
return nil
return filepath.SkipDir

Copy link
Contributor Author

@tmshort tmshort Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's intended to skip directories. I didn't see this in any examples.

Copy link
Contributor Author

@tmshort tmshort Jun 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to skip "." as a directory, meaning, it skips the current directory... so the e2es fail.
(Or something equally nefarious.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, found one example, and the example has a comparison with a directory name, so we'd need to do that to properly use filepathSkipDir; not sure it's worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe switch over to os.ReadDir(), which makes the intent more clear anyways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will look


if caCert != "" {
// tlsFileWatcher, err := certwatcher.New(caCert, "")
func LoadCerts(caDir string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would be a nicer abstraction if this was:

Suggested change
func LoadCerts(caDir string) (string, error) {
func AddCertificatesFromDirectory(certPool *x509.CertPool, caDir string) error { ... }

In which case, we could add each file's content, collect errors reading/adding each file, and then tell callers which specific files failed for which reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not compatible with the current rukpak API, which expects a string of PEM. Maybe when rukpak is fully integrated into operator-controller? I agree that this would be better for a larger directory. There does not appear to be a way to extract certificates/PEM from a CertPool, so it can't be done with this signature, yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was just looking at LoadCerts being called in BuildHTTPClient, which turns around and consumes the certs string by immediately adding to the caCertPool.

And that made me think we could pass the caCertPool into this function and call AppendCertsFromPEM for each file's contents. Then, when my suggested function signature returns, you end up with caCertPool having everything in caDir added to it.

Seems like I'm missing something though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also used when creating a BundleDeployment for a rukpak API.

bd := r.generateBundleDeploymentForUnpack(bundle.Image, ext)

So, we need to change the rukpak API to accept a caPool first, then we can do as you suggest.

Comment on lines 38 to 39
kubectl apply -f testdata/certs/issuers.yaml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work when we publish this install script in our release.

Copy link
Contributor Author

@tmshort tmshort Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried numerous ways to get the secret for the ClusterIssuser into the cert-manager namespace, but despite trying various transformations:, I couldn't get kustomize to do it (it wanted to put stuff into the olmv1-system directory. Using a separate file at least avoids duplication among the two e2e tests. But I can always do it via a HEREDOC.

Copy link
Contributor Author

@tmshort tmshort Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. if you can figure out how to make kustomize work the way we want, I'd much prefer to do it that way!

Signed-off-by: Todd Short <tshort@redhat.com>
Comment on lines +21 to +22
- path: patches/manager_cert_patch.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manager_deployment_cert.yaml and manager_cert_patch.yaml seem to patch the same deployment and same fields but in different ways. I think it would be good to standardise.

Comment on lines +13 to +16
issuerRef:
name: olmv1-ca
kind: ClusterIssuer
group: cert-manager.io
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a dependency on ClusterIssuer which is only being created In install.tpl.sh/in Tilt.

This means it is no longer possible to do this to install:

kubectl apply -f https://github.com/operator-framework/catalogd/releases/download/$CATALOGD_VERSION/catalogd.yaml
kubectl apply -f https://github.com/operator-framework/operator-controller/releases/download/$VERSION/operator-controller.yaml

You have to checkout the repo run the install script.

Are we ok with that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also about related changes in scripts/install.tpl.sh where we create CA, etc.

As far as I understand - we need to create CA and patch deployment of the manger only for E2E setups. If it is the case - can we put this in config/overlays/e2e/?

I would like to avoid having to create all this on every cluster just because our E2E setup requires it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need cert-manager regardless, AFAICT, as catalogd now creates certificates. So, the method listed above no longer works (or at least it shouldn't).
I tried to put this into the e2e overlay, but could not get the namespace to work as expected. Please see #960 (comment); I have been unable to figure out how to apply resources into the cert-manager namespace. i.e. kustomize seems to be locking me into a single namespace...

cmd/manager/main.go Outdated Show resolved Hide resolved
@tmshort
Copy link
Contributor Author

tmshort commented Jun 24, 2024

TL;DR: The challenge I've run into w/ adding the certificates is being able to put them into the cert-manager namespace for the ClusterIssuer. I tried various kustomize options and transformations and none of them seemed to work. I would love to get rid of the HEREDOC and yaml used to create the certificates. If anyone has any ideas, I'm all ears.

@m1kola
Copy link
Member

m1kola commented Jun 24, 2024

A number of my comments in this PR were based on the understanding that this is only about making our E2E setup work with TLS. After chatting with @joelanford and @tmshort I got more context.

The primary objective of the PR to get rid of InsecureSkipTLSVerify and use TLS in E2E, but it also prepares the ground for using single ClusterIssuer for all OLMv1 components. E.g. we can use it for catalogd too.

Explanation from @joelanford:

  1. create a single ClusterIssuer in the standard install
  2. any OLMv1 component that needs a certificate gets it from there, no matter which namespace that component is in.
  3. any OLMv1 component that needs a CA can create a certificate (if it doesn't already have one), and that cert secret will contain the CA.

That setup would mean that we could move catalogd back to its own namespace if we wanted to, and it would mean that the e2e registry would be automatically trusted by the standard setup because operator-controller would already have the CA.

@tmshort
Copy link
Contributor Author

tmshort commented Jun 24, 2024

Three follow-ons to this PR:

  1. Update the config/kustomization of operator-controller to create the ClusterIssuer and required resources in the cert-manager namespace (rather than via HEREDOC and separate YAML), for use by operator-controller, and eventually catalogd. The challenge here is the multiple namespaces. This may require an RFC or other documentation.
  2. Unifying catalogd and operator-controller's use of a common ClusterIssuer for certificates.
    2a. Possibly move catalogd back into it's own namespace, especially if the olmv1-system namespace security parameters are overwritten
  3. Update rukpak to accept a CertPool in the API, with corresponding change to operator-controller.

@everettraven
Copy link
Contributor

Follow ups should be captured by this epic: #967

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes themselves look fine to me, but I noticed the e2es are failing. Any insights as to why?

@tmshort
Copy link
Contributor Author

tmshort commented Jun 24, 2024

The changes themselves look fine to me, but I noticed the e2es are failing. Any insights as to why?

I made a change that passed due to caching issues... I had to fix it up and it should be ok now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that this is duplicated here and in the HEREDOC, but I also recognize that this setup is temporary. Can we just make sure that our follow-ups include a callout to remove this duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmshort tmshort added this pull request to the merge queue Jun 24, 2024
Merged via the queue into operator-framework:main with commit d510ad1 Jun 24, 2024
15 of 17 checks passed
@tmshort tmshort deleted the cert-update2 branch June 24, 2024 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove bundle.connection.config/insecureSkipTLSVerify annotation
4 participants