
🌱 Add support for CA/certificate rotation #1062

Merged
merged 4 commits into from
Jul 22, 2024

@tmshort (Contributor) commented Jul 17, 2024

Fixes #915
Mounted secrets are automatically updated into pods, but...

  • It doesn't work with `subPath` mounts
  • When `subPath` is not used, several directories are mounted
  • One of those directories is a symlink, so `IsDir()` returns false
  • A watch is needed to notice the change

So, update the certificate volume patch, which requires a change in how we look for certificates in the CA cert directory.

Add a watch, so when the certs do change, we update the cert pool.
Also look at validity dates of certificates, and error on expired certs.

The default cert-manager certificates have a 90-day validity period.

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@tmshort requested a review from a team as a code owner July 17, 2024 21:46

netlify bot commented Jul 17, 2024

Deploy Preview for olmv1 ready!

🔨 Latest commit: 5eea072
🔍 Latest deploy log: https://app.netlify.com/sites/olmv1/deploys/669ac31547b423000860d4e7
😎 Deploy Preview: https://deploy-preview-1062--olmv1.netlify.app

@tmshort (Contributor, Author) commented Jul 17, 2024

This also adds unit tests for an empty cert directory and for an expired certificate, plus an e2e test to make sure Secret updates and rotations actually work.


codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 71.28713% with 29 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (775613f) to head (5eea072).
Report is 2 commits behind head on main.

File | Patch % | Lines
internal/httputil/certpoolwatcher.go | 68.96% | 11 Missing and 7 partials ⚠️
internal/httputil/certutil.go | 78.26% | 2 Missing and 3 partials ⚠️
internal/catalogmetadata/cache/cache.go | 66.66% | 1 Missing and 1 partial ⚠️
internal/httputil/httputil.go | 60.00% | 1 Missing and 1 partial ⚠️
internal/rukpak/source/image_registry.go | 60.00% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1062      +/-   ##
==========================================
- Coverage   72.85%   72.67%   -0.19%     
==========================================
  Files          31       32       +1     
  Lines        1864     1965     +101     
==========================================
+ Hits         1358     1428      +70     
- Misses        371      388      +17     
- Partials      135      149      +14     
Flag | Coverage Δ
e2e | 55.65% <46.53%> (+0.13%) ⬆️
unit | 45.54% <61.38%> (+0.53%) ⬆️

Flags with carried forward coverage won't be shown.

@tmshort force-pushed the cert-rotate branch 3 times, most recently from 4450298 to 03c4c85, July 18, 2024 13:21
Review thread on cmd/manager/main.go (outdated, resolved)
@tmshort force-pushed the cert-rotate branch 2 times, most recently from 95abebf to ab9e281, July 18, 2024 16:44
Review thread on cmd/manager/main.go (outdated, resolved)

@m1kola (Member) left a comment

I'm still reviewing.

For those interested, here are some relevant issues in Go and K8s on root CA reloading:

TL;DR: the standard library's TLS config appears to have no equivalent of GetClientCertificate/GetCertificate for root CAs, but it is possible to implement a workaround with VerifyPeerCertificate. As far as I understand, this would require setting InsecureSkipVerify to true and re-implementing the standard verification in VerifyPeerCertificate; see this example from the Go docs.

Review thread on Makefile (outdated, resolved)
Review thread on hack/test/cert-e2e.sh (outdated, resolved)
Review thread on internal/catalogmetadata/cache/cache.go (resolved)
Review thread on hack/test/cert-e2e.sh (outdated, resolved)
Review thread on hack/test/cert-e2e.sh (outdated, resolved)
@tmshort (Contributor, Author) commented Jul 19, 2024

> This would require setting InsecureSkipVerify to true and re-implementing the standard verification in VerifyPeerCertificate

We really don't want to set InsecureSkipVerify by default. Re-implementing the verification code is probably not worth it either, when we can simply offer up the updated certificate pool on demand.

@m1kola (Member) commented Jul 19, 2024

> And reimplementing the verification code is probably not worth it, when we can just offer up the updated certificate pool on demand.

Setting InsecureSkipVerify to true and supplying VerifyPeerCertificate means we take over responsibility for verifying the certs from the standard library. That's not great, but it is not a lot of code (see the standard library code and the client example in the docs). It doesn't sound too terrible to me, so I'm open to considering it.

As far as I understand, this implementation re-creates the client each time, so a new connection has to be established every time we use it, which is not optimal.

I don't know how significantly this will affect performance. On one hand, creating new connections every time is not great; on the other hand, I'm not sure we would benefit from re-using connections in these use cases.

for {
	select {
	case <-watcher.Events:
		cpw.drainEvents()
@everettraven (Contributor) commented Jul 22, 2024

Do you know if any events carry only the watched directory as their event.Name value? If so, instead of performing this drainEvents action, perhaps we could filter events, similar to https://github.com/fsnotify/fsnotify/blob/c1467c02fba575afdb5f4201072ab8403bbf00f4/cmd/fsnotify/file.go#L66-L78

I won't block merging the PR on this, but if it is possible, it could let us avoid any "sleep" actions.

@tmshort (Contributor, Author)

Some filtering might be useful. The only time this path should be updated is when a Secret is updated. The directory is read-only within the pod.

@tmshort (Contributor, Author)

Note: the filter will not work if new files are added, as they will be filtered out. We need to recognize new files, deleted files, updated files, etc.

@everettraven (Contributor) commented Jul 22, 2024

> We need to recognize new files, deleted files, updated files, etc.

If we only react to "directory has been updated" events, wouldn't we catch these changes as well?

@tmshort (Contributor, Author)

But it wouldn't catch updates to files within, as that's a change to the file, not the directory.

Contributor

I thought the reason for the drain-events operation was that, when we receive updates, we get a flood of events on everything whenever something changes. Maybe I misunderstood, which led me to think that any change in the directory (including to an individual file) would also trigger an event for the directory itself.

@tmshort (Contributor, Author)

The Watch is on the directory, and that includes the contents. It also depends on how things are mounted. A change to a file within a directory does not necessarily indicate a change to the directory.
The drain is there because the update of a single secret may trigger a number of events (I was seeing 4+, because of how the mounted files were presented), and only one reload of the certs is necessary.
Based on my testing, if there's an update on a file, it only reports the update on that file (i.e. create/update); it doesn't trigger a second update on the directory as well.

@tmshort (Contributor, Author)

I added debug output for every event to the cert watcher unit test:

=== RUN   TestCertPoolWatcher
    certpoolwatcher_test.go:72: Create cert file at "/tmp/cert-pool4285276756/test1.pem"
    certpoolwatcher_test.go:87: Create cert file at "/tmp/cert-pool4285276756/test2.pem"
Event: CREATE        "/tmp/cert-pool4285276756/test2.pem"
Event: WRITE         "/tmp/cert-pool4285276756/test2.pem"
--- PASS: TestCertPoolWatcher (1.11s)

So there's one event for the creation of the new PEM file and one for the write, but nothing for the directory itself. Without the drain mechanism, those two events would cause two reloads.

@tmshort added this pull request to the merge queue Jul 22, 2024
Merged via the queue into operator-framework:main with commit e3e6b03 Jul 22, 2024
17 of 19 checks passed
@tmshort deleted the cert-rotate branch July 22, 2024 17:53
perdasilva pushed a commit to LalatenduMohanty/operator-controller that referenced this pull request Aug 13, 2024
* Add support for CA/certificate rotation

Mounted secrets are automatically updated into pods, but...
* It doesn't work with `subPath` mountings
* When `subPath` is not used, then a bunch of directories are mounted
* And one of those directories is a symlink, so `IsDir()` returns false
* And a watch is needed to notice the change

So, update the certificate volume patch, which requires a change in how
we look for certificates in the CA cert directory.

Add a watch, so when the certs do change, we update the cert pool.

Also look at validity dates of certificates, and error on expired certs.

The default cert-manager certificates have 90 days validities.

Signed-off-by: Todd Short <tshort@redhat.com>

* fixup! Add support for CA/certificate rotation

* fixup! Add support for CA/certificate rotation

Signed-off-by: Todd Short <tshort@redhat.com>

* fixup! Add support for CA/certificate rotation

Signed-off-by: Todd Short <tshort@redhat.com>

---------

Signed-off-by: Todd Short <tshort@redhat.com>
perdasilva pushed a commit to kevinrizza/operator-controller that referenced this pull request Aug 13, 2024
@skattoju mentioned this pull request Sep 25, 2024
Successfully merging this pull request may close these issues:

Handle CA rotation for Catalogd web server trust