Fix the MultiScopedCache #90

Merged: 2 commits merged into cert-manager:main from the fix_cache branch on Jan 10, 2023

Conversation

@inteon (Member) commented Jan 8, 2023

This PR removes ClientDisableCacheFor and fixes the MultiScopedCache by populating missing GVK values based on the type of the provided client.Object.

Currently, ClientDisableCacheFor disables caching for Secret resources; I re-enabled caching for them. I also fixed the MultiScopedCache so that it can now handle *corev1.Secret objects that don't have a GVK set.

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
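
Below is a minimal sketch of the GVK-population idea the description mentions, assuming controller-runtime's apiutil helpers; the package and function names (multiscopedcache, resolveGVK) are illustrative and not taken from this repository.

package multiscopedcache

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// resolveGVK returns the object's GVK, deriving it from the object's Go type
// via the scheme when the TypeMeta is empty (e.g. a bare *corev1.Secret).
func resolveGVK(obj client.Object, scheme *runtime.Scheme) (schema.GroupVersionKind, error) {
	if gvk := obj.GetObjectKind().GroupVersionKind(); !gvk.Empty() {
		return gvk, nil
	}
	return apiutil.GVKForObject(obj, scheme)
}

Deriving the GVK from the object's Go type means callers can pass a plain *corev1.Secret without having to set TypeMeta themselves.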
@jetstack-bot jetstack-bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 8, 2023
@inteon inteon mentioned this pull request Jan 9, 2023
@SgtCoDFish (Member) left a comment

EDIT: Never mind, apparently there are Kubernetes reasons this is all moot; ignore the comment below.

I'd not really thought much about caching in trust-manager yet, so thank you for this thought provoking PR!

Looking at it, this makes sense and seems to be an improvement in the case that we want caching. But I'm not actually sure caching is worth it right now.

As far as I can see, the absolute worst case scenario in which our cache would help is in the case where we have n different Bundle resources, all of which refer to the same Secret.

That would mean that without a cache an update to the Secret would trigger n API calls, whereas a cache would reduce that to 1 API call in the best case.

That's an improvement of course but I see some issues:

  1. Given that Secrets probably aren't the best type of source for a bundle (since trusted CA data isn't a "secret"), the question becomes 'what about ConfigMaps?'. We don't have caching for ConfigMaps (I think?) and they're likely to be more common.
  2. Large values of n seem really unlikely to me. I can imagine a multi-tenanted situation where a single trusted CA is shared by many bundles. But for n to grow large enough that we cause performance problems seems like it would need an extremely large cluster - and I'd imagine that in such a large cluster, the kube API server would be pretty beefy and capable of handling a decent amount of load.
  3. If we were to add Secrets as targets in the future, having to work around a cache would introduce complications. Nothing we couldn't manage, but it'd be something we'd have to think about.

Caches are complicated things and introduce a lot of possibilities for bugs. I'm thinking it might be easier to just avoid all of those bugs by explicitly skipping caching entirely and only adding it later if someone actually comes along with a concrete example of a performance issue (in which case, we'd have to look at ConfigMaps too).

As a bonus, skipping caching will slightly reduce runtime memory usage and eliminate the possibility of us caching any Secrets we definitely don't want to cache, reducing our attack surface (slightly).

@JoshVanL (Contributor) commented Jan 9, 2023

@SgtCoDFish it is not entirely clear to me what this PR is trying to fix; however, the reason we have this "multi-namespace cache" is that we want to inform on Secrets in only a single Namespace (the trust Namespace). Other resources we want to inform on at the cluster level, either because they are cluster-scoped resources (Bundle) or because we read/write in more than just the trust Namespace (ConfigMaps, for both source and target). controller-runtime doesn't give us this functionality by default, hence the custom cache implementation, which is mostly just a wrapper with a bit of logic to return the right underlying cache based on GVK (see the sketch after this comment). Without it, we would have to grant trust-manager RBAC to read Secrets in all Namespaces (as well as take on the overhead of a metadata watch on Secrets in all Namespaces rather than just one).

Caches are complicated things and introduce a lot of possibilities for bugs. I'm thinking it might be easier to just avoid all of those bugs by explicitly skipping caching entirely and only adding it later if someone actually comes along with a concrete example of a performance issue (in which case, we'd have to look at ConfigMaps too).

We can't get away from using caches, as they are a requirement for the informer. This should ideally be hidden away from us by the controller-runtime SDK, but at least at the time of writing this code, multi-scope namespacing per GVK wasn't possible.
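
A hedged sketch of the wrapper idea described above, not trust-manager's actual implementation: a cache that sends Secret reads to a Namespace-scoped cache and everything else to a cluster-scoped one, dispatching on the GVK derived from the object's Go type. Type and field names are illustrative, and the Get signature assumes the client-go v0.23-era controller-runtime seen in this PR (no variadic GetOption yet).

package multiscopedcache

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

type multiScopedCache struct {
	cache.Cache                 // cluster-scoped cache, used as the default
	namespaced cache.Cache      // cache restricted to the trust Namespace
	scheme     *runtime.Scheme
}

// Get routes Secret reads to the Namespace-scoped cache; every other GVK
// falls through to the embedded cluster-scoped cache.
func (c *multiScopedCache) Get(ctx context.Context, key client.ObjectKey, obj client.Object) error {
	gvk, err := apiutil.GVKForObject(obj, c.scheme)
	if err != nil {
		return err
	}
	if gvk == corev1.SchemeGroupVersion.WithKind("Secret") {
		return c.namespaced.Get(ctx, key, obj)
	}
	return c.Cache.Get(ctx, key, obj)
}

Without the GVK derivation, a bare *corev1.Secret carries an empty GVK and would not match the Secret branch, which is the bug this PR addresses.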

@SgtCoDFish (Member) commented:

Currently, ClientDisableCacheFor disables caching for Secret resources; I re-enabled caching for them.

We can't get away from using caches as they are a requirement for the informer.

As far as I can tell, only one of these can be true.

@JoshVanL (Contributor) commented Jan 9, 2023

@SgtCoDFish This is the point of spinning up our own wrapped Namespaced cache. If you remove ClientDisableCacheFor, controller-runtime spins up a cache for Secrets at the cluster level and you get the following errors.

diff --git a/cmd/app/app.go b/cmd/app/app.go
index 900de4e..7967d0a 100644
--- a/cmd/app/app.go
+++ b/cmd/app/app.go
@@ -61,12 +61,12 @@ func NewCommand() *cobra.Command {
                        eventBroadcaster.StartRecordingToSink(&clientv1.EventSinkImpl{Interface: cl.CoreV1().Events("")})

                        mgr, err := ctrl.NewManager(opts.RestConfig, ctrl.Options{
-                               Scheme:                        trustapi.GlobalScheme,
-                               EventBroadcaster:              eventBroadcaster,
-                               LeaderElection:                true,
-                               LeaderElectionNamespace:       opts.Bundle.Namespace,
-                               NewCache:                      bundle.NewCacheFunc(opts.Bundle),
-                               ClientDisableCacheFor:         bundle.ClientDisableCacheFor(),
+                               Scheme:                  trustapi.GlobalScheme,
+                               EventBroadcaster:        eventBroadcaster,
+                               LeaderElection:          true,
+                               LeaderElectionNamespace: opts.Bundle.Namespace,
+                               NewCache:                bundle.NewCacheFunc(opts.Bundle),
+                               //ClientDisableCacheFor:         bundle.ClientDisableCacheFor(),
                                LeaderElectionID:              "trust-manager-leader-election",
                                LeaderElectionReleaseOnCancel: true,
                                ReadinessEndpointName:         opts.ReadyzPath,
E0109 12:17:12.222330       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:cert-manager:trust-manager" cannot list resource "secrets" in API group "" at the cluster scope
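
For context, here is a hedged guess at what bundle.ClientDisableCacheFor returns. It is based only on the PR description's statement that it disables caching for Secret resources, not on the repository's actual code: in controller-runtime, ClientDisableCacheFor lists object types whose client reads bypass the cache and go straight to the API server.

package bundle

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ClientDisableCacheFor lists object types the manager's delegating client
// should always read directly from the API server instead of from the cache.
// Assumed shape; the real helper may differ.
func ClientDisableCacheFor() []client.Object {
	return []client.Object{&corev1.Secret{}}
}

Removing this option (as in the diff above) sends Secret reads through the cache again, which is exactly what the PR's MultiScopedCache fix is intended to make safe within the trust Namespace.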

@inteon (Member, Author) commented Jan 9, 2023

(quoting @JoshVanL's comment above, including the app.go diff and the "secrets is forbidden ... at the cluster scope" error)

@JoshVanL that is what this PR fixes: controller-runtime no longer tries to spin up a cache at the cluster level; instead, it uses the MultiScopedCache to limit the scope of the cache. This wasn't working before because *corev1.Secret objects don't have a GVK set, so the MultiScopedCache was just passing the requests on to the cluster-scoped cache.
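
A small, self-contained illustration of the failure mode described here, assuming standard client-go and controller-runtime behaviour: a typed *corev1.Secret carries an empty TypeMeta, so a cache keyed on GetObjectKind()'s GVK cannot tell that it holds a Secret unless the GVK is derived from the Go type via the scheme.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

func main() {
	secret := &corev1.Secret{}

	// Typed objects don't populate TypeMeta, so the GVK read from the object
	// itself is empty and gives a GVK-based cache nothing to dispatch on.
	fmt.Println(secret.GetObjectKind().GroupVersionKind().Empty()) // true

	// Deriving the GVK from the Go type via the scheme recovers the v1 Secret
	// kind, which is what lets a multi-scoped cache pick the right backend.
	gvk, err := apiutil.GVKForObject(secret, scheme.Scheme)
	if err != nil {
		panic(err)
	}
	fmt.Println(gvk.Kind) // Secret
}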

@JoshVanL (Contributor) commented Jan 9, 2023

@inteon could you please add a test showing what is being fixed here? I'm still struggling to see what we are improving. Is it the fact that ClientDisableCacheFor is now nil?

@jetstack-bot jetstack-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 9, 2023
@inteon force-pushed the fix_cache branch 2 times, most recently from b61ccfe to a57220e (January 9, 2023, 15:33)
@inteon (Member, Author) commented Jan 9, 2023

/retest

@inteon (Member, Author) commented Jan 9, 2023

@JoshVanL see a57220e for a test; this test should fail on main.

Outdated review threads (resolved) on pkg/bundle/multi_scoped_cache/cache.go and test/integration/bundle/cache_test.go
@inteon force-pushed the fix_cache branch 2 times, most recently from 561de08 to b5d28ed (January 9, 2023, 17:17)
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@inteon (Member, Author) commented Jan 9, 2023

/retest

@inteon inteon requested a review from JoshVanL January 9, 2023 17:52
@JoshVanL (Contributor) commented Jan 9, 2023

/lgtm

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 9, 2023
@inteon (Member, Author) commented Jan 9, 2023

@JoshVanL can you approve the PR too?

@JoshVanL (Contributor) commented:

/approve

@jetstack-bot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon, JoshVanL

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2023
@jetstack-bot jetstack-bot merged commit f2da892 into cert-manager:main Jan 10, 2023