Fix the MultiScopedCache #90

Merged: 2 commits merged into cert-manager:main from the fix_cache branch on Jan 10, 2023

Conversation

@inteon (Member) commented Jan 8, 2023

This PR removes ClientDisableCacheFor and fixes the MultiScopedCache by populating missing GVK values based on the type of the provided client.Object.

Currently, ClientDisableCacheFor disables caching for Secret resources; I re-enabled caching for them. I also fixed the MultiScopedCache so that it can now handle *corev1.Secret objects that don't have a GVK set.

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
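
Below is a minimal sketch of the GVK-population idea the description mentions, assuming controller-runtime's apiutil helpers; the package and function names (multiscopedcache, resolveGVK) are illustrative and not taken from this repository.

package multiscopedcache

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// resolveGVK returns the object's GVK, deriving it from the object's Go type
// via the scheme when the TypeMeta is empty (e.g. a bare *corev1.Secret).
func resolveGVK(obj client.Object, scheme *runtime.Scheme) (schema.GroupVersionKind, error) {
	if gvk := obj.GetObjectKind().GroupVersionKind(); !gvk.Empty() {
		return gvk, nil
	}
	return apiutil.GVKForObject(obj, scheme)
}

Deriving the GVK from the object's Go type means callers can pass a plain *corev1.Secret without having to set TypeMeta themselves.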
@jetstack-bot jetstack-bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 8, 2023
@inteon inteon mentioned this pull request Jan 9, 2023
@SgtCoDFish (Member) left a comment

EDIT: Never mind, apparently there are Kubernetes reasons this is all moot; ignore the comment below.

I'd not really thought much about caching in trust-manager yet, so thank you for this thought provoking PR!

Looking at it, this makes sense and seems to be an improvement in the case that we want caching. But I'm not actually sure caching is worth it right now.

As far as I can see, the absolute worst case scenario in which our cache would help is in the case where we have n different Bundle resources, all of which refer to the same Secret.

That would mean that without a cache an update to the Secret would trigger n API calls, whereas a cache would reduce that to 1 API call in the best case.

That's an improvement of course but I see some issues:

  1. Given that Secrets probably aren't the best type of source for a bundle (since trusted CA data isn't a "secret"), the question becomes 'what about ConfigMaps?'. We don't have caching for ConfigMaps (I think?) and they're likely to be more common.
  2. Large values of n seem really unlikely to me. I can imagine a multi-tenanted situation where a single trusted CA is shared by many bundles. But for n to grow large enough that we cause performance problems seems like it would need an extremely large cluster - and I'd imagine that in such a large cluster, the kube API server would be pretty beefy and capable of handling a decent amount of load.
  3. If we were to add Secrets as targets in the future, having to work around a cache would introduce complications. Nothing we couldn't manage, but it'd be something we'd have to think about.

Caches are complicated things and introduce a lot of possibilities for bugs. I'm thinking it might be easier to just avoid all of those bugs by explicitly skipping caching entirely and only adding it later if someone actually comes along with a concrete example of a performance issue (in which case, we'd have to look at ConfigMaps too).

As a bonus, skipping caching will slightly reduce runtime memory usage and eliminate the possibility of us caching any Secrets we definitely don't want to cache, reducing our attack surface (slightly).

@JoshVanL (Contributor) commented Jan 9, 2023

@SgtCoDFish it is not entirely clear to me what this PR is trying to fix; however, the reason we have this "multi-namespace cache" is that we want to inform on Secrets in only a single Namespace (the trust Namespace). Other resources we want to inform on at the cluster level, either because they are cluster-scoped resources (Bundle) or because we read/write in more than just the trust Namespace (ConfigMaps, for both source and target). controller-runtime doesn't give us this functionality by default, hence the custom cache implementation, which is mostly just a wrapper with a bit of logic to return the right underlying cache based on GVK (see the sketch after this comment). Without it, we would have to grant trust-manager RBAC to read Secrets in all Namespaces (as well as take on the overhead of a metadata watch on Secrets in all Namespaces rather than just one).

Caches are complicated things and introduce a lot of possibilities for bugs. I'm thinking it might be easier to just avoid all of those bugs by explicitly skipping caching entirely and only adding it later if someone actually comes along with a concrete example of a performance issue (in which case, we'd have to look at ConfigMaps too).

We can't get away from using caches, as they are a requirement for the informer. This should ideally be hidden away from us by the controller-runtime SDK, but at least at the time of writing this code, multi-scope namespacing per GVK wasn't possible.
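
A hedged sketch of the wrapper idea described above, not trust-manager's actual implementation: a cache that sends Secret reads to a Namespace-scoped cache and everything else to a cluster-scoped one, dispatching on the GVK derived from the object's Go type. Type and field names are illustrative, and the Get signature assumes the client-go v0.23-era controller-runtime seen in this PR (no variadic GetOption yet).

package multiscopedcache

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

type multiScopedCache struct {
	cache.Cache                 // cluster-scoped cache, used as the default
	namespaced cache.Cache      // cache restricted to the trust Namespace
	scheme     *runtime.Scheme
}

// Get routes Secret reads to the Namespace-scoped cache; every other GVK
// falls through to the embedded cluster-scoped cache.
func (c *multiScopedCache) Get(ctx context.Context, key client.ObjectKey, obj client.Object) error {
	gvk, err := apiutil.GVKForObject(obj, c.scheme)
	if err != nil {
		return err
	}
	if gvk == corev1.SchemeGroupVersion.WithKind("Secret") {
		return c.namespaced.Get(ctx, key, obj)
	}
	return c.Cache.Get(ctx, key, obj)
}

Without the GVK derivation, a bare *corev1.Secret carries an empty GVK and would not match the Secret branch, which is the bug this PR addresses.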

@SgtCoDFish (Member) commented:

Currently, ClientDisableCacheFor disables caching for Secret resources; I re-enabled caching for them.

We can't get away from using caches as they are a requirement for the informer.

As far as I can tell, only one of these can be true.

@JoshVanL (Contributor) commented Jan 9, 2023

@SgtCoDFish This is the point of spinning up our own wrapped Namespaced cache. If you remove ClientDisableCacheFor, controller-runtime spins up a cache for Secrets at the cluster level and you get the following errors.

diff --git a/cmd/app/app.go b/cmd/app/app.go
index 900de4e..7967d0a 100644
--- a/cmd/app/app.go
+++ b/cmd/app/app.go
@@ -61,12 +61,12 @@ func NewCommand() *cobra.Command {
                        eventBroadcaster.StartRecordingToSink(&clientv1.EventSinkImpl{Interface: cl.CoreV1().Events("")})

                        mgr, err := ctrl.NewManager(opts.RestConfig, ctrl.Options{
-                               Scheme:                        trustapi.GlobalScheme,
-                               EventBroadcaster:              eventBroadcaster,
-                               LeaderElection:                true,
-                               LeaderElectionNamespace:       opts.Bundle.Namespace,
-                               NewCache:                      bundle.NewCacheFunc(opts.Bundle),
-                               ClientDisableCacheFor:         bundle.ClientDisableCacheFor(),
+                               Scheme:                  trustapi.GlobalScheme,
+                               EventBroadcaster:        eventBroadcaster,
+                               LeaderElection:          true,
+                               LeaderElectionNamespace: opts.Bundle.Namespace,
+                               NewCache:                bundle.NewCacheFunc(opts.Bundle),
+                               //ClientDisableCacheFor:         bundle.ClientDisableCacheFor(),
                                LeaderElectionID:              "trust-manager-leader-election",
                                LeaderElectionReleaseOnCancel: true,
                                ReadinessEndpointName:         opts.ReadyzPath,
E0109 12:17:12.222330       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:cert-manager:trust-manager" cannot list resource "secrets" in API group "" at the cluster scope
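
For context, here is a hedged guess at what bundle.ClientDisableCacheFor returns. It is based only on the PR description's statement that it disables caching for Secret resources, not on the repository's actual code: in controller-runtime, ClientDisableCacheFor lists object types whose client reads bypass the cache and go straight to the API server.

package bundle

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ClientDisableCacheFor lists object types the manager's delegating client
// should always read directly from the API server instead of from the cache.
// Assumed shape; the real helper may differ.
func ClientDisableCacheFor() []client.Object {
	return []client.Object{&corev1.Secret{}}
}

Removing this option (as in the diff above) sends Secret reads through the cache again, which is exactly what the PR's MultiScopedCache fix is intended to make safe within the trust Namespace.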

@inteon (Member, Author) commented Jan 9, 2023

(quoting @JoshVanL's comment above, including the app.go diff and the "secrets is forbidden ... at the cluster scope" error)

@JoshVanL that is what this PR fixes: controller-runtime no longer tries to spin up a cache at the cluster level; instead, it uses the MultiScopedCache to limit the scope of the cache. This wasn't working before because *corev1.Secret objects don't have a GVK set, so the MultiScopedCache was just passing the requests on to the cluster-scoped cache.
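
A small, self-contained illustration of the failure mode described here, assuming standard client-go and controller-runtime behaviour: a typed *corev1.Secret carries an empty TypeMeta, so a cache keyed on GetObjectKind()'s GVK cannot tell that it holds a Secret unless the GVK is derived from the Go type via the scheme.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

func main() {
	secret := &corev1.Secret{}

	// Typed objects don't populate TypeMeta, so the GVK read from the object
	// itself is empty and gives a GVK-based cache nothing to dispatch on.
	fmt.Println(secret.GetObjectKind().GroupVersionKind().Empty()) // true

	// Deriving the GVK from the Go type via the scheme recovers the v1 Secret
	// kind, which is what lets a multi-scoped cache pick the right backend.
	gvk, err := apiutil.GVKForObject(secret, scheme.Scheme)
	if err != nil {
		panic(err)
	}
	fmt.Println(gvk.Kind) // Secret
}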

@JoshVanL (Contributor) commented Jan 9, 2023

@inteon could you please add a test showing what is being fixed here? I'm still struggling to see what we are improving. Is it the fact that ClientDisableCacheFor is now nil?

@jetstack-bot jetstack-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 9, 2023
@inteon force-pushed the fix_cache branch 2 times, most recently from b61ccfe to a57220e (January 9, 2023, 15:33)
@inteon (Member, Author) commented Jan 9, 2023

/retest

@inteon (Member, Author) commented Jan 9, 2023

@JoshVanL see a57220e for a test; this test should fail on main.

Outdated review threads (resolved) on pkg/bundle/multi_scoped_cache/cache.go and test/integration/bundle/cache_test.go
@inteon force-pushed the fix_cache branch 2 times, most recently from 561de08 to b5d28ed (January 9, 2023, 17:17)
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@inteon (Member, Author) commented Jan 9, 2023

/retest

@inteon inteon requested a review from JoshVanL January 9, 2023 17:52
@JoshVanL (Contributor) commented Jan 9, 2023

/lgtm

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 9, 2023
@inteon (Member, Author) commented Jan 9, 2023

@JoshVanL can you approve the PR too?

@JoshVanL (Contributor) commented:

/approve

@jetstack-bot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon, JoshVanL

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2023
@jetstack-bot jetstack-bot merged commit f2da892 into cert-manager:main Jan 10, 2023