Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-cached client #92

Merged
merged 2 commits into from
Jan 13, 2023
Merged

Fix non-cached client #92

merged 2 commits into from
Jan 13, 2023

Conversation

inteon
Copy link
Member

@inteon inteon commented Jan 11, 2023

Here, we incorrectly assume that the client is a non-cached client:

// client is a Kubernetes client that makes calls to the API for every request.
// Should be used for updating, deleting, and when requesting data from
// resources whose informer only caches metadata.
client client.Client

This results in caching all these resources fully. Additionally, we are already caching the metadata-only representation of these resources, effectively caching each resource twice (more info: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/builder/options.go#L103-L132).

In this PR, I fix this issue and also rename the client field to directClient to make clear that this client is not cached.
Also, I simplified the cache.go code a bit by using an external library.

fixes #91

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@jetstack-bot jetstack-bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2023
})
if err != nil {
return fmt.Errorf("failed to create client: %w", err)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I create the direct client instead of using mgr.Client() which is a cached client.

@SgtCoDFish SgtCoDFish added this to the v0.4.0 milestone Jan 11, 2023
Copy link
Contributor

@irbekrm irbekrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @inteon this makes sense to me.

I can reproduce the issue (with increased log level), when trust manager is run from current main branch, and some resources including a ConfigMap source is deployed, I can see that a watch is started:

...
I0111 18:49:09.429622       1 controller.go:220] trust/manager/controller/bundle "msg"="Starting workers" "reconciler group"="trust.cert-manager.io" "reconciler kind"="Bundle" "worker count"=1
I0111 18:49:09.429687       1 bundle.go:86] trust/bundle "msg"="syncing bundle" "bundle"="my-org.com" 
I0111 18:49:09.430027       1 reflector.go:219] Starting reflector *v1.ConfigMap (9h13m35.601236097s) from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:49:09.430055       1 reflector.go:255] Listing and watching *v1.ConfigMap from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:49:09.530536       1 shared_informer.go:270] caches populated
I0111 18:49:09.530779       1 bundle.go:86] trust/bundle "msg"="syncing bundle" "bundle"="my-org.com" 
...

whereas from this branch only metadata, bundle and namespace watches are started which matches what I would expect looking at which resources are specified to be cached in full and for which metadata only should be cached:

...
I0111 18:54:00.811795       1 reflector.go:219] Starting reflector *v1.PartialObjectMetadata (9h38m36.286332225s) from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.811804       1 reflector.go:255] Listing and watching *v1.PartialObjectMetadata from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.811807       1 reflector.go:219] Starting reflector *v1.Namespace (10h14m43.088302663s) from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.811814       1 reflector.go:255] Listing and watching *v1.Namespace from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.811868       1 reflector.go:219] Starting reflector *v1alpha1.Bundle (9h11m15.662603274s) from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.811874       1 reflector.go:255] Listing and watching *v1alpha1.Bundle from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.812150       1 reflector.go:219] Starting reflector *v1.PartialObjectMetadata (9h9m53.876857508s) from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.812161       1 reflector.go:255] Listing and watching *v1.PartialObjectMetadata from pkg/mod/k8s.io/client-go@v0.23.6/tools/cache/reflector.go:167
I0111 18:54:00.814308       1 event.go:265] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:
...

As @wallrj mentioned there is an alternative way how to specifiy that certain resources should not be cached in new client func https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/cluster/cluster.go#L257

However, I prefer your solution- I agree that otherwise it'd be easy to forget to update the list of objects that should not be cached otherwise if we start interacting with a new resource type.

/lgtm

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2023
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon, irbekrm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 11, 2023
@inteon
Copy link
Member Author

inteon commented Jan 13, 2023

/retest

@jetstack-bot jetstack-bot merged commit b0a0001 into cert-manager:main Jan 13, 2023
@vincepri
Copy link

👋 Controller Runtime maintainer here!

Apologies for the confusion here with our UX, need to think more about how to improve it long term. A few things that I can clarify:

  • Controllers using Manager's Clients are by default using a cached clients for read operations.
  • Any Get() call with a typed (e.g. appsv1.Pod{}), or unstructured, or partial metadata starts an internal cache that watches those objects. If there are 3 calls with the same GroupVersionKind, but with a typed/unstructured/metadata object, 3 total caches are started, causing duplication.
  • To avoid caching specific resources, I'd suggest using either a live client (APIReader), like this PR proposed, or even better you can configure the manager to always avoid caching certain types; the check is done per GroupVersionKind, and applies to every object.

The above needs to be improved, and frankly all of the options we have should eventually go in a builder of sorts for both controllers and the manager itself, instead of being in Options.

@irbekrm
Copy link
Contributor

irbekrm commented Jan 24, 2023

Thanks for that great description @vincepri !

or even better you can configure the manager to always avoid caching certain types;

What would be some reasons why this would be better than standalone live client? Maybe because it uses the same rate limiting for all requests? Otherwise it feels like it would be unlikely that a person down the line adding a GET request for a new type would actually remember to add it to uncached types unless they've memorized that this needs to be done.

@vincepri
Copy link

What's better probably depends on what you're looking the default behavior to be like, if one prefers the default to be always uncached, an APIReader is probably a better choice. On the flip side, if you only want to avoid caching certain objects for reasons like size, freshness, etc, the cached client is preferred. Usually caches come in handy when lots of Get/List requests are performed against the same objects, it avoid overloading the API Server with many roundtrips.

@irbekrm
Copy link
Contributor

irbekrm commented Jan 24, 2023

Thanks that makes sense! I was worried that perhaps there's something that we're missing out by using APIReader to GET objects that we know we don't need to cache (as opposed to using client obtained from the manager). We do use cache for the objects that need to be retrieved in full frequently 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache correctness investigations
5 participants