Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add UnsafeDisableDeepCopy into cache option to avoid deep copy during get/list #1274

Conversation

FillZpp
Copy link
Contributor

@FillZpp FillZpp commented Nov 25, 2020

Signed-off-by: Siyu Wang FillZpp.pub@gmail.com

Add UnsafeDisableCacheDeepCopy into cache option to avoid deep copy during list.

fixes #1235

@vincepri @alvaroaleman I fully understand why the CacheReader calls DeepCopyObject for each object by default, the directly modification to underlying cache object is terrible. However, we can also provide an option for those people who really need to avoid deep copy.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 25, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @FillZpp. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 25, 2020
@alvaroaleman
Copy link
Member

@FillZpp what is the usecase here that requires listing large numbers of objects and can not be covered via an Index as described e.G. here?

@FillZpp
Copy link
Contributor Author

FillZpp commented Nov 26, 2020

@FillZpp what is the usecase here that requires listing large numbers of objects and can not be covered via an Index as described e.G. here?

In Alibaba, we have lots of K8s clusters and some of them have even more than 200k Pods in a single cluster.
In such cluster, operators like OpenKruise call ListWatch for resources including Pods. So there may be multiple controllers and reconcile workers in one operator process that list their Pods simultaneously, which brings huge pressure for cpu/memory and sometimes leads to OOM.

The default DeepCopy to avoid cache pollution is good for those people who are unfamiliar with controller/informer, but also prejudicial to performance in large-scale clusters. We should provide an option for users that really need to disable this behavior. @alvaroaleman

@alvaroaleman
Copy link
Member

In Alibaba, we have lots of K8s clusters and some of them have even more than 200k Pods in a single cluster.
In such cluster, operators like OpenKruise call ListWatch for resources including Pods.

This change wont reduce the memory usage of the ListWatch

So there may be multiple controllers and reconcile workers in one operator process that list their Pods simultaneously, which brings huge pressure for cpu/memory and sometimes leads to OOM.

I would expect those lists to filter by either using a LabelSelector or an Index

The default DeepCopy to avoid cache pollution is good for those people who are unfamiliar with controller/informer, but also prejudicial to performance in large-scale clusters. We should provide an option for users that really need to disable this behavior.

I disagree, prejudical to performance for large-scale clusters is it to always filter lists via labels or indexers. I wouldn't trust myself to always correctly do the deepcopy. so IMHO this is a huge footgun, even for ppl that are very familiar with controller-runtime.

@FillZpp
Copy link
Contributor Author

FillZpp commented Nov 27, 2020

would expect those lists to filter by either using a LabelSelector or an Index

Yes, those lists do have label or field selectors. Let me explain it.

Take OpenKruise for example, currently there are seven different controllers in OpenKruise and each of them have multiple workers. In these workload controllers, I take CloneSet and SidecarSet for example:

  • CloneSet is a workload for stateless application, just like Deployment/ReplicaSet. When reconcile a CloneSet, the worker should list all Pods belongs to this CloneSet using ownerReference field selector. For multiple workers, they may list their Pods simultaneously and usually there are thousands of Pods in a single CloneSet.
  • SidecarSet is a workload that injects sidecar into Pods during creating. Our mesh and ops sidecars injected into all application Pods are all managed by SidecarSet. So each reconcile worker should list all Pods that have sidecar injected by it using label selector, and also multiple workers may list their Pods simultaneously.

These examples are so mush, I can't describe all of them. In one word, label/field selectors are useful, but they can not avoid the pressure caused by DeepCopy.

I disagree, prejudical to performance for large-scale clusters is it to always filter lists via labels or indexers. I wouldn't trust myself to always correctly do the deepcopy. so IMHO this is a huge footgun, even for ppl that are very familiar with controller-runtime.

I disagree. If so, why doesn't the informer or lister in client-go call DeepCopy for each object? The kube-controller-manager/kube-scheduler and other controllers based on client-go all list objects without DeepCopy. If someone adds DeepCopy into client-go lister and these controllers, it will degrade the performance of K8s.

I agree with the default DeepCopy, but users should have the right to choose not to use it.
@alvaroaleman

@alvaroaleman
Copy link
Member

alvaroaleman commented Nov 30, 2020

The kube-controller-manager/kube-scheduler and other controllers based on client-go all list objects without DeepCopy. If someone adds DeepCopy into client-go lister and these controllers, it will degrade the performance of K8s.

Upstream kube is also very careful about using the cache mutation detector in integration/e2e tests.
Do you have any data on how much memory exactly this saves you?

Generally speaking, I still think that this is likely to cause subtile bugs ppl will not detect and that the memory gain is too little to be worth introducing this. I would assume that if you can afford to run a cluster with 200k pods, you can afford to give your CM 2 gigs more ram - And that this will turn out to be a lot cheaper for you than having to recognize, diagnose and fix bugs that come out of accidentally mutating objects in your cache.

If you want to discuss this, could you join the biweekly meeting this Thursday?

@vincepri
Copy link
Member

If we do allow the cache to expose internal objects, we should opt for the option to be something like UnsafeDisableCacheDeepCopy or something similar.

A similar use case like this we encountered in Cluster API has driven us to use metadata only watches, and use a live reader (with something like #1249) when querying for the objects

@FillZpp
Copy link
Contributor Author

FillZpp commented Dec 1, 2020

@alvaroaleman To hide sensitive data, I can only show part of the heep profile, which was dumped from one of our extend controller in the large-scale cluster. And you can find the DeepCopy in CacheReader costs huge heap memory.

image

@FillZpp FillZpp force-pushed the add-disable-deepcopy-as-list-option branch from c71ebe7 to 22e583b Compare December 1, 2020 11:25
@FillZpp FillZpp changed the title ✨ Add DisableDeepCopy as list option to avoid deep copy during list ✨ Add UnsafeDisableCacheDeepCopy as list option to avoid deep copy during list Dec 1, 2020
@FillZpp
Copy link
Contributor Author

FillZpp commented Dec 1, 2020

If we do allow the cache to expose internal objects, we should opt for the option to be something like UnsafeDisableCacheDeepCopy or something similar.

Good idea. I have updated it.

@vincepri
Copy link
Member

vincepri commented Dec 1, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 1, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 1, 2021
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 31, 2021
@alvaroaleman
Copy link
Member

/retest

@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@FillZpp
Copy link
Contributor Author

FillZpp commented Jul 27, 2021

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jul 27, 2021
Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good to me, just a few more comments. Thanks for working on this!

Comment on lines 120 to 123
// UnsafeDisableCacheDeepCopy indicates not to deep copy objects during get or list objects.
// Be very careful with this, when enabled you must DeepCopy any object before mutating it,
// otherwise you will mutate the object in the cache.
UnsafeDisableCacheDeepCopy *DisableCacheDeepCopy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be a bool now?

Also name nit: since this is in the cache package on the cache.Options struct, I don't think we need the word Cache in the name.

Suggested change
// UnsafeDisableCacheDeepCopy indicates not to deep copy objects during get or list objects.
// Be very careful with this, when enabled you must DeepCopy any object before mutating it,
// otherwise you will mutate the object in the cache.
UnsafeDisableCacheDeepCopy *DisableCacheDeepCopy
// UnsafeDisableDeepCopy indicates not to deep copy objects during get or list objects.
// Be very careful with this, when enabled you must DeepCopy any object before mutating it,
// otherwise you will mutate the object in the cache.
UnsafeDisableDeepCopy bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be a bool now?

I changed the type from bool to struct in case we might need to add some optional policies for it in future.

Also name nit: since this is in the cache package on the cache.Options struct, I don't think we need the word Cache in the name.

Have removed Cache in the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if @vincepri @alvaroaleman and @varshaprasad96 have opinions, but I think I'd prefer to make this a bool since we have no options right now to deal with (and may never).

If we do end up needing to configure other aspects of this in the future, we could introduce an UnsafeDisableDeepCopyOptions struct alongside this bool if we wanted to avoid a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have changed it back to bool 🤔

@@ -65,6 +66,7 @@ func newSpecificInformersMap(config *rest.Config,
createListWatcher: createListWatcher,
namespace: namespace,
selectors: selectors,
disableDeepCopy: disableDeepCopy,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alvaroaleman I also considered the need for this to be GVK-specific. My comment earlier touches on that:

This option would probably require something like the GVKCaches composite cache proposed in #1591, which allows separate cache implementations per GVK.

It's probably do-able to make this option work in the informersMap on a per-GVK basis, but I feel like there are going to be other use cases around GVK-specific cache semantics (e.g. watch GVKA in namespace foo, GVKB in namespace bar and GVKC cluster-wide) that it may be better to keep this minimal.

That said, I see we already have selectors keyed by GVK, so maybe its best to do the same here and re-evaluate where GVK-specifics should live in a separate feature?

By("verifying the pods have the same pointer addresses")
By("verifying the pointer fields in pod have the same addresses")
Expect(out1).To(Equal(out2))
Expect(reflect.ValueOf(out1.Labels).Pointer()).To(BeIdenticalTo(reflect.ValueOf(out2.Labels).Pointer()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking nit: pretty sure there's no need to actually fetch the pointer for BeIdenticalTo. This could be simplified to:

Suggested change
Expect(reflect.ValueOf(out1.Labels).Pointer()).To(BeIdenticalTo(reflect.ValueOf(out2.Labels).Pointer()))
Expect(out1.Labels).To(BeIdenticalTo(out2.Labels))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried Expect(out1.Labels).To(BeIdenticalTo(out2.Labels)) before, but it failed. I'm not sure why this happened, the out1.Labels and out2.Labels do have the same pointer addresses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, alright 🤷‍♂️. Thanks for checking that out!

// you must DeepCopy any object before mutating it outside
} else {
// deep copy to avoid mutating cache
// TODO(directxman12): revisit the decision to always deepcopy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably remove this comment now.

Suggested change
// TODO(directxman12): revisit the decision to always deepcopy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if c.disableDeepCopy != nil {
// skip deep copy which might be unsafe
// you must DeepCopy any object before mutating it outside
outObj = obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set the GVK here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set the GVK here might modify the object in Informer. We may just ignore it according to the discussion earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it! That earlier discussion makes more sense to me now :)

By("verifying the pointer fields in pod have the same addresses")
Expect(out1).To(Equal(out2))
Expect(reflect.ValueOf(out1.Labels).Pointer()).To(BeIdenticalTo(reflect.ValueOf(out2.Labels).Pointer()))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add another expectation that verifies the GVK is set on the returned objects after Get() and List() when deep copying is disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When deep copying is disabled, we could not ensure the GVK is set on the returned objects.

@joelanford
Copy link
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 4, 2021
@FillZpp FillZpp force-pushed the add-disable-deepcopy-as-list-option branch 3 times, most recently from e390135 to e80f244 Compare August 5, 2021 04:03
@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 5, 2021

@alvaroaleman @joelanford Thanks for all your comments. I have done all of them. PTAL again 😆

@FillZpp FillZpp changed the title ✨ Add UnsafeDisableCacheDeepCopy into cache option to avoid deep copy during get/list ✨ Add UnsafeDisableDeepCopy into cache option to avoid deep copy during get/list Aug 5, 2021
@joelanford
Copy link
Member

Looks good to me. I think the only outstanding question (in this thread) is whether we should have:

UnsafeDisableDeepCopy bool

or

UnsafeDisableDeepCopy map[schema.GroupVersionKind]bool

which would align this with the selectors SelectorsByGVK field.

@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 6, 2021

which would align this with the selectors SelectorsByGVK field.

So... How about two fields in cache.Options?

type Options {
    // ......

    // UnsafeDisableDeepCopy indicates not to deep copy objects during get or list objects.
    // Be very careful with this, when enabled you must DeepCopy any object before mutating it,
    // otherwise you will mutate the object in the cache.
    UnsafeDisableDeepCopy bool

    // UnsafeDisableDeepCopyByObject  indicates not to deep copy objects during
    // get or list objects per GVK at the specified object.
    // Be very careful with this, when enabled you must DeepCopy any object before mutating it,
    // otherwise you will mutate the object in the cache.
    UnsafeDisableDeepCopyByObject DisableDeepCopyByObject
}

// DisableDeepCopyByObject associate a client.Object's GVK to whether it should disable deep copy.
DisableDeepCopyByObject map[client.Object]bool

Then developers can choose to disable all with some GVK specified to enable, or enable all with some GVK specified to disable.

WDYT @joelanford

@joelanford
Copy link
Member

joelanford commented Aug 6, 2021

It might be simpler to just define a constant:

package cache
const GroupVersionKindAll = schema.GroupVersionKind{}

And if it is present in the map, we would skip deep copy for all GVKs. I think this is what @alvaroaleman had in mind, and there's precendence for this in other areas of controller-runtime (e.g. cache.Options.Namespace) and core kube (e.g. corev1.NamespaceAll)

@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 7, 2021

Defining its type as map[schema.GroupVersionKind]bool requires developers to set the option with GVK, which might be a little complicated, comparing to the SelectorsByObject option.

However if its type is map[client.Object]bool, how can we define the object for all object? Do you have any idea? @joelanford

@FillZpp FillZpp force-pushed the add-disable-deepcopy-as-list-option branch from e80f244 to e17741a Compare August 9, 2021 03:02
…ng list

Signed-off-by: Siyu Wang <FillZpp.pub@gmail.com>
@FillZpp FillZpp force-pushed the add-disable-deepcopy-as-list-option branch from e17741a to 0a6284d Compare August 9, 2021 07:32
@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 9, 2021

@joelanford I have changed the option type to DisableDeepCopyByObject and DisableDeepCopyByGVK internal.

@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 13, 2021

PTAL @alvaroaleman @joelanford when you have time, thanks.

@joelanford
Copy link
Member

/approve

This looks great @FillZpp! Thanks for sticking with this!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 18, 2021
@FillZpp
Copy link
Contributor Author

FillZpp commented Aug 19, 2021

Thanks for reviewing this @joelanford .Now we need lgtm, PTAL @alvaroaleman

Copy link
Member

@alvaroaleman alvaroaleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, FillZpp, joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,joelanford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CacheReader: DeepCopyObject on every Get/List method calls
7 participants