Fix memory leak due to misuse of K8s clients #198
Conversation
@joelanford sorry, I forgot to explicitly assign a reviewer at PR creation time.
Pull Request Test Coverage Report for Build 3049010052
💛 - Coveralls
…ions. See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes. Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
Beautiful! Thanks for finding and fixing this! /approve
force-pushed from a4ff5a6 to 6e4d7e6
@joelanford thanks for approving, it looks like I cannot merge this. Also, is the go-apidiff failure of any concern? If you get a chance, could you please take a look at the equivalent fix in
I don't think the go-apidiff change is a concern. This repo is not yet 1.0, and the change seems minimal enough that the few projects depending on this library should be able to cope. I'll go ahead and merge. Thanks again!
…ions. (#6026) See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes. Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
While investigating high memory usage in the RHACS operator, I found via profiling that most of the memory was consumed by the discovery client's processing of the OpenAPI schema.
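For reference, a heap profile like this can be captured with Go's standard pprof tooling. The sketch below is a minimal, generic setup and not necessarily how the RHACS operator exposes its profiling endpoints:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// With this endpoint exposed, a heap profile can be fetched and inspected via:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```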
The profile seemed odd because an ever-growing amount of memory is allocated in a `sync.Once`, suggesting repeated execution.

After a debugging run, I found that the `ActionClientGetter` essentially creates:

- a `restClientGetter`, with a per-instance lazily created, cached discovery client (seems minor)
- a `k8s.io/kubectl/pkg/cmd/util.Factory` instance via the `kube.New(rcg)` call
- a `Factory` via the `cmdutil.NewFactory(rcg)` call (only used for the secrets clients)
- a `kubernetes.ClientSet` (seems minor)

for each reconciliation of each CR managed by the operator. The major impact is that each factory has its own discovery client, which is initialized again and again because it is guarded by a different instance of the `sync.Once` (see the sketch below).

When I provoked a reconciliation failure, thereby causing the operator to perform one reconciliation after another, I could drive the process memory usage to over 2GB in less than a minute.
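To make the failure mode concrete, here is a purely illustrative sketch (hypothetical names, not the actual client code) of why a `sync.Once` guard is ineffective when the surrounding object is rebuilt on every reconciliation:

```go
package main

import (
	"fmt"
	"sync"
)

// discoveryCache stands in for a client that lazily initializes an expensive
// resource (here, a placeholder for the parsed OpenAPI schema) exactly once.
type discoveryCache struct {
	once   sync.Once
	schema []byte
}

func (c *discoveryCache) get() []byte {
	c.once.Do(func() {
		// Expensive initialization, e.g. fetching and parsing the OpenAPI schema.
		c.schema = make([]byte, 10<<20) // ~10 MiB per initialization
	})
	return c.schema
}

func main() {
	// Per-reconciliation construction: each iteration builds a fresh cache with
	// its own sync.Once, so the expensive initialization runs every time.
	for i := 0; i < 3; i++ {
		c := &discoveryCache{}
		_ = c.get()
		fmt.Println("reconcile", i, ": cache re-initialized")
	}

	// Shared construction: the initialization runs exactly once across calls.
	shared := &discoveryCache{}
	for i := 0; i < 3; i++ {
		_ = shared.get()
	}
	fmt.Println("shared cache initialized once")
}
```

In this toy example the first loop allocates the large buffer on every iteration, while the shared instance allocates it only once, which mirrors how the per-reconciliation factories kept re-initializing their own discovery caches.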
I haven't fully figured out why garbage collection was of so little use, but I suspect it has something to do with goroutines holding references to some of the objects.
Because the only thing the various clients need to vary per CR instance is the namespace, this PR changes the code to use a single rest client getter, factory, and clientset for everything, injecting the namespace in a lightweight manner when obtaining a per-CR action client. With this change, memory usage hovered around 80MB, even in the reconciliation failure scenario described above.
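The shape of the fix can be sketched roughly as follows (hypothetical types and names, not the PR's actual code): the expensive, discovery-backed dependencies are built once, and the per-CR action client only carries the namespace.

```go
package main

import "fmt"

// sharedDeps stands in for the rest client getter, factory, and clientset that
// are now created a single time instead of once per reconciliation.
type sharedDeps struct {
	name string
}

// actionClient is a lightweight, per-CR view over the shared dependencies; it
// carries only the namespace of the custom resource being reconciled.
type actionClient struct {
	deps      *sharedDeps
	namespace string
}

type actionClientGetter struct {
	deps *sharedDeps // created once, at getter construction time
}

func newActionClientGetter() *actionClientGetter {
	return &actionClientGetter{
		deps: &sharedDeps{name: "shared rest client getter / factory / clientset"},
	}
}

// ActionClientFor returns a cheap per-CR client; no new discovery client,
// factory, or clientset is built here.
func (g *actionClientGetter) ActionClientFor(namespace string) *actionClient {
	return &actionClient{deps: g.deps, namespace: namespace}
}

func main() {
	getter := newActionClientGetter()
	for _, ns := range []string{"team-a", "team-b"} {
		ac := getter.ActionClientFor(ns)
		fmt.Printf("action client for %q uses %s\n", ac.namespace, ac.deps.name)
	}
}
```

The key point is that nothing namespace-independent is rebuilt per reconciliation, so any `sync.Once`-guarded caching inside the shared pieces actually takes effect.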
I believe that what I found and fixed is also the culprit behind several reports of high memory usage: