
helm: fix a memory leak resulting from too many k8s client instantiations #6026

Merged

Conversation

@misberner (Contributor)

See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes.

I assume that eventually the generic helm-operator code in this repo should depend on github.com/operator-framework/helm-operator-plugins. However, because that repo is still marked as experimental, and importing it would pull in a large number of accidental dependency changes, I have decided to merely copy the code over.

I've tried to minimize changes to both the copied code and the existing code in this repo. Because the bulk of the code is copied, I don't think replicating the tests would add much, but I can see about copying them over if it turns out not to be too much effort.

Description of the change: Fix a memory leak in the Helm operator code.

Motivation for the change: Excessive memory usage of Helm operators (2GB+)
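
In short, the leak comes from constructing a fresh Kubernetes client (with its own caches and connection state) on every reconciliation instead of reusing one; see the linked issue for details. The sketch below only illustrates that general pattern in plain client-go, not the PR's actual code; the `Reconciler` type and function names here are hypothetical.

```go
package reconciler

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// reconcileLeaky illustrates the leaky pattern: a brand-new clientset is
// built on every reconciliation, so each call allocates its own connection
// pools and caches that are never shared.
func reconcileLeaky(cfg *rest.Config) error {
	_, err := kubernetes.NewForConfig(cfg) // new client per reconcile
	return err
}

// Reconciler shows the fixed pattern: the clientset is created once and
// reused for every reconciliation.
type Reconciler struct {
	client kubernetes.Interface
}

// NewReconciler builds the shared client a single time at setup.
func NewReconciler(cfg *rest.Config) (*Reconciler, error) {
	c, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	return &Reconciler{client: c}, nil
}

// Reconcile reuses r.client instead of constructing a new one per call.
func (r *Reconciler) Reconcile() error {
	_ = r.client // real logic would issue API calls through the shared client
	return nil
}
```

The actual change works on the Helm operator's internals rather than a bare clientset, but the shape is the same: build the expensive client machinery once and share it across reconciliations.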

Checklist

If the pull request includes user-facing changes, extra documentation is required:

Commit: helm: fix a memory leak resulting from too many k8s client instantiations

See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes.

Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
@everettraven (Contributor) left a comment:

@misberner Thanks for the contribution! Apologies for taking so long to get around to reviewing this, but the changes look good to me.

I'd like to get another set of eyes on this prior to merging. @varshaprasad96 would you mind taking a look at this?

@jmccormick2001 (Contributor)

FYI, I think this one is hitting us on one of our customer deployments: the Infinidat CSI driver, which is Helm-based.

@jberkhahn (Contributor)

This looks reasonable, but I would prefer to have the tests ported over. They'll get run in our CI, and they'll be deleted along with this stuff when we move over to actually using helm-operator-plugins.

@tuxtof commented on Dec 6, 2022

OK, I ran some tests with the patch on a 70-node cluster, deploying 8600 pods to check the behaviour.

green = pod count (scaled so it is visible on the graph)
yellow = operator memory usage

Without the patch my operator consumes almost 1.2 GB of memory:
[graph: memory usage without the patch]

With the patch there is a clear improvement, staying below 800 MB:
[graph: memory usage with the patch]

But in both cases memory consumption grows linearly with the number of pods, which I am rather uncomfortable with.

If I disable watchDependentResources I come back to a much lighter and more stable value, below 200 MB:
[graph: memory usage with watchDependentResources disabled]

I'm also not sure I understand the impact of that on the remediation loop; the docs are unclear on this.
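
(As an aside for anyone reproducing measurements like the ones above: nothing in this PR adds this, but a common way to see where a Go operator's heap is going is to expose the standard net/http/pprof endpoint. The address and port below are arbitrary choices for illustration.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Heap and allocation profiles become available at
	// http://localhost:6060/debug/pprof/ while the process runs.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```

With that in place, `go tool pprof http://localhost:6060/debug/pprof/heap` shows which call paths retain the memory, which is a more direct signal than container-level memory graphs.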

@mpryc commented on Jan 17, 2023

Hello,

Is there an ETA for this review to be merged? I'm hitting an issue where our operator consumes 3 GB+ of memory, which I'm guessing this PR will resolve.

@jberkhahn (Contributor)

Not sure where the OP is; I'll merge this and add the tests myself.
