
helm: fix a memory leak resulting from too many k8s client instantiations #6026

Merged

Conversation

@misberner (Contributor)

See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes.

I assume that eventually the generic helm-operator code in this repo should depend on github.com/operator-framework/helm-operator-plugins. However, because that repo is still marked as experimental, and importing it would pull in a large number of accidental dependency changes, I have decided to merely copy the code over.

I've tried to minimize changes to both the copied code and the existing code in this repo. Because the bulk of the code is copied, I don't think replicating the tests would add much, but I can see about copying them over if it turns out not to be too much effort.

Description of the change: Fix a memory leak in the Helm operator code.

Motivation for the change: Excessive memory usage of Helm operators (2GB+)
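
In short, the leak comes from constructing a fresh Kubernetes client (with its own caches and connection state) on every reconciliation instead of reusing one; see the linked issue for details. The sketch below only illustrates that general pattern in plain client-go, not the PR's actual code; the `Reconciler` type and function names here are hypothetical.

```go
package reconciler

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// reconcileLeaky illustrates the leaky pattern: a brand-new clientset is
// built on every reconciliation, so each call allocates its own connection
// pools and caches that are never shared.
func reconcileLeaky(cfg *rest.Config) error {
	_, err := kubernetes.NewForConfig(cfg) // new client per reconcile
	return err
}

// Reconciler shows the fixed pattern: the clientset is created once and
// reused for every reconciliation.
type Reconciler struct {
	client kubernetes.Interface
}

// NewReconciler builds the shared client a single time at setup.
func NewReconciler(cfg *rest.Config) (*Reconciler, error) {
	c, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	return &Reconciler{client: c}, nil
}

// Reconcile reuses r.client instead of constructing a new one per call.
func (r *Reconciler) Reconcile() error {
	_ = r.client // real logic would issue API calls through the shared client
	return nil
}
```

The actual change works on the Helm operator's internals rather than a bare clientset, but the shape is the same: build the expensive client machinery once and share it across reconciliations.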

Checklist

If the pull request includes user-facing changes, extra documentation is required:

Commit: helm: fix a memory leak resulting from too many k8s client instantiations

See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes.

Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
@everettraven (Contributor) left a comment:

@misberner Thanks for the contribution! Apologies for taking so long to get around to reviewing this, but the changes look good to me.

I'd like to get another set of eyes on this prior to merging. @varshaprasad96 would you mind taking a look at this?

@jmccormick2001 (Contributor)

FYI, I think this one is hitting us on one of our customer deployments: the Infinidat CSI driver, which is Helm-based.

@jberkhahn (Contributor)

This looks reasonable, but I would prefer to have the tests ported over. They'll get run in our CI, and they'll be deleted along with this stuff when we move over to actually using helm-operator-plugins.

@tuxtof commented on Dec 6, 2022

OK, I ran some tests with the patch on a 70-node cluster, deploying 8600 pods to check the behaviour.

green = pod count (scaled so it is visible on the graph)
yellow = operator memory usage

Without the patch my operator consumes almost 1.2 GB of memory:
[graph: memory usage without the patch]

With the patch there is a clear improvement, staying below 800 MB:
[graph: memory usage with the patch]

But in both cases memory consumption grows linearly with the number of pods, which I am rather uncomfortable with.

If I disable watchDependentResources I come back to a much lighter and more stable value, below 200 MB:
[graph: memory usage with watchDependentResources disabled]

I'm also not sure I understand the impact of that on the remediation loop; the docs are unclear on this.
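
(As an aside for anyone reproducing measurements like the ones above: nothing in this PR adds this, but a common way to see where a Go operator's heap is going is to expose the standard net/http/pprof endpoint. The address and port below are arbitrary choices for illustration.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Heap and allocation profiles become available at
	// http://localhost:6060/debug/pprof/ while the process runs.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```

With that in place, `go tool pprof http://localhost:6060/debug/pprof/heap` shows which call paths retain the memory, which is a more direct signal than container-level memory graphs.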

@mpryc commented on Jan 17, 2023

Hello,

Is there an ETA for this review to be merged? I'm hitting an issue where our operator consumes 3 GB+ of memory, which I'm guessing this PR will resolve.

@jberkhahn (Contributor)

Not sure where the OP is; I'll merge this and add the tests myself.
