
Fix memory leak due to misuse of K8s clients #198

Merged
merged 1 commit into operator-framework:main on Sep 16, 2022

Conversation

misberner (Contributor)

When investigating high memory usage in the RHACS operator, profiling showed that most of the memory was consumed by the discovery client's processing of the OpenAPI schema:
[Screenshot: memory allocation profile, Sep 8, 2022]
The profile seemed odd because an ever-growing amount of memory was being allocated inside a sync.Once, suggesting that the supposedly one-time initialization was being executed repeatedly.

After a debugging run, I found that the ActionClientGetter essentially creates the following for each reconciliation of each CR managed by the operator:

  • a restClientGetter with a per-instance, lazily created, cached discovery client (seems minor)
  • one k8s.io/kubectl/pkg/cmd/util.Factory instance via the kube.New(rcg) call
  • another Factory via the cmdutil.NewFactory(rcg) call (used only for the secrets clients)
  • a kubernetes.ClientSet (seems minor)

The major impact is that each Factory has its own discovery client, which is initialized over and over again, because the initialization is guarded by a different sync.Once instance each time.
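The failure mode can be sketched in plain Go, independent of the Kubernetes libraries. The names below (discoveryClient, Schema, reconcileWithFreshClient) are hypothetical stand-ins for the kubectl Factory internals: a sync.Once only caches work for the lifetime of the object that owns it, so constructing a fresh client per reconciliation defeats the cache entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// initCount tracks how many times the "expensive" initialization ran.
var initCount int

// discoveryClient simulates a client whose expensive setup (here standing
// in for fetching and parsing the OpenAPI schema) is guarded by a sync.Once.
type discoveryClient struct {
	once   sync.Once
	schema []byte
}

// Schema lazily initializes the cached payload. The sync.Once guarantees
// single execution only per discoveryClient instance.
func (d *discoveryClient) Schema() []byte {
	d.once.Do(func() {
		initCount++
		d.schema = make([]byte, 1<<20) // simulate a large schema allocation
	})
	return d.schema
}

// reconcileWithFreshClient mirrors the buggy pattern: a new client (and thus
// a new sync.Once) per reconciliation, so the cache never takes effect.
func reconcileWithFreshClient() {
	c := &discoveryClient{}
	_ = c.Schema()
}

// reconcileWithSharedClient mirrors the fix: one client shared across
// reconciliations, so the expensive init runs exactly once.
func reconcileWithSharedClient(c *discoveryClient) {
	_ = c.Schema()
}

func main() {
	for i := 0; i < 100; i++ {
		reconcileWithFreshClient()
	}
	fmt.Println("inits with per-reconcile clients:", initCount)

	initCount = 0
	shared := &discoveryClient{}
	for i := 0; i < 100; i++ {
		reconcileWithSharedClient(shared)
	}
	fmt.Println("inits with shared client:", initCount)
}
```

With per-reconcile clients the init runs 100 times (and each discarded schema is memory the GC has to chase); with a shared client it runs once.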

By provoking a reconciliation failure, thereby making the operator perform one reconciliation after another, I could drive the process memory usage to over 2 GB in less than a minute.

I haven't fully figured out why garbage collection helped so little, but I suspect it has something to do with goroutines holding references to some of the objects.

Because the only thing the various clients need to differ on per CR instance is the namespace, this PR changes the code to use a single rest client getter, Factory, and clientset for everything, injecting the namespace in a lightweight manner when obtaining a per-CR action client. With this change, memory usage hovered around 80 MB, even in the reconciliation-failure scenario described above.
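The shape of the fix can also be sketched without the Kubernetes dependencies. The names here (sharedClients, ActionClientFor) are hypothetical and do not match the real helm-operator-plugins API; the point is the pattern: heavyweight clients are constructed once, and the per-CR handle carries only the namespace.

```go
package main

import "fmt"

// sharedClients stands in for the single rest client getter, Factory, and
// clientset the fix creates once for the whole process.
type sharedClients struct {
	// heavyweight, discovery-cache-backed clients would live here
}

// actionClient is a lightweight per-CR handle: it borrows the shared
// clients and carries only the namespace, which is all that differs per CR.
type actionClient struct {
	shared    *sharedClients
	namespace string
}

// ActionClientFor returns a namespace-scoped handle without re-creating
// any of the expensive underlying clients.
func (s *sharedClients) ActionClientFor(namespace string) *actionClient {
	return &actionClient{shared: s, namespace: namespace}
}

// Describe reports which namespace the handle targets and which shared
// backend it borrows.
func (a *actionClient) Describe() string {
	return fmt.Sprintf("action client for namespace %q (shared backend: %p)", a.namespace, a.shared)
}

func main() {
	shared := &sharedClients{}
	// Every reconciliation gets its own namespace-scoped handle, but all
	// handles point at the same shared backend, so nothing expensive is
	// re-initialized per CR.
	for _, ns := range []string{"team-a", "team-b"} {
		fmt.Println(shared.ActionClientFor(ns).Describe())
	}
}
```

Allocating only the small wrapper per reconciliation is what keeps the caches (including the discovery client's schema cache) warm across CRs.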

I believe that what I found and fixed is also the culprit behind several reports of high memory usage:

@misberner (Contributor, Author)

@joelanford sorry, I forgot to explicitly assign a reviewer at PR creation time.

@coveralls

coveralls commented Sep 8, 2022

Pull Request Test Coverage Report for Build 3049010052

  • 39 of 44 (88.64%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 89.223%

Changes missing coverage:

  File                            Covered Lines   Changed/Added Lines   Coverage
  pkg/client/actionconfig.go      20              21                    95.24%
  pkg/reconciler/reconciler.go     5               9                    55.56%

Totals (change from base Build 3045539330: -0.1%)
  Covered Lines: 1631
  Relevant Lines: 1828

💛 - Coveralls

misberner pushed a commit to misberner/operator-sdk that referenced this pull request Sep 8, 2022
…ions.

See operator-framework/helm-operator-plugins#198
for a detailed description of the issue. This commit ports over the
relevant changes.

Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
@joelanford (Member)

Beautiful! Thanks for finding and fixing this!

/approve

@misberner (Contributor, Author)

@joelanford thanks for approving; it looks like I cannot merge this myself. Also, is the go-apidiff failure of any concern?

If you get a chance, could you please take a look at the equivalent fix in operator-sdk? I believe this is required to help existing users of the (non-hybrid) Helm operator.

@joelanford (Member)

I don't think the go-apidiff change is a concern. This repo is not yet 1.0, and the change seems minimal enough that the small number of projects depending on this library should be able to cope.

I'll go ahead and merge. Thanks again!

@joelanford joelanford merged commit 001e6c7 into operator-framework:main Sep 16, 2022
jberkhahn pushed a commit to operator-framework/operator-sdk that referenced this pull request Jan 17, 2023
…ions. (#6026)

See operator-framework/helm-operator-plugins#198
for a detailed description of the issue. This commit ports over the
relevant changes.

Signed-off-by: Malte Isberner <malte.isberner@gmail.com>

3 participants