Increase resource limits #367

chimanjain · 2023-10-11T11:10:07Z

Description

Increase the operator's resource limits to fix an OOMKilled error.

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #: dell/csm#982

Checklist:

I have performed a self-review of my own code to ensure there are no formatting, vetting, linting, or security issues
I have verified that new and existing unit tests pass locally with my changes
I have not allowed coverage numbers to degenerate
I have maintained at least 90% code coverage
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have maintained backward compatibility

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration.

Sanity Test

jooseppi-luna

Have you tested this with the customer configuration (Unity w/ health monitor on OCP) as well as a heavy install of any driver (e.g. PFlex with health monitor and sdc monitor enabled along with multiple modules)?

bharathsreekanth · 2023-10-13T14:00:23Z

deploy/operator.yaml

@@ -928,8 +928,8 @@ spec:
          periodSeconds: 10
        resources:
          limits:
-            cpu: 200m
-            memory: 256Mi
+            cpu: 400m


Is there any specific data on why we are increasing to these values?

jooseppi-luna · 2023-10-17

@bharathsreekanth AFAIK we don't have any specific data. When the Ninjas found the limits to be too low last time, we doubled them; now a customer has run into the same issue, so we are doubling again. Running out of memory now is not surprising, since a lot of features have been added to the operator since the v0.1.0 release. If we double the limits now (to roughly 500 MB), we probably won't have to touch them again for a long time. My impression is that doubling capacity when you run out is common practice elsewhere in CS (e.g. with dynamic arrays), but I couldn't find any guidance specific to container memory.

I do think we should test whether increasing the CPU limit actually helps. For example: run 10 installs of a driver with some sidecars under the old CPU limit, then raise the limit, reinstall the operator, run the 10 installs again, and compare the approximate time the CSM object takes to reach the Ready state with 200m vs. 400m. If there is no significant difference, I don't think it makes sense to increase the CPU limit. I documented this suggestion in the Jira defect for this PR as well.
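The before/after comparison suggested above could be summarized with a small script like the one below. This is a minimal sketch, not part of the operator or its test suite: the function name `compare_ready_times` and the 10% `min_speedup` cut-off are hypothetical, and it assumes the time-to-Ready durations (in seconds) have already been recorded for each install.

```python
from statistics import mean, stdev


def compare_ready_times(old_secs, new_secs, min_speedup=0.10):
    """Summarize time-to-Ready samples (in seconds) for two CPU limits.

    old_secs: durations measured with the old CPU limit (200m)
    new_secs: durations measured with the raised CPU limit (400m)
    min_speedup: hypothetical cut-off; the fraction by which the new mean
        must beat the old mean before the raise is considered worthwhile
    """
    if len(old_secs) < 2 or len(new_secs) < 2:
        # stdev() needs at least two data points per configuration
        raise ValueError("need at least two samples per configuration")

    old_mean = mean(old_secs)
    new_mean = mean(new_secs)
    # Positive speedup means the 400m runs reached Ready faster on average
    speedup = (old_mean - new_mean) / old_mean

    return {
        "old_mean": old_mean,
        "new_mean": new_mean,
        "old_stdev": stdev(old_secs),
        "new_stdev": stdev(new_secs),
        "speedup": speedup,
        "worth_raising": speedup >= min_speedup,
    }
```

With only 10 installs per configuration, it is worth checking the speedup against the two standard deviations as well: a small difference in means that is well inside the run-to-run spread is not good evidence that the higher CPU limit helps.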

chimanjain · 2023-10-17T10:57:44Z

Have you tested this with the customer configuration (Unity w/ health monitor on OCP) as well as a heavy install of any driver (e.g. PFlex with health monitor and sdc monitor enabled along with multiple modules)?

I tried to reproduce the issue with a heavy install, but it installed without any issues.
However, the limit should still be increased because:

The customer is hitting an OOMKilled failure.
Many modules have been introduced since we last defined the resources.
We are increasing only the limits, not the requests, so the resources requested when the pod is scheduled are unchanged; the higher limits only matter in edge cases such as a heavy install.
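The distinction between requests and limits in the last point can be illustrated with a deliberately simplified model. This is a sketch for explanation only: `MemorySpec`, `fits_on_node`, and `oom_killed` are hypothetical names, not Kubernetes or operator APIs, and the model ignores details such as node-level memory pressure and eviction.

```python
from dataclasses import dataclass


@dataclass
class MemorySpec:
    """Simplified view of a container's memory settings (in MiB)."""
    request_mi: int  # what the scheduler reserves when placing the pod
    limit_mi: int    # hard cap enforced while the container runs


def fits_on_node(spec: MemorySpec, node_free_mi: int) -> bool:
    # Simplification: the scheduler compares the request against the node's
    # unreserved memory; the limit plays no part in placement.
    return spec.request_mi <= node_free_mi


def oom_killed(spec: MemorySpec, usage_mi: int) -> bool:
    # Simplification: usage above the memory limit gets the container OOMKilled.
    return usage_mi > spec.limit_mi
```

If two specs share the same request but one keeps the old 256Mi limit and the other has a higher limit, `fits_on_node` gives the same answer for both, while a usage spike between the two limits only returns `True` from `oom_killed` for the old spec. That is the sense in which raising only the limit affects heavy installs without changing what every install reserves up front.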

chimanjain · 2023-10-18T13:03:41Z

PTAL

HarishH-DELL

LGTM

chimanjain requested review from rajkumar-palani, nitesh3108, shefali-malhotra and HarishH-DELL October 11, 2023 11:10

nitesh3108 previously approved these changes Oct 12, 2023

jooseppi-luna reviewed Oct 12, 2023

alikdell previously approved these changes Oct 12, 2023

bharathsreekanth reviewed Oct 13, 2023

chimanjain force-pushed the increase-resource-limits branch from 57cbae2 to cf9ee59 October 16, 2023 11:22

chimanjain requested a review from Prabhu-Dell as a code owner October 16, 2023 11:22

chimanjain force-pushed the increase-resource-limits branch from cf9ee59 to 82d6f83 October 17, 2023 13:02

increase resource limit (0710659)

chimanjain force-pushed the increase-resource-limits branch from 82d6f83 to 0710659 October 18, 2023 12:46

update cpu limit (f9701d2)

chimanjain dismissed stale reviews from nitesh3108 and alikdell via f9701d2 October 18, 2023 12:57

chimanjain requested review from jooseppi-luna, nitesh3108, bharathsreekanth and alikdell October 18, 2023 13:03

jooseppi-luna approved these changes Oct 18, 2023

HarishH-DELL approved these changes Oct 20, 2023

chimanjain merged commit e60a6d1 into main Oct 20, 2023

chimanjain deleted the increase-resource-limits branch October 20, 2023 06:07

ChristianAtDell added a commit that referenced this pull request Oct 15, 2024

Increase resource limits (#367) (6a4dbad)
