Increase resource limits #367
Conversation
Have you tested this with the customer configuration (Unity w/ health monitor on OCP) as well as a heavy install of any driver (e.g. PFlex with health monitor and sdc monitor enabled along with multiple modules)?
deploy/operator.yaml (Outdated)
@@ -928,8 +928,8 @@ spec:
           periodSeconds: 10
         resources:
           limits:
-            cpu: 200m
-            memory: 256Mi
+            cpu: 400m
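For reference, a sketch of what the updated limits stanza would presumably look like in full. The new memory value is cut off in the diff above, so 512Mi here is an assumption based on the doubling discussed in the comments below, not a value taken from the PR:

          resources:
            limits:
              cpu: 400m      # doubled from 200m in this PR
              memory: 512Mi  # assumed: diff above is truncated; doubling 256Mi gives 512Mi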
Is there any specific data on why we are incrementing to these values?
@bharathsreekanth AFAIK we don't have any specific data. When Ninjas found the limits to be too low last time, we doubled them, and now a customer has run into the same issue, so we are doubling again. It's not unreasonable that we would be running out of memory now, since a lot of additional features have been added to the operator since the v0.1.0 release. If we double them now (to ~500 MB), we probably won't have to touch them again for a long time. I don't think we have much specific data on this, but I've been under the impression that in other areas of CS it is not unusual to double memory when you run out (e.g., with dynamic arrays). I tried to find resources on memory sizing for containers but wasn't able to find anything.
I do think we should test whether the CPU limit benefits from being increased -- e.g., do 10 installs of a driver with some sidecars under the old CPU limit, then increase it, reinstall the operator, do the 10 installs again, and compare the approximate time the csm object takes to reach the ready state with 200m vs. 400m. If there isn't a significant difference, then I don't think it necessarily makes sense to increase the CPU limit. I documented this suggestion in the Jira defect for this PR as well.
Force-pushed from 57cbae2 to cf9ee59
I tried to replicate it by doing a heavy install, but it installed with no issues.
Force-pushed from cf9ee59 to 82d6f83
Force-pushed from 82d6f83 to 0710659
PTAL
LGTM
Description
Increase resource limits to fix the OOMKilled error.
GitHub Issues
List the GitHub issues impacted by this PR:
Checklist:
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration.