-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ignoreResourceUpdates
to reduce controller CPU usage (#13534)
#13912
feat: add ignoreResourceUpdates
to reduce controller CPU usage (#13534)
#13912
Conversation
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #13912 +/- ##
==========================================
- Coverage 49.61% 49.61% -0.01%
==========================================
Files 256 257 +1
Lines 43829 44146 +317
==========================================
+ Hits 21744 21901 +157
- Misses 19948 20091 +143
- Partials 2137 2154 +17
☔ View full report in Codecov by Sentry. |
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
ignoreResourceUpdates
to reduce controller CPU usage (#13534)
This PR looks fantastic. It looks like could fix the app controller CPU high issues. |
Hi @agaudreault-jive |
@jaideepr97 If your question is related to why there are 2 different settings and ignoreDifferences could not be reused to skip the reconcile as well: In our case, ignore difference has more configuration than what is necessary for the reconcile optimization. It is also hard/impossible to know what everyone has configured. Having 2 configurations prevents the possibility of conflicts. However, |
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
…/argo-cd into reduce-object-reconcile
Agreed, this pr amazing. We are seeing 100% cpu endlessly, with the application-controller monitoring out 2k pods. This seems to be due metric changes on the HPA's. |
@donkeyx are you in a position to run this branch internally and monitor the effects? I'd be happy to help cherry pick these changes to whatever version you're running now. |
@crenshaw-dev we are currently running |
@donkeyx for HPA, we had the same issue and I used the following configs to make sure it works with v1 too. You might still see a couple of reconciles due to the ReplicaSet/Pods/Deployments updates, but no more from the HPA.
|
Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
One "glitch" that I am seeing is when a ReplicaSet is scaled down (by HPA). When a pod is set to terminating, its health turns to "progressing", the Application health also changes to "progressing". However, when the pods "disappear" from the UI, the Application status is not updated and stays "Progressing". I expect the The app status is the following, but no resources in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one remaining substantial thought.
…cile Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@agaudreault-jive thanks to you and your company for this significant contribution to improving performance |
I Added to my config an additional section:
Then I restarted
image: quay.io/argoproj/argocd:v2.8.0-rc1 |
Actually, after setting
On the other hand, when |
@everythings-gonna-be-alright #14304 until this is merged and cherry-picked, you can use "debug" logs while |
…oproj#13534) (argoproj#13912) * feat: ignore watched resource update Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add doc and CLI Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update doc index Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add command Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * codegen Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * revert formatting Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * do not skip on health change Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update doc Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update logging to use context Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * fix typos. local build broken... Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * change after review Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * manifestHash to string Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * more doc Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * example for argoproj Application Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add unit test for ignored logs Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * codegen Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update docs/operator-manual/reconcile.md Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * move hash and set log to debug Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update util/settings/settings.go Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update util/settings/settings.go Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * feature flag Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * less aggressive managedFields ignore rule Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * Update docs/operator-manual/reconcile.md Co-authored-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * use local settings Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * latest settings Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * safety first Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * since it's behind a feature flag, go aggressive on overrides Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…oproj#13534) (argoproj#13912) * feat: ignore watched resource update Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add doc and CLI Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update doc index Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add command Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * codegen Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * revert formatting Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * do not skip on health change Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update doc Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * update logging to use context Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * fix typos. local build broken... Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * change after review Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * manifestHash to string Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * more doc Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * example for argoproj Application Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * add unit test for ignored logs Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * codegen Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update docs/operator-manual/reconcile.md Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * move hash and set log to debug Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update util/settings/settings.go Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * Update util/settings/settings.go Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * feature flag Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * less aggressive managedFields ignore rule Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * Update docs/operator-manual/reconcile.md Co-authored-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> * use local settings Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * latest settings Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * safety first Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * since it's behind a feature flag, go aggressive on overrides Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Alexandre Gaudreault <alexandre.gaudreault@logmein.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Just to be sure, argocd is taking |
@ebuildy correct. You have a few example in https://argo-cd.readthedocs.io/en/stable/operator-manual/reconcile |
Closes #13534 #6108 #13614 #8471 #8100 #7406 #9014 #9819
Changes:
ignoreResourceUpdates
global configuration to ignore fields before to hash resources.ignoreDifferencesOnResourceUpdates
config to use ignoreDifferences automatically toignoreResourceUpdates
.Refreshing app %s for change in cluster of object %s of type %s/%s
debug log to info to help get statistics and configureignoreResourceUpdates
.msg="Refreshing app*for change*" | rex field=msg "Refreshing app (?<application>\S+) for change in cluster of object (?<resource>\S+) of type (?<type>\S+)" | stats count by application resource type | sort -count
can be used.This was a result of adding the following config
During business hours, after optimization.
Checklist:
Please see Contribution FAQs if you have questions about your pull-request.