High CPU usage on application-controller -> Infinite level 1 refresh loop? #13614
Comments
We encountered the same behavior as well. The application-controller keeps reconciling the level 1 app from the Could we tune the application controller to not monitor the managed app so frequently?
We are seeing this issue more and more: the more apps we create, the more time the controller spends on refreshes, and the higher the CPU load becomes. I want to say this started when we introduced KEDA to our deployment strategy. Our deployment has 4 ScaledObjects with a polling interval of 5, which manage 4 HPAs. We have tried to exclude HPA and ScaledObject from diffing via argocd-cm (I've omitted company data)
I can't share any logs since customer names appear often, but I can tell you we have about twice as many We have about 450 lines of logs for 10 milliseconds, all being either Currently, we're on 2700 CPU with 30 apps.
The behavior can be configured in
Checklist:
argocd version
Describe the bug
I have ArgoCD deployed via the community chart. This is a single-cluster deployment with 99 applications. These apps are pretty typical Deployment / Ingress / CM / VPA / Service / SA, nothing wild.
When I run kubectl top, I can see that my argocd-application-controller pod is using ~5 vCPU. I believe it would be more, but k8s is throttling the application-controller CPU.
I actually have two clusters, and both of these have the same issue.
When I look at the logs I see a massive number of level 1 syncs occurring, tens per second, which seems wrong.
I have redacted my application names, but it's a series of different apps, and the logs here continue ad infinitum (it doesn't stop, I haven't caught it mid-sync)
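To see which applications dominate the refresh traffic without pasting raw logs, the lines can be tallied per application. Below is a minimal sketch in Go. It assumes each refresh line contains a `level (N)` fragment and an `application=<name>` field. That layout is an assumption and not something confirmed in this thread, so adjust both patterns to whatever your controller actually prints. The `countRefreshes` helper is hypothetical, written only for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
	"strconv"
)

// Both patterns are assumptions about the log layout; adjust them to match
// the lines your controller actually emits.
var (
	levelRe = regexp.MustCompile(`level \((\d+)\)`)
	appRe   = regexp.MustCompile(`application=(\S+)`)
)

// countRefreshes returns, per application, the number of lines that report
// a refresh at the given level. Lines missing either field are ignored.
func countRefreshes(lines []string, level int) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		lm := levelRe.FindStringSubmatch(line)
		am := appRe.FindStringSubmatch(line)
		if lm == nil || am == nil {
			continue
		}
		if n, err := strconv.Atoi(lm[1]); err == nil && n == level {
			counts[am[1]]++
		}
	}
	return counts
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}

	counts := countRefreshes(lines, 1)
	apps := make([]string, 0, len(counts))
	for app := range counts {
		apps = append(apps, app)
	}
	// Busiest applications first; ties broken by name for stable output.
	sort.Slice(apps, func(i, j int) bool {
		if counts[apps[i]] != counts[apps[j]] {
			return counts[apps[i]] > counts[apps[j]]
		}
		return apps[i] < apps[j]
	})
	for _, app := range apps {
		fmt.Printf("%6d  %s\n", counts[app], app)
	}
}
```

Piping a bounded window of controller output into it, for example `kubectl logs --since=1m <application-controller-pod> | go run main.go`, gives a per-application count. That may help tell a few noisy apps apart from uniform load across all of them.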
To Reproduce
I am using https://artifacthub.io/packages/helm/argo/argo-cd at version: 5.31.1
I have not currently set many variables differently from the defaults, other than setting up HA for the Redis cluster and SSO for the ArgoCD UI.
ArgoCD is pointing at a giant monorepo that gets updated every couple of mins with application manifests generated by Jenkins.
Expected behavior
I was expecting ~100-200mCPU usage for the argocd-application-controller replicaset pod.
Version
here is my argocd version:
v2.7.1+5e54351.dirty
** Help needed **
I have already checked several related issues:
Issue 6108 is around L0 syncs, which I have checked & are not an issue for my case.
However, this is what led me to look at the L1 syncs, which I believe to be the root cause. I don't fully understand what triggers L1 syncs and need a signpost in the right direction.
Per comments in the code, an L1 sync is:
Compare live application state against state defined using revision of most recent comparison.
I got this far in the code, but don't have a holistic understanding of what's going on here
argo-cd/controller/appcontroller.go
Lines 236 to 273 in d9bc6cf
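For orientation, here is a small self-contained model of refresh levels in Go. It is a sketch and not the controller's code. The level 1 description is the comment quoted above. The type and constant names, the level 0 and level 2 descriptions, and the `Max` helper are assumptions about how such levels are typically modelled, and they may not match the referenced commit.

```go
package main

import "fmt"

// CompareWith models the refresh level attached to a refresh request.
// Names and values here are illustrative assumptions.
type CompareWith int

const (
	// Level 0 (assumed): skip comparison and only refresh the resource tree.
	ComparisonWithNothing CompareWith = 0
	// Level 1 (quoted in the issue): compare live application state against
	// state defined using revision of most recent comparison.
	CompareWithRecent CompareWith = 1
	// Level 2 (assumed): compare live state against the latest source revision.
	CompareWithLatest CompareWith = 2
)

// Max returns the higher, more expensive, of two levels, so that several
// queued requests for one application collapse into a single refresh.
func (a CompareWith) Max(b CompareWith) CompareWith {
	if a > b {
		return a
	}
	return b
}

func main() {
	// Three hypothetical requests queued for the same application.
	pending := ComparisonWithNothing
	for _, req := range []CompareWith{CompareWithRecent, ComparisonWithNothing, CompareWithRecent} {
		pending = pending.Max(req)
	}
	fmt.Printf("effective refresh level: %d\n", pending)
}
```

Under this model, one level 1 request among many level 0 requests is enough to force a full comparison of live state. So the thing to look for is whatever keeps issuing level 1 requests.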