Application controller should export workqueue depth metrics #7682

Closed
Ailsa-Wu opened this issue Nov 11, 2021 · 8 comments · Fixed by #8318
Labels: enhancement (New feature or request), works-for-me (Works as intended, or unable to reproduce)
Comments

@Ailsa-Wu

Ailsa-Wu commented Nov 11, 2021

Summary

Application controller should export workqueue depth metrics

Motivation

Queuing delay and queue length are important signals for debugging control-plane performance at large scale (3k+ applications).

We use one ArgoCD instance to manage about 3K applications, and the number is still growing. Performance has degraded along with it: for example, auto sync takes longer to trigger, and the controller cannot refresh resource status fast enough to keep up with the live state.

We have tried several methods to improve performance:

  1. Scaled up the replicas of the ArgoCD server and repo-server, following the HA guide.
  2. Increased --status-processors to 200 and --operation-processors to 100, as described in the docs.
  3. Configured an HPA for the server replicas.

Unfortunately, we still need about 20s to list applications and update a project role, e.g.:

image
image

After we switched to HA mode, reconcile performance looked even worse.
image

I'm not sure I'm understanding this right; any response would be appreciated 🙏.

Proposal

The application controller should export workqueue depth metrics.

Ailsa-Wu added the enhancement label Nov 11, 2021
@alexmt
Collaborator

alexmt commented Nov 11, 2021

Hello @Ailsa-Wu ,

The controller workqueue depth metric is already available. You can use the following query to get it: sum(workqueue_depth{name=~"app_.*"}).
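
For a quick check without a Prometheus server, the same aggregation can be approximated directly on the text returned by a /metrics endpoint. The following is a minimal Python sketch, not part of Argo CD: the function name `sum_workqueue_depth` and the sample queue names are made up for illustration, and the parser only handles simple `name{labels} value` lines.

```python
import re

# Matches simple exposition lines such as: workqueue_depth{name="app_x"} 3
# Label values containing escaped quotes are not handled by this sketch.
SAMPLE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_workqueue_depth(metrics_text, name_pattern=r"app_.*"):
    """Approximate sum(workqueue_depth{name=~"app_.*"}) over scraped text.

    Returns None when no matching sample exists, which mirrors an empty
    query result rather than a depth of zero.
    """
    total = None
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("metric") != "workqueue_depth":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # PromQL regex matchers are fully anchored, hence fullmatch.
        if re.fullmatch(name_pattern, labels.get("name", "")):
            total = (total or 0.0) + float(m.group("value"))
    return total


# Hypothetical sample; the queue names and values are illustrative only.
SAMPLE = """\
# TYPE workqueue_depth gauge
workqueue_depth{name="app_one"} 12
workqueue_depth{name="app_two"} 3
workqueue_depth{name="other_queue"} 7
"""

if __name__ == "__main__":
    print(sum_workqueue_depth(SAMPLE))
```

A `None` result means the metric is not exported at all, which is a different situation from an exported queue whose depth is currently zero.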

However, the slow API responses are not related to the controller. You may be affected by issue #4296. That issue already has a fix, which will be available in v2.2-rc1, to be published later today. I would appreciate it if you could give it a try and let us know whether performance improves in your environment.

jessesuen added the works-for-me label Nov 11, 2021
@Ailsa-Wu
Author

Ailsa-Wu commented Nov 15, 2021

Hi @alexmt, thanks for your response. I tried to get the metric from the endpoint http://localhost:8082/metrics, but I could not find workqueue_depth there. Is there an issue? 🧐

kubectl port-forward svc/argocd-application-controller-metrics -n argocd 8082:8082

image
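
The same check can be scripted instead of reading the output by eye. This is a rough Python sketch that assumes the port-forward above is running; the helper names (`metric_names`, `fetch_metrics`, `has_workqueue_depth`) are hypothetical and not part of any Argo CD tooling.

```python
import re
import urllib.request

METRIC_NAME_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)')


def metric_names(metrics_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments do not count as samples
        m = METRIC_NAME_RE.match(line)
        if m:
            names.add(m.group(1))
    return names


def fetch_metrics(url="http://localhost:8082/metrics", timeout=5):
    """Download the exposition text; requires the port-forward to be active."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def has_workqueue_depth(metrics_text):
    """True if at least one workqueue_depth sample is present."""
    return "workqueue_depth" in metric_names(metrics_text)
```

With the port-forward in place, `has_workqueue_depth(fetch_metrics())` returns `False` when the controller does not export the metric.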

Our ArgoCD version is v2.0.5, and we still need about 20s to access /api/v1/applications, even when using the local admin user.

@sidewinder12s

@jessesuen @alexmt I also don't think I see workqueue_depth on version 2.1.5 anymore.

@yeya24
Contributor

yeya24 commented Jan 29, 2022

@alexmt @jessesuen I don't think this works either. The workqueue metrics are only registered with the registry inside the component-base library, and they are not registered with the registry that serves the /metrics endpoint. We need to register the workqueue metrics with the controller's own registry.
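
The mismatch can be pictured with a small, dependency-free Python model. This is only an illustration of the idea, not Argo CD or component-base code: the `Registry` class, its methods, and both registry variables are hypothetical names, and the values are arbitrary.

```python
class Registry:
    """Toy stand-in for a metrics registry: maps metric names to values."""

    def __init__(self):
        self._metrics = {}

    def register(self, name, value=0):
        self._metrics[name] = value

    def render(self):
        """Render registered metrics one per line, loosely like /metrics output."""
        return "\n".join(
            f"{name} {value}" for name, value in sorted(self._metrics.items())
        )


# Hypothetical: the registry a library registers its metrics into.
library_registry = Registry()
# Hypothetical: the registry that the controller's /metrics handler serves.
controller_registry = Registry()

library_registry.register("workqueue_depth", 12)
controller_registry.register("controller_metric", 1)

# Only controller_registry is rendered, so workqueue_depth is missing here.
before = controller_registry.render()

# Sketch of the fix idea: register the metric with the served registry too.
controller_registry.register("workqueue_depth", 12)
after = controller_registry.render()
```

In this model `before` contains only `controller_metric`, while `after` also lists `workqueue_depth`, which is the behaviour the proposed registration change aims for.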

@yeya24
Contributor

yeya24 commented Jan 30, 2022

I just created #8318 to fix this.

@taliastocks
Contributor

This appears to be an issue again as of (at least) the 2.9 release.

@jujubetsz

Can confirm that after the upgrade to v2.9 the workqueue_depth is no longer available.

@joshuabezaleel

+1. workqueue_depth also disappeared on our side.
Screen Shot 2023-12-25 at 00 19 54
