Sharding algorithm constantly evaluated, wasting CPU and creating too many logs #14337

Closed
3 tasks done
agaudreault opened this issue Jul 4, 2023 · 6 comments · Fixed by #15237
Assignees: agaudreault
Labels: bug (Something isn't working)

Comments

@agaudreault
Member

agaudreault commented Jul 4, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

On version 2.8.0, Argo CD starts logging a lot. The principal source of logs is the new sharding algorithm, which is evaluated on every application refresh. The sharding results could easily be cached and re-evaluated only when a cluster is added, removed, or updated.

To Reproduce

Deploy 2.8.0-rc1

Caused by #13018

Expected behavior

  • Info logs about which shard will process which cluster should only be emitted:
    • when the controller is started;
    • when a cluster is added, removed, or updated.
  • There should not be constant debug logs.
  • The shard should not be re-evaluated on every application reconcile. The value can likely be cached until a cluster is added, removed, or updated (see the sketch after this list).
  • Logs should contain the cluster server name/URL, not the internal Argo CD ID value.
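For illustration, here is a minimal sketch of the caching idea, with hypothetical names (this is not the actual Argo CD code): the distribution function, and any logging around it, only runs the first time a cluster is seen, and the cache is invalidated from the cluster add/update/delete handlers.

```go
package sharding

import "sync"

// DistributionFunction maps a cluster ID to a shard number.
type DistributionFunction func(clusterID string) int

// shardCache memoizes the shard assignment per cluster ID.
type shardCache struct {
	mu     sync.Mutex
	shards map[string]int // cluster ID -> assigned shard
	fn     DistributionFunction
}

func newShardCache(fn DistributionFunction) *shardCache {
	return &shardCache{shards: map[string]int{}, fn: fn}
}

// ShardFor returns the cached shard for a cluster, computing it (and logging,
// if desired) only the first time the cluster is seen.
func (c *shardCache) ShardFor(clusterID string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if shard, ok := c.shards[clusterID]; ok {
		return shard
	}
	shard := c.fn(clusterID)
	c.shards[clusterID] = shard
	return shard
}

// Invalidate drops all cached assignments; the cluster add/update/delete
// handlers would call this so shards are recomputed on the next lookup.
func (c *shardCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.shards = map[string]int{}
}
```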

Version

v2.8.0-rc1

Logs

{"level":"debug","msg":"Calculating cluster shard for cluster id: ","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: ","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 14a0dfbf-8fa6-4bf4-ae35-b1b7e2ebe948","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=14a0dfbf-8fa6-4bf4-ae35-b1b7e2ebe948 will be processed by shard 0","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 6d41f1db-fb4f-4883-b28f-4074159d1c6a","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=6d41f1db-fb4f-4883-b28f-4074159d1c6a will be processed by shard 1","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: ","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 14a0dfbf-8fa6-4bf4-ae35-b1b7e2ebe948","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=14a0dfbf-8fa6-4bf4-ae35-b1b7e2ebe948 will be processed by shard 0","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 57d330be-ae91-4a23-9303-ab8dbcc306da","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=57d330be-ae91-4a23-9303-ab8dbcc306da will be processed by shard 1","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 57d330be-ae91-4a23-9303-ab8dbcc306da","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=57d330be-ae91-4a23-9303-ab8dbcc306da will be processed by shard 1","time":"2023-07-04T20:44:17Z"}
{"level":"debug","msg":"Calculating cluster shard for cluster id: 57d330be-ae91-4a23-9303-ab8dbcc306da","time":"2023-07-04T20:44:17Z"}
{"level":"info","msg":"Cluster with id=57d330be-ae91-4a23-9303-ab8dbcc306da will be processed by shard 1","time":"2023-07-04T20:44:17Z"}
@agaudreault agaudreault added the bug Something isn't working label Jul 4, 2023
@crenshaw-dev
Member

I'd happily do a quick review of a fix for this, if you have time to write one. :-)

@agaudreault
Member Author

I think the PR above can be cherry-picked into 2.8 to fix the logging issue.

I'll try to code something to cache the sharding results in another PR.

@akram let me know if you are developing something around sharding that would heavily conflict with a cached implementation of the cluster shards.

@Enclavet
Contributor

Enclavet commented Aug 2, 2023

FYI: the difference in CPU utilization between the round-robin and legacy sharding algorithms is significant. This is a test with Argo CD managing 99 clusters and 5000 applications; the first part is round-robin and the second part is after switching to legacy. Using 2.8.0-rc5.

[image: CPU utilization graph, round-robin followed by legacy sharding]
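As a rough, hypothetical sketch of why the two strategies can differ in cost (not the actual Argo CD implementation): the legacy approach only hashes the cluster ID, while a round-robin approach needs the full, ordered cluster list, which is expensive if it is recomputed on every application refresh instead of being cached.

```go
package sharding

import (
	"hash/fnv"
	"sort"
)

// legacyShard hashes the cluster ID and takes the result modulo the number
// of controller replicas; it needs no knowledge of the other clusters.
func legacyShard(clusterID string, replicas int) int {
	if replicas <= 0 {
		return 0
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(clusterID))
	return int(h.Sum32() % uint32(replicas))
}

// roundRobinShard assigns a shard from the cluster's position in the sorted
// list of all cluster IDs, so every call has to copy and sort the whole list
// unless the result is cached.
func roundRobinShard(clusterID string, allClusterIDs []string, replicas int) int {
	if replicas <= 0 {
		return 0
	}
	ids := append([]string(nil), allClusterIDs...)
	sort.Strings(ids)
	for i, id := range ids {
		if id == clusterID {
			return i % replicas
		}
	}
	return -1 // cluster not found
}
```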

@akram
Contributor

akram commented Aug 3, 2023

Hi @agaudreault-jive, thanks for sharing these findings. As I was working on something else, this only popped up on my radar today.
I will have a look at your PR and test it as well.
Regarding possible impacts of cached sharding results, I know that a colleague is working on a different implementation, and I will check that as well.

@agaudreault
Member Author

@akram Awesome! If you have questions, ping me on the CNCF Slack and I'll answer a bit faster; my handle is @agaudreault!

I hope my draft isn't too far-off!

@agaudreault agaudreault self-assigned this Aug 11, 2023
@agaudreault
Member Author

I started working on my draft PR. I am currently testing with multiple clusters that all have the same server URL and I am hitting issue #15027. I will check if it is possible to change my implementation to use a name/server key for the cache.
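A hypothetical sketch of what keying the cache by both cluster name and server URL could look like, so two cluster secrets pointing at the same server do not collide (illustrative names only, not the draft PR's code):

```go
package sharding

import "fmt"

// clusterKey identifies a cluster by both its name and its server URL, so
// clusters that share a server URL still get distinct cache entries.
type clusterKey struct {
	Name   string
	Server string
}

func example() {
	shards := map[clusterKey]int{}
	shards[clusterKey{Name: "cluster-a", Server: "https://kubernetes.default.svc"}] = 0
	shards[clusterKey{Name: "cluster-b", Server: "https://kubernetes.default.svc"}] = 1
	fmt.Println(shards) // two distinct entries despite the identical server URL
}
```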

@agaudreault agaudreault moved this from Backlog to Completed in Argo CD Roadmap Jul 4, 2024