
fix: Fixes sharding placement algorithm and allows development of alternative algorithms #13018

Merged: 6 commits merged into argoproj:master on Jun 5, 2023

Conversation

@akram (Contributor) commented on Mar 27, 2023

  • Defines a new API based on a distributionFunction that allows developers to introduce other distribution and placement algorithms
  • Defines a new algorithm named hash that uses the db.clusterList index as the sharding placement parameter (a simplified sketch of this index-based approach follows this list)
  • Maintains the previous algorithm as legacy and keeps it as the default; it only works when cluster UIDs are integers encoded as strings, but in practice UIDs are UUIDs, which keeps the default algorithm from distributing clusters properly
  • Fixes the argocd admin shard status to use the distributionFunction, so it produces the same results as the actual placement
  • Fixes a CodeQL issue related to an insecure string-to-int conversion
  • Adds log messages
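For orientation, here is a minimal sketch of the index-modulo-replicas idea behind the new hash algorithm. The type and function names and the string-based signature are simplified for illustration and do not mirror the merged code, which works with *v1alpha1.Cluster and the DistributionFunction API added in this PR.

package sharding

// DistributionFunction is a simplified stand-in for the API introduced in
// this PR: it maps a cluster (here identified by ID) to a shard number,
// returning -1 when the cluster cannot be placed.
type DistributionFunction func(clusterID string) int

// byClusterListIndex sketches the "hash" algorithm: each cluster is assigned
// the shard equal to its index in the cluster list modulo the number of
// application controller replicas.
func byClusterListIndex(clusterIDs []string, replicas int) DistributionFunction {
	indexByID := make(map[string]int, len(clusterIDs))
	for i, id := range clusterIDs {
		indexByID[id] = i
	}
	return func(clusterID string) int {
		if replicas == 0 {
			return -1
		}
		idx, ok := indexByID[clusterID]
		if !ok {
			return -1
		}
		return idx % replicas
	}
}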

Fixes #9633 #11537 #13136 #13226

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

Please see Contribution FAQs if you have questions about your pull-request.

@akram force-pushed the fix-round-robin-cluster-sharding branch from e04a107 to 1da356c on March 27, 2023 07:46
@akram changed the title from "Use db.clusterList index as the sharding placement parameter" to "fix: Use db.clusterList index as the sharding placement parameter" on Mar 27, 2023
@akram force-pushed the fix-round-robin-cluster-sharding branch 2 times, most recently from dda4713 to 8e67a27 on March 27, 2023 08:52
codecov bot commented on Mar 27, 2023

Codecov Report

Patch coverage: 83.80% and project coverage change: +0.09% 🎉

Comparison is base (8c0456b) 49.48% compared to head (095cb58) 49.57%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13018      +/-   ##
==========================================
+ Coverage   49.48%   49.57%   +0.09%     
==========================================
  Files         256      256              
  Lines       43844    43920      +76     
==========================================
+ Hits        21695    21773      +78     
+ Misses      19988    19985       -3     
- Partials     2161     2162       +1     
Impacted Files                          Coverage            Δ
cmd/argocd/commands/admin/cluster.go     0.00%  <0.00%>    (ø)
controller/appcontroller.go             53.35%  <0.00%>    (-0.04%) ⬇️
util/env/env.go                         96.07%  <50.00%>   (-3.93%) ⬇️
controller/sharding/sharding.go         93.06%  <96.55%>   (+44.58%) ⬆️


@akram force-pushed the fix-round-robin-cluster-sharding branch 4 times, most recently from 17b3986 to d4f8ef3 on March 29, 2023 10:00
@akram marked this pull request as draft on April 3, 2023 12:41
@akram force-pushed the fix-round-robin-cluster-sharding branch from d4f8ef3 to 17b3986 on April 3, 2023 16:10
@akram mentioned this pull request on Apr 3, 2023
@akram force-pushed the fix-round-robin-cluster-sharding branch 2 times, most recently from e7b258b to 24ef5ea on April 3, 2023 18:34
@akram changed the title from "fix: Use db.clusterList index as the sharding placement parameter" to "fix: Sort cluster uids and use them as a sharding placement parameter" on Apr 3, 2023
@akram marked this pull request as ready for review on April 4, 2023 08:26
@akram marked this pull request as draft on April 4, 2023 08:28
@akram force-pushed the fix-round-robin-cluster-sharding branch 7 times, most recently from eb005bf to a2afda7 on April 4, 2023 08:58
@akram force-pushed the fix-round-robin-cluster-sharding branch 4 times, most recently from f079c26 to 39429e6 on April 4, 2023 12:37
@akram force-pushed the fix-round-robin-cluster-sharding branch 3 times, most recently from 7457a40 to d07922a on June 1, 2023 14:35
@akram (Contributor, Author) commented on Jun 3, 2023

@leoluz gentle ping to re-enable auto-merge for this one. I amended the last commit after updating the manifests to align them with the env variable ARGOCD_CONTROLLER_SHARDING_ALGORITHM.

@akram force-pushed the fix-round-robin-cluster-sharding branch 2 times, most recently from 77916ea to be58bca on June 4, 2023 12:42
akram and others added 6 commits on June 5, 2023:

- fix: Extraction of DistributionFunction to allow passing different type of functions to filter clusters by shard
  - Adding unit tests for sharding
  - Refresh clusters list on DistributionFunction call
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
  Signed-off-by: ishitasequeira <ishiseq29@gmail.com>
- fix: Incorrect conversion of an integer with architecture-dependent bit size from [strconv.Atoi](1) to a lower bit size type uint32 without an upper bound check
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- Added config to switch to round-robin sharding
  Signed-off-by: Raghavi Shirur <rshirur@redhat.com>
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- Documenting sharding more, adding shuffling tests (skipped), re-enable sharding algo env var
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- Allow configuration through argocd-cmd-params-cm configMap and key: controller.sharding.algorithm
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- De-duplicate code, remove reflection for default case, shorten distribution methods name, ran codegen on manifests
  Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
@akram force-pushed the fix-round-robin-cluster-sharding branch from be58bca to 095cb58 on June 5, 2023 04:31
@leoluz merged commit ee983fe into argoproj:master on Jun 5, 2023
@bygui86 commented on Jun 5, 2023

@leoluz amazing that this feature got merged! In which version is it going to be released, and when?
Thanks to all, guys!!

@leoluz (Collaborator) commented on Jun 5, 2023

@bygui86 In the Projects section of this PR (right panel) you can see the roadmap it is associated with and the release date planned for it. In case you can't see it, it is Argo CD 2.8 (~Aug 7th, 2023).

@bygui86 commented on Jun 5, 2023

@leoluz great, thanks, now I see it!

@bofeng96 commented

Appreciated!

@bofeng96 commented

I'm not sure if other users have this requirement. I believe it's better to let one application controller shard handle in-cluster only, while balancing the other clusters across the remaining application controller shards.

@bygui86 commented on Jun 27, 2023

@bofeng96 may I ask why you have such a requirement?
What if you have hundreds or even thousands of Argo CD Apps for in-cluster? That is where you really need sharding...

@bofeng96 commented

> @bofeng96 may I ask why you have such a requirement? What if you have hundreds or even thousands of Argo CD Apps for in-cluster? That is where you really need sharding...

We do have thousands of parent Apps on in-cluster, and that's the reason we want one shard to deal only with those apps. It's too heavy if shard 0 also has to take care of other clusters, as the workload on in-cluster is much heavier. The current homogeneous distribution algorithm only assigns the same number of clusters to each shard.
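For illustration only, here is a minimal sketch of the kind of dedicated in-cluster placement described in this thread, written in the spirit of the DistributionFunction API added by this PR. The function name and string-based signature are hypothetical; this is not part of the merged code.

package sharding

// inClusterServer is the API server address Argo CD uses for the local cluster.
const inClusterServer = "https://kubernetes.default.svc"

// dedicatedInClusterDistribution pins the in-cluster cluster to shard 0 and
// spreads every other cluster over shards 1..replicas-1 by its index in the
// cluster list (hypothetical sketch).
func dedicatedInClusterDistribution(clusterServers []string, replicas int) func(server string) int {
	indexByServer := make(map[string]int, len(clusterServers))
	i := 0
	for _, s := range clusterServers {
		if s == inClusterServer {
			continue // in-cluster is handled separately below
		}
		indexByServer[s] = i
		i++
	}
	return func(server string) int {
		if replicas <= 1 || server == inClusterServer {
			return 0
		}
		idx, ok := indexByServer[server]
		if !ok {
			return -1 // unknown cluster, cannot be placed
		}
		// All remaining clusters share shards 1..replicas-1.
		return 1 + idx%(replicas-1)
	}
}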

@bygui86 commented on Jun 28, 2023

@bofeng96 you can still use the not-so-well-documented manual assignment...
As described here, you can set the "shard" key in the cluster's Kubernetes Secret. The only requirement is that if you configure N shards, then you must have exactly N application controller replicas.

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster
type: Opaque
data:
  config: base64config
  name: base64name
  server: base64server
  shard: base64shard

clusterShard = sharding.GetShardByID(cluster.ID, replicas)
distributionFunction := sharding.GetDistributionFunction(kubeClient, settingsMgr)
distributionFunction(&cluster)
cluster.Shard = pointer.Int64Ptr(int64(clusterShard))
A reviewer (Contributor) commented:

cluster.Shard should not be updated. That field is meant for users to explicitly set a static shard value so that the application controller instance with that shard value picks it up.

ctrl.projByNameCache.Delete(key)
return true
})
if ctrl != nil {
A reviewer (Contributor) commented:

Is this nil check really needed?


return func(c *v1alpha1.Cluster) int {
if c != nil && replicas != 0 {
clusterIndex, ok := clusterIndexdByClusterId[c.ID]
A reviewer (Contributor) commented:

clusterIndexdByClusterId is initialized only during the GetShardByIndexModuloReplicasCountDistributionFunction creation. Any clusters added after that initialization would not be available in the map. This map should be dynamically created within the filter function.
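One way this suggestion could look, reusing the names visible in this diff (createClusterIndexByClusterIdMap, db.ArgoDB, v1alpha1.Cluster) and assuming a replicas value is passed in for illustration; a sketch, not the merged implementation:

func GetShardByIndexModuloReplicasCountDistributionFunction(db db.ArgoDB, replicas int) func(c *v1alpha1.Cluster) int {
	return func(c *v1alpha1.Cluster) int {
		if c == nil || replicas == 0 {
			return -1
		}
		// Rebuild the index map on every call so that clusters added after
		// this function was created are still taken into account.
		clusterIndexdByClusterId := createClusterIndexByClusterIdMap(db)
		clusterIndex, ok := clusterIndexdByClusterId[c.ID]
		if !ok {
			return -1
		}
		return clusterIndex % replicas
	}
}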

case filterFunctionName == "hash":
distributionFunction = GetShardByIndexModuloReplicasCountDistributionFunction(db)
case filterFunctionName == "legacy":
default:
A reviewer (Contributor) commented:

The default behaviour should be legacy, so can't we just have two cases, case filterFunctionName == "hash" and default? Also, it would be better if this logic could be moved to cmd/argocd-application-controller/controllers/argocd_application_control.go, where it would be possible to switch the algorithm using command-line flags.
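A sketch of the two-case switch suggested here; legacyDistributionFunction is a placeholder name for the existing legacy constructor, not an actual identifier from this PR:

switch filterFunctionName {
case "hash":
	distributionFunction = GetShardByIndexModuloReplicasCountDistributionFunction(db)
default:
	// "legacy" and any unrecognised value fall back to the legacy algorithm.
	distributionFunction = legacyDistributionFunction(db)
}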

return clusters
}

func createClusterIndexByClusterIdMap(db db.ArgoDB) map[string]int {
A reviewer (Contributor) commented:

Can we add some comments for the newly added functions?
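As an example of the kind of comment being asked for, a hypothetical doc comment for the helper shown above (the body is stubbed out here):

// createClusterIndexByClusterIdMap builds a map from cluster ID to that
// cluster's index in the cluster list, which the "hash" distribution
// function uses to compute clusterIndex % replicas.
func createClusterIndexByClusterIdMap(db db.ArgoDB) map[string]int {
	// Implementation elided; see the PR diff.
	return map[string]int{}
}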

if replicas > 0 {
clusterShard = sharding.GetShardByID(cluster.ID, replicas)
distributionFunction := sharding.GetDistributionFunction(argoDB, common.DefaultShardingAlgorithm)
distributionFunction(&cluster)
A reviewer (Contributor) commented:

Should this line be clusterShard = distributionFunction(&cluster)?
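A sketch of the change the reviewer is asking about, keeping the names from the diff above (illustrative only, not necessarily the merged fix):

if replicas > 0 {
	distributionFunction := sharding.GetDistributionFunction(argoDB, common.DefaultShardingAlgorithm)
	// Capture the shard computed by the distribution function instead of
	// discarding the return value.
	clusterShard = distributionFunction(&cluster)
}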

@bofeng96 commented

> @bofeng96 you can still use the not-so-well-documented manual assignment... As described here, you can set the "shard" key in the cluster's Kubernetes Secret. The only requirement is that if you configure N shards, then you must have exactly N application controller replicas.

Haha, that's the workaround I've been using. The sweet problem is that I have to edit new Secrets monthly because the number of clusters we manage keeps increasing.

yyzxw pushed a commit to yyzxw/argo-cd that referenced this pull request on Aug 9, 2023:
fix: Fixes sharding placement algorithm and allows development of alternative algorithms (argoproj#13018)

* fix: Extraction of DistributionFunction to allow passing different type of functions to filter clusters by shard
- Adding unit tests for sharding
- Refresh clusters list on DistributionFunction call

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
Signed-off-by: ishitasequeira <ishiseq29@gmail.com>

* fix: Incorrect conversion of an integer with architecture-dependent bit size from [strconv.Atoi](1) to a lower bit size type uint32 without an upper bound check.

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

* Added config to switch to round-robin sharding

Signed-off-by: Raghavi Shirur <rshirur@redhat.com>
Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

* Documenting sharding more, adding shuffling tests (skipped), re-enable sharding algo env var

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

* Allow configuration through argocd-cmd-params-cm configMap and key: controller.sharding.algorithm

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

* De-duplicate code, remove reflection for default case, shorten distribution methods name, ran codegen on manifests
Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
Signed-off-by: ishitasequeira <ishiseq29@gmail.com>
Signed-off-by: Raghavi Shirur <rshirur@redhat.com>
Co-authored-by: Raghavi Shirur <raghaviss11@gmail.com>
tesla59 pushed a commit to tesla59/argo-cd that referenced this pull request on Dec 16, 2023, with the same commit message as above:
fix: Fixes sharding placement algorithm and allows development of alternative algorithms (argoproj#13018)

Labels: enhancement (New feature or request)
Projects: Status: Completed
Development: Successfully merging this pull request may close these issues:
- Argocd-application-controller not sharding per cluster
9 participants