docs: proposal/dynamic rebalancing of clusters across shards #13221
Conversation
Codecov Report: Patch coverage has no change and project coverage change: -0.01%.
Additional details and impacted files:
@@ Coverage Diff @@
## master #13221 +/- ##
==========================================
- Coverage 49.57% 49.56% -0.01%
==========================================
Files 256 256
Lines 43920 43920
==========================================
- Hits 21773 21770 -3
- Misses 19985 19987 +2
- Partials 2162 2163 +1
☔ View full report in Codecov by Sentry.
This looks good! One question: I had a proposal in #12123 to be able to scale multiple independent deployments of the application controller, so that, say, different sets of AppProjects could be handled by separate controller deployments. Does this proposal assume that there's only one application controller deployment?
@blakepettersson Your idea seems to be more around having multiple deployments of application controllers and separating those out based on AppProjects. This proposal assumes there is only one deployment of the application controller. The goal of this proposal is to be able to dynamically redistribute clusters across shards upon addition/removal of shards. I think this proposal should be able to handle all shard groups.
This proposal specifically excludes the problem most often faced by Argo users, which is that it's impossible to horizontally scale to deal with many applications in a single cluster. Would this proposal at least consider controller "load" when rebalancing clusters in the case of multiple clusters? As I read it now, it seems to me that "load" would still be defined as a constant 1 per cluster, so we would still end up with some controllers doing nothing while others are overloaded. Lastly, is this meant as an intermediate improvement so the other asks are easier to design and implement, or is this the only part of the problem Argo is currently willing to solve?
Yes indeed, I was using that as a specific example of how it could be useful to support the use case of having multiple independent deployments of the application controller.
What I meant by shard group (or rather assumed) is that a group of shards is handled by a single deployment of the application controller.
@rouke-broersma Yes, the current proposal is not focusing on load; it is trying to balance clusters across shards for dynamic scaling. We are not including load on the shards as of now.
Yes, this proposal is an intermediate improvement to enable moving towards including load for scaling.
Yes, that is correct. We assume there to be a single deployment of the application controller.
Each shard replica knows about the total number of available shards by evaluating the environment variable ARGOCD_CONTROLLER_REPLICAS, which needs to be kept up-to-date with the actual number of available replicas (shards). If the number of replicas does not equal the number set in ARGOCD_CONTROLLER_REPLICAS, sharding will not work as intended, leading to both unused and overused replicas. As this environment variable is set on the StatefulSet and propagated to the pods, all the pods in the StatefulSet need to be restarted in order to pick up the new number of total shards.
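For reference, a minimal sketch of the kind of deterministic assignment this describes (function names are illustrative, not the actual Argo CD code): the replica count comes from the environment variable, and each cluster is hashed onto a shard so every replica computes the same mapping independently.

```go
package sharding

import (
	"hash/fnv"
	"os"
	"strconv"
)

// replicaCount reads ARGOCD_CONTROLLER_REPLICAS. If this value drifts from
// the real number of pods, some shards are never processed while others
// receive no clusters at all.
func replicaCount() int {
	n, err := strconv.Atoi(os.Getenv("ARGOCD_CONTROLLER_REPLICAS"))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// shardForCluster maps a cluster to a shard index deterministically, so every
// replica computes the same assignment without coordinating.
func shardForCluster(clusterID string, replicas int) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(clusterID))
	return int(h.Sum32() % uint32(replicas))
}
```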
The current sharding mechanism relies on predictable pod names for the application controller to determine which shard a given replica should impersonate, e.g. the first replica of the StatefulSet (argocd-application-controller-0) will be the first shard, the second replica (argocd-application-controller-1) will be the second and so forth.
Correct. This is the current approach, used to guarantee that certain clusters will always be associated with certain replicas by applying deterministic logic during cluster assignment.
How will this work?
* The application controller will query Redis for the status of all the application controllers and the last updated heartbeat timestamp.
* It will check if any application controller is flagged as Unhealthy or has not updated its status in Redis during the heartbeat process for a certain period of time.
* If the status for an application controller was already flagged as Unhealthy, it will pick one of the clusters from its list and assign the cluster to itself.
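For illustration, the steps above could look roughly like the following sketch; `ShardStatus` and its fields are hypothetical placeholders for whatever Redis (or, per the later discussion, a ConfigMap) ends up storing.

```go
package sharding

import "time"

// ShardStatus is a hypothetical per-controller record held by whichever
// backend stores the heartbeat data (Redis in this version of the proposal).
type ShardStatus struct {
	Controller string
	Healthy    bool
	Heartbeat  time.Time
	Clusters   []string
}

// takeOverOneCluster walks the steps above: scan every controller's status,
// flag stale heartbeats as Unhealthy, and claim one cluster from a controller
// that is already flagged. It returns the claimed cluster name, if any.
func takeOverOneCluster(self string, statuses []ShardStatus, staleAfter time.Duration) string {
	for i := range statuses {
		s := &statuses[i]
		if s.Controller == self {
			continue
		}
		switch {
		case !s.Healthy && len(s.Clusters) > 0:
			// Already flagged Unhealthy: move one of its clusters to ourselves.
			claimed := s.Clusters[0]
			s.Clusters = s.Clusters[1:]
			return claimed
		case time.Since(s.Heartbeat) > staleAfter:
			// Heartbeat is stale: flag it now; its clusters are claimed on a later pass.
			s.Healthy = false
		}
	}
	return ""
}
```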
I am afraid that this approach will lead to race condition issues. Maybe this should be done by all application controllers, but with just one of them dealing with analyzing heartbeats and redistributing clusters.
Wouldn't you then have the issue that if that controller goes down you no longer have a mechanism to perform the healthcheck, so the health check controller never gets marked as unhealthy?
I agree with @rouke-broersma here. Relying on a single controller has a bunch of issues as well, for example if the controller performing the redistribution goes down or starts to distribute in a faulty manner.
Here I agree with @leoluz about race conditions. Controller health might be flapping, and it is hard to predict what happens sooner: the controller instance recovers or the shards are redistributed among healthy controllers.
I'm proposing to just not detect unhealthy controllers and let Kubernetes/operators deal with it:
- healthy controllers just "claim" shards and update the ConfigMap
- each controller should have a readiness probe that returns healthy only if the controller successfully claimed a shard and updated its heartbeat within the allowed timeframe.
So if one controller instance is unhealthy, one shard won't be processed, but the operator will know about it because the readiness probe will be failing.
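A minimal sketch of that readiness idea, assuming a hypothetical `lastClaim` timestamp that the claim loop updates after every successful ConfigMap write:

```go
package sharding

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// lastClaim holds the UnixNano timestamp of the last successful shard claim /
// heartbeat update; the claim loop would call lastClaim.Store(time.Now().UnixNano()).
var lastClaim atomic.Int64

// allowedStaleness is an assumed window, e.g. a few missed heartbeats.
const allowedStaleness = 90 * time.Second

// readyz fails once the controller has not claimed its shard and refreshed
// its heartbeat within the allowed timeframe, so Kubernetes (not the other
// controllers) surfaces the problem.
func readyz(w http.ResponseWriter, _ *http.Request) {
	last := lastClaim.Load()
	if last == 0 || time.Since(time.Unix(0, last)) > allowedStaleness {
		http.Error(w, fmt.Sprintf("shard not claimed since %s", time.Unix(0, last)), http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```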
Thanks @alexmt, the suggestion of not detecting unhealthy shards does make sense. In this case, we would need to re-introduce the liveness probe or update the readiness probe with a condition that checks whether the controller was able to claim a shard.
* If the status for an application controller was already flagged as Unhealthy, it will pick one of the clusters from its list and assign the cluster to itself.
* If the status is not flagged and an application controller has not updated its last active timestamp for a long time, then we mark the application controller as Unhealthy.
This will continue to happen until the list of clusters is empty. As soon as the list is empty, the entry for the shard/application-controller is removed from Redis.
What if in the middle of a cluster redistribution the failed application controller heals and starts updating the heartbeat again?
In that scenario, the redistribution will kick in again, as the number of healthy controllers will have increased.
### Cons
* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.
I just noticed that you considered race conditions as well. :)
I would be opposed to using distributed locks as it increases complexity quite a bit and we already have stability issues with our Redis HA setup.
There are many mixed feelings about distributed locks being an anti-pattern (and I agree that they are).
Example:
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.
* In scenarios where Redis becomes unavailable, the heartbeat mechanism will pause until Redis comes back online. This will also pause the dynamic redistribution of clusters, which will be triggered again once Redis is back online.
Maybe we could rely less on Redis if we had leader election in place. In that case we could instead maintain an in-memory cache of the sharding state. This would be synced to Redis mainly so it can be picked up by the new leader in case the current leader crashes.
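For reference, client-go already ships a Lease-based leader election helper, so a sketch of "only the leader rebalances" could look like the following (lock name, namespace, and timings are assumptions):

```go
package sharding

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runRebalancerWithLeaderElection makes sure only one controller replica runs
// the rebalance loop; the others stand by and take over if the leader dies.
func runRebalancerWithLeaderElection(ctx context.Context, client kubernetes.Interface, rebalance func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "argocd-application-controller-sharding", Namespace: "argocd"}, // placeholders
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("HOSTNAME")},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: rebalance, // only the leader analyzes heartbeats and redistributes clusters
			OnStoppedLeading: func() {}, // stop rebalancing; another replica takes over
		},
	})
}
```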
How about we use ConfigMaps for this instead of Redis? Kubernetes would prevent concurrent updates to the ConfigMap, so we can rely on that for handling race conditions while also removing the issue of relying too much on Redis.
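A sketch of how that could work, relying on the ConfigMap's resourceVersion for optimistic locking and client-go's conflict retry helper (ConfigMap name and data layout are hypothetical):

```go
package sharding

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// claimCluster records "this cluster is handled by this controller" in a
// shared ConfigMap. A concurrent writer gets a 409 Conflict (optimistic
// locking via resourceVersion), and RetryOnConflict re-reads and retries.
func claimCluster(ctx context.Context, kc kubernetes.Interface, controller, cluster string) error {
	cms := kc.CoreV1().ConfigMaps("argocd") // namespace assumed
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := cms.Get(ctx, "argocd-app-controller-shard-cm", metav1.GetOptions{}) // name hypothetical
		if err != nil {
			return err
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		if owner, taken := cm.Data[cluster]; taken && owner != controller {
			return nil // another healthy controller already claimed it
		}
		cm.Data[cluster] = controller
		_, err = cms.Update(ctx, cm, metav1.UpdateOptions{})
		return err
	})
}
```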
### Pros of Leader Election
* We can refrain from performing multiple calls to Redis for the load and status of the shards, and instead store them in a local cache within the leader while updating the data in Redis periodically (e.g. every 10 minutes).
Duplicated sentence with line 134
In order to accurately know which shards are being managed by which application controller, especially in scenarios of load redistribution, addition/removal of `application controller` replicas, etc., we need a mechanism to assign clusters to the shards.
The service account used by the application controller has read access to all the resources in the cluster. Thus, instead of setting the environment variable ARGOCD_CONTROLLER_REPLICAS representing the number of replicas, the number of replicas can be read directly from the number of healthy replicas of the application controller deployment.
The application controller service account does not necessarily have access to all resources in the cluster.
Some users install the controller with only the argocd-application-controller-role and use it to manage remote clusters only.
So I suggest changing argocd-application-controller-role to allow the controller to inspect its own deployment and find out the number of replicas.
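A sketch of that approach, assuming the Role is extended to allow `get` on the controller's own Deployment (names below are assumptions):

```go
package sharding

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// controllerReplicas reads the replica count from the controller's own
// Deployment, so no pod restart is needed when it is scaled. This requires
// argocd-application-controller-role to allow get on deployments (apps),
// ideally restricted to this one resourceName.
func controllerReplicas(ctx context.Context, kc kubernetes.Interface) (int32, error) {
	dep, err := kc.AppsV1().Deployments("argocd").
		Get(ctx, "argocd-application-controller", metav1.GetOptions{}) // names assumed
	if err != nil {
		return 0, err
	}
	if dep.Spec.Replicas == nil {
		return 1, nil // replicas defaults to 1 when unset
	}
	return *dep.Spec.Replicas, nil
}
```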
That's a good point. I will add this point to the document.
### Cons
* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.
I'm also concerned about using Redis. Some users prefer cloud-hosted Redis (e.g. Amazon ElastiCache), so in that case the controller would need to rely on an external resource to work. HA Redis also has some known issues: under heavy load, the master might be unable to perform replication quickly enough, and the only known solution is to kill all Redis pods, so we would lose the information about shards.
So instead of Redis I would suggest just using a ConfigMap. It is more stable, and we can leverage Kubernetes conflict errors to deal with race conditions.
Isn't this what https://kubernetes.io/docs/concepts/architecture/leases/#custom-workload are for? Or is that too limited for the information you would need to store?
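For illustration, a per-replica heartbeat Lease along the lines of that custom-workload pattern might look like the sketch below; note it only carries identity and timing, so the cluster-to-shard mapping would still need to live elsewhere (names are made up):

```go
package sharding

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// renewHeartbeatLease creates or renews a Lease named after this replica.
// Other replicas (or the operator) can treat a stale spec.renewTime as
// "unhealthy".
func renewHeartbeatLease(ctx context.Context, kc kubernetes.Interface, replica string) error {
	leases := kc.CoordinationV1().Leases("argocd") // namespace assumed
	now := metav1.NewMicroTime(time.Now())
	lease, err := leases.Get(ctx, replica, metav1.GetOptions{})
	switch {
	case apierrors.IsNotFound(err):
		holder, seconds := replica, int32(30)
		_, err = leases.Create(ctx, &coordinationv1.Lease{
			ObjectMeta: metav1.ObjectMeta{Name: replica},
			Spec: coordinationv1.LeaseSpec{
				HolderIdentity:       &holder,
				LeaseDurationSeconds: &seconds,
				RenewTime:            &now,
			},
		}, metav1.CreateOptions{})
		return err
	case err != nil:
		return err
	}
	lease.Spec.RenewTime = &now
	_, err = leases.Update(ctx, lease, metav1.UpdateOptions{})
	return err
}
```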
I like the idea of using ConfigMaps to reduce the dependency on Redis. I will incorporate ConfigMaps in the proposal.
Added one comment about preserving backward compatibility, but we can discuss more in the PR that implements the feature.
LGTM!
### Security Considerations
* Converting the StatefulSet to a Deployment would be a breaking change. Any automation done by customers that is based on the assumption that the controller is modelled as a StatefulSet would break with this change.
I think we could easily preserve backward compatibility with the StatefulSet: the controller could check if HOSTNAME matches the regex argocd-application-controller-\d;
if so, it can just infer the shard number from the hostname, otherwise use the new logic.
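A sketch of that backward-compatibility check (the regex follows the suggestion above; `dynamicShard` stands in for the new assignment logic):

```go
package sharding

import (
	"os"
	"regexp"
	"strconv"
)

var statefulSetHostname = regexp.MustCompile(`^argocd-application-controller-(\d+)$`)

// inferShard keeps the old StatefulSet behaviour when the hostname still ends
// in an ordinal, and falls back to the new dynamic assignment otherwise.
func inferShard(dynamicShard func() int) int {
	if m := statefulSetHostname.FindStringSubmatch(os.Getenv("HOSTNAME")); m != nil {
		ordinal, _ := strconv.Atoi(m[1])
		return ordinal // legacy: shard number == pod ordinal
	}
	return dynamicShard() // new logic, e.g. ConfigMap-based assignment
}
```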
@alexmt thanks for the approval. We could surely discuss the point of backwards compatibility when I have a draft PR out for this proposal's implementation. Wanted to check if we could get the proposal merged as well?
LGTM
docs/proposals/rebalancing-clusters-across-shards-dynamically.md
LGTM
Tks Ishita!
…j#13221)
* Add proposal document for dynamic rebalancing of clusters across shards
* Use ConfigMap instead of Redis based on the feedback on the proposal
* address comments and add comments about not redistributing clusters for unhealthy controllers
* Address comments

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>
Proposal for dynamic rebalancing of clusters across shards.
Narrowing down the scope of #8340.