
docs: proposal/dynamic rebalancing of clusters across shards #13221

Merged
4 commits merged into argoproj:master on Jun 13, 2023
Conversation

ishitasequeira (Member) commented Apr 13, 2023

Proposal for dynamic rebalancing of clusters across shards.

Narrowing down the scope of #8340.

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

Please see Contribution FAQs if you have questions about your pull-request.

@ishitasequeira ishitasequeira changed the title [WIP] proposal: add proposal document for dynamic rebalancing of clusters across shards proposal: add proposal document for dynamic rebalancing of clusters across shards [WIP] Apr 13, 2023
@ishitasequeira ishitasequeira changed the title proposal: add proposal document for dynamic rebalancing of clusters across shards [WIP] docs: proposal/add proposal document for dynamic rebalancing of clusters across shards [WIP] Apr 13, 2023
@ishitasequeira ishitasequeira changed the title docs: proposal/add proposal document for dynamic rebalancing of clusters across shards [WIP] docs: proposal/dynamic rebalancing of clusters across shards [WIP] Apr 13, 2023
codecov bot commented Apr 13, 2023

Codecov Report

Patch coverage has no change; project coverage changes by -0.01% ⚠️

Comparison is base (0874729) 49.57% compared to head (e59be97) 49.56%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13221      +/-   ##
==========================================
- Coverage   49.57%   49.56%   -0.01%     
==========================================
  Files         256      256              
  Lines       43920    43920              
==========================================
- Hits        21773    21770       -3     
- Misses      19985    19987       +2     
- Partials     2162     2163       +1     

see 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

blakepettersson (Member)

This looks good! One question: I had a proposal in #12123 to scale multiple independent deployments of the application controller, so that, say, deployment-a of the application controller would reconcile a different set of applications (based upon their AppProject) than deployment-b.

Does this proposal assume that there's only one Deployment of the application controller, or could this be extended to work with multiple independent shard groups?

ishitasequeira (Member, Author)

@blakepettersson Your idea seems to be more around having multiple deployments of application controllers and separating those out based on AppProjects.

This proposal assumes there to be only 1 deployment of ApplicationController. The goal of this proposal is to be able to dynamically re-distribute clusters across shards upon addition/removal of shards. I think this proposal should be able to handle all shard groups.

@ishitasequeira ishitasequeira marked this pull request as ready for review April 17, 2023 17:14
rouke-broersma (Contributor) commented Apr 20, 2023

This proposal specifically excludes the problem most faced by Argo users, which is that it's impossible to horizontally scale to deal with many applications in a single cluster.

Would this proposal at least consider controller "load" when rebalancing clusters in the case of multiple clusters?

As I read it now, it seems to me that "load" would still be defined as a constant 1 per cluster, so we would still end up with some controllers doing nothing while others are overloaded.

Lastly, is this meant as an intermediate improvement so that the other asks are easier to design and implement, or is this the only part of the problem Argo is currently willing to solve?

blakepettersson (Member)

@blakepettersson Your idea seems to be more around having multiple deployments of application controllers and separating those out based on AppProjects.

Yes indeed, I was using that as a specific example of why it could be useful to support multiple independent deployments of the ApplicationController; the idea itself is not so relevant for the proposal 😄

I think this proposal should be able to handle all shard groups.

What I meant by shard group (or rather assumed) is that a group of shards is handled by a single deployment of the ApplicationController (how that is sharded internally is another question, and is handled in this proposal). From what I can read in your reply, this proposal intends to only support a single deployment of the application controller.

ishitasequeira (Member, Author)

This proposal specifically excludes the problem most faced by Argo users, which is that it's impossible to horizontally scale to deal with many applications in a single cluster.

Would this proposal at least consider controller "load" when rebalancing clusters in the case of multiple clusters?

As I read it now, it seems to me that "load" would still be defined as a constant 1 per cluster, so we would still end up with some controllers doing nothing while others are overloaded.

@rouke-broersma, yes, the current proposal is not focusing on load; it is trying to balance clusters across shards for dynamic scaling. We are not including load on the shards as of now.

Lastly, is this meant as an intermediate improvement so that the other asks are easier to design and implement, or is this the only part of the problem Argo is currently willing to solve?

Yes, this proposal is an intermediate improvement to be able to move towards including load in scaling decisions.

ishitasequeira (Member, Author)

What I meant by shard group (or rather assumed) is that a group of shards is handled by a single deployment of the ApplicationController (how that is sharded internally is another question, and is handled in this proposal). From what I can read in your reply, this proposal intends to only support a single deployment of the application controller.

Yes, that is correct. We assume there to be a single deployment of application controller.

@ishitasequeira ishitasequeira changed the title docs: proposal/dynamic rebalancing of clusters across shards [WIP] docs: proposal/dynamic rebalancing of clusters across shards Apr 23, 2023

Each shard replica knows about the total number of available shards by evaluating the environment variable ARGOCD_CONTROLLER_REPLICAS, which needs to be kept up-to-date with the actual number of available replicas (shards). If the number of replicas does not equal the number set in ARGOCD_CONTROLLER_REPLICAS, sharding will not work as intended, leading to both unused and overused replicas. As this environment variable is set on the StatefulSet and propagated to the pods, all the pods in the StatefulSet need to be restarted in order to pick up the new number of total shards.

The current sharding mechanism relies on predictable pod names for the application controller to determine which shard a given replica should impersonate, e.g. the first replica of the StatefulSet (argocd-application-controller-0) will be the first shard, the second replica (argocd-application-controller-1) will be the second and so forth.
Collaborator

Correct. This is the current approach in order to guarantee that certain clusters will always be associated with certain replicas by applying a deterministic logic during cluster assignment.
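For illustration, a minimal sketch of the static mechanism described above. The function names are made up, and the hash-mod assignment is only meant to show the deterministic cluster-to-shard mapping, not to mirror the actual Argo CD code:

```go
package sharding

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
)

// replicaCount reads the total number of shards from the environment, which
// is why every replica must be restarted when the value changes.
func replicaCount() (int, error) {
	n, err := strconv.Atoi(os.Getenv("ARGOCD_CONTROLLER_REPLICAS"))
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid ARGOCD_CONTROLLER_REPLICAS: %v", err)
	}
	return n, nil
}

// clusterToShard maps a cluster deterministically onto one of the shards.
// Each replica then only processes clusters whose shard number matches the
// ordinal in its own StatefulSet pod name (argocd-application-controller-N).
func clusterToShard(clusterID string, replicas int) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(clusterID))
	return int(h.Sum32() % uint32(replicas))
}
```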

How will this work?
* The application controller will query Redis for the status of all the application controllers and last updated heartbeat timestamp.
* It will check if any application controller is flagged as Unhealthy or has not updated its status in Redis during the heartbeat process for a certain period of time.
* If the status for an application controller was already flagged as Unhealthy, it will pick one of the clusters from its list and assign the cluster to itself.
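A rough sketch of the staleness check implied by these steps, kept store-agnostic since the proposal later moved away from Redis; the type and function names are illustrative only:

```go
package sharding

import "time"

// heartbeat holds the last time a controller replica reported itself alive.
type heartbeat struct {
	Controller string
	LastActive time.Time
	Unhealthy  bool
}

// markStale flags controllers whose heartbeat is older than the grace period
// and returns the ones that were newly marked Unhealthy, so that their
// clusters can be picked up by the healthy replicas.
func markStale(beats []heartbeat, grace time.Duration, now time.Time) []heartbeat {
	var newlyUnhealthy []heartbeat
	for i := range beats {
		if !beats[i].Unhealthy && now.Sub(beats[i].LastActive) > grace {
			beats[i].Unhealthy = true
			newlyUnhealthy = append(newlyUnhealthy, beats[i])
		}
	}
	return newlyUnhealthy
}
```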
Collaborator

I am afraid that this approach will lead to race condition issues. Maybe the heartbeat should still be published by all application controllers, but with just one of them analyzing heartbeats and redistributing clusters.

Contributor

Wouldn't you then have the issue that, if that controller goes down, you no longer have a mechanism to perform the health check, so the health-check controller never gets marked as unhealthy?

Member Author

I agree with @rouke-broersma here. Relying on a single controller has a bunch of issues as well, for example, if the controller performing the redistribution goes down or starts to distribute in a faulty manner.

Collaborator

Here I agree with @leoluz about race conditions. Controller health might be flapping, and it is hard to predict which happens sooner: the controller instance recovers, or the shards are redistributed among the healthy controllers.

I'm proposing to just not detect unhealthy controllers and let Kubernetes/operators deal with it:

  • healthy controllers just "claim" shards and update ConfigMap
  • each controller should have a readiness probe that returns healthy only if the controller successfully claimed a shard and updated its heartbeat within the allowed timeframe.

So if one controller instance is unhealthy, one shard won't be processed, but the operator will know about it because the readiness probe would be failing.
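A minimal sketch of a readiness probe along these lines, assuming the replica tracks a claimed-shard flag and a heartbeat timestamp; the port, path, and timing values are illustrative, not anything the proposal specifies:

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

var (
	claimedShard  atomic.Bool  // set once this replica has successfully claimed a shard
	lastHeartbeat atomic.Int64 // unix seconds of the last successful heartbeat update
)

// readyz reports ready only when this replica has claimed a shard and renewed
// its heartbeat within the allowed window, so a stuck replica surfaces as a
// failing readiness probe instead of being force-rebalanced by its peers.
func readyz(w http.ResponseWriter, _ *http.Request) {
	const maxHeartbeatAge = 30 * time.Second
	age := time.Since(time.Unix(lastHeartbeat.Load(), 0))
	if !claimedShard.Load() || age > maxHeartbeatAge {
		http.Error(w, "shard not claimed or heartbeat stale", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/readyz", readyz)
	_ = http.ListenAndServe(":8081", nil)
}
```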

Member Author

Thanks @alexmt, the suggestion of not detecting unhealthy shards does make sense. In this case, we would need to re-introduce the liveness probe or update the readiness probe with a condition that checks whether the controller was able to claim a shard or not.

* If the status for an application controller was already flagged as Unhealthy, it will pick one of the clusters from its list and assign the cluster to itself.
* If the status is not flagged and an application controller has not updated the last active timestamp in a long time, then we mark the Application Controller as Unhealthy.

This will continue until the list of clusters is empty. As soon as the list is empty, the entry for the shard/application-controller is removed from Redis.
Collaborator

What if in the middle of a cluster redistribution the failed application controller heals and starts updating the heartbeat again?

Member Author

In that scenario, the redistribution will kick in again, as the number of healthy controllers increases.


### Cons

* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.
leoluz (Collaborator) commented May 11, 2023

I just noticed that you considered race conditions as well. :)
I would be opposed to using distributed locks as it increases complexity quite a bit and we already have stability issues with our Redis HA setup.
There are mixed feelings about distributed locks, with many considering them an anti-pattern (which I agree with).
Example:
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html


* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.

* In scenarios where Redis becomes unavailable, the heartbeat mechanism will pause until Redis comes back online. This also pauses the dynamic redistribution of clusters, which will be triggered again once Redis is back online.
Collaborator

Maybe we could rely less on Redis if we had leader election in place. In this case we could instead maintain an in-memory cache of the sharding state. This would be synced to Redis mainly so it can be picked up by the new leader in case the current leader crashes.
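As a point of reference, leader election is readily available in client-go; a sketch of how a single replica could be elected to own the redistribution work (the lease name, timings, and the redistribute hook are illustrative assumptions):

```go
package sharding

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runRebalancer elects a single replica to own cluster redistribution. The
// non-leaders keep serving their shards; only the leader recomputes the
// assignment and persists it for the others to pick up.
func runRebalancer(ctx context.Context, kc kubernetes.Interface, ns, podName string, redistribute func(ctx context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "argocd-shard-rebalancer", Namespace: ns}, // lease name is illustrative
		Client:     kc.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: podName},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: redistribute,  // this replica is now responsible for rebalancing
			OnStoppedLeading: func() {},     // another replica takes over via the lease
		},
	})
}
```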

Member

How about we use ConfigMaps for this instead of Redis? Kubernetes would prevent concurrent updates to the ConfigMaps, so we can rely on that for handling race conditions while also removing the issue of relying on Redis too much.
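For what it's worth, a hedged sketch of what such a ConfigMap-based claim could look like with client-go's optimistic-concurrency retry; the ConfigMap name and data layout are assumptions, not something the proposal specifies:

```go
package sharding

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// claimShard records "this controller owns this shard" in a ConfigMap.
// The API server rejects concurrent updates with a conflict error, and
// retry.RetryOnConflict re-reads and retries, so no extra locking is needed.
func claimShard(ctx context.Context, kc kubernetes.Interface, ns, controllerID, shard string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// ConfigMap name is illustrative.
		cm, err := kc.CoreV1().ConfigMaps(ns).Get(ctx, "argocd-controller-shards", metav1.GetOptions{})
		if err != nil {
			return err
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		if owner, ok := cm.Data[shard]; ok && owner != controllerID {
			return nil // shard already claimed by another controller
		}
		cm.Data[shard] = controllerID
		_, err = kc.CoreV1().ConfigMaps(ns).Update(ctx, cm, metav1.UpdateOptions{})
		return err
	})
}
```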


### Pros of Leader Election

* We can refrain from performing multiple calls to Redis for the load and status of the shards and instead store them in a local cache within the leader, while updating the data in Redis periodically (e.g. every 10 mins).
Collaborator

Duplicated sentence with line 134


In order to accurately know which shards are being managed by which application controller, especially in scenarios of load redistribution, addition/removal of `application controller` replicas, etc., we need a mechanism to assign clusters to the shards.

The service account used by the application controller has read access to all the resources in the cluster. Thus, instead of setting the environment variable ARGOCD_CONTROLLER_REPLICAS representing the number of replicas, the number of replicas can be read directly from the number of healthy replicas of the application controller deployment.
Collaborator

The application controller service account does not necessarily have access to all resources in the cluster.
Some users install the controller with only the argocd-application-controller-role and use it to manage remote clusters only.

So I suggest changing the argocd-application-controller-role to allow the controller to inspect its own deployment and find out the number of replicas.
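A small sketch of what reading the replica count from the controller's own workload could look like, assuming the controller has been converted to a Deployment named argocd-application-controller and the role grants get on it (both assumptions):

```go
package sharding

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// availableReplicas reads the controller's own Deployment and returns the
// number of replicas that are currently available, replacing the static
// ARGOCD_CONTROLLER_REPLICAS environment variable. This requires the
// argocd-application-controller-role to allow "get" on that Deployment.
func availableReplicas(ctx context.Context, kc kubernetes.Interface, ns string) (int32, error) {
	deploy, err := kc.AppsV1().Deployments(ns).Get(ctx, "argocd-application-controller", metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	return deploy.Status.AvailableReplicas, nil
}
```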

Member Author

That's a good point. I will add this point to the document.


### Cons

* Possibility of race conditions while flagging the shard as Unhealthy during the heartbeat process, although this can be handled using [distributed locks](https://redis.io/docs/manual/patterns/distributed-locks/) in Redis.
Collaborator

I'm also concerned about using Redis. Some users prefer cloud-hosted Redis (e.g. Amazon ElastiCache), so in this case the controller would need to rely on an external resource to work. HA Redis also has some known issues: under heavy load, the master might be unable to perform replication quickly enough, and the only known solution is to kill all Redis pods. So we would lose information about shards.

So instead of Redis, I would suggest just using a ConfigMap. It is more stable, and we can leverage Kubernetes conflict errors to deal with race conditions.

Contributor

Isn't this what https://kubernetes.io/docs/concepts/architecture/leases/#custom-workload are for? Or is that too limited for the information you would need to store?
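For reference, a per-controller heartbeat could indeed fit into a Lease object; a hedged sketch (the Lease naming and duration are assumptions, and this is not what the proposal ultimately settled on):

```go
package sharding

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// renewHeartbeat records this replica's liveness in its own Lease object,
// creating it on first use. Other replicas (or the operator) can compare
// spec.renewTime against the lease duration to spot a stale controller.
func renewHeartbeat(ctx context.Context, kc kubernetes.Interface, ns, controllerID string) error {
	leases := kc.CoordinationV1().Leases(ns)
	now := metav1.NewMicroTime(time.Now())
	duration := int32(30)

	lease, err := leases.Get(ctx, controllerID, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = leases.Create(ctx, &coordinationv1.Lease{
			ObjectMeta: metav1.ObjectMeta{Name: controllerID, Namespace: ns},
			Spec: coordinationv1.LeaseSpec{
				HolderIdentity:       &controllerID,
				LeaseDurationSeconds: &duration,
				RenewTime:            &now,
			},
		}, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}
	lease.Spec.RenewTime = &now
	_, err = leases.Update(ctx, lease, metav1.UpdateOptions{})
	return err
}
```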

Member Author

I like the idea of using ConfigMaps to reduce the dependency on Redis. I will incorporate ConfigMaps in the proposal.

@ishitasequeira ishitasequeira requested a review from alexmt May 23, 2023 17:32
alexmt (Collaborator) left a comment

Added one comment about preserving backward compatibility, but we can discuss more in the PR that implements the feature.

LGTM!


### Security Considerations

* Converting the StatefulSet to a Deployment would be a breaking change. Any automation done by customers that is based on the assumption that the controller is modelled as a StatefulSet would break with this change.
Collaborator

I think we could easily preserve backward compatibility with the StatefulSet: the controller could check if HOSTNAME matches the regex argocd-application-controller-\d; if it does, it can just infer the shard number from the hostname, otherwise use the new logic.
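A short sketch of that backward-compatibility check; the function signature and the dynamic-assignment fallback hook are illustrative:

```go
package sharding

import (
	"os"
	"regexp"
	"strconv"
)

var statefulSetHostname = regexp.MustCompile(`^argocd-application-controller-(\d+)$`)

// shardNumber keeps backward compatibility with the StatefulSet setup: if the
// hostname still carries an ordinal suffix, the shard is inferred from it;
// otherwise the caller falls back to the new dynamic assignment logic.
func shardNumber(dynamicAssignment func() (int, error)) (int, error) {
	if m := statefulSetHostname.FindStringSubmatch(os.Getenv("HOSTNAME")); m != nil {
		return strconv.Atoi(m[1])
	}
	return dynamicAssignment()
}
```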

ishitasequeira (Member, Author)

@alexmt thanks for the approval. We could surely discuss the point of backwards compatibility when I have a draft PR out for this proposal's implementation.

Wanted to check if we could get the proposal merged as well?

jannfis (Member) left a comment

LGTM

leoluz (Collaborator) left a comment

LGTM
Thanks, Ishita!

@crenshaw-dev crenshaw-dev enabled auto-merge (squash) June 12, 2023 21:34
auto-merge was automatically disabled June 13, 2023 00:56

Head branch was pushed to by a user without write access

@leoluz leoluz merged commit 36e292c into argoproj:master Jun 13, 2023
yyzxw pushed a commit to yyzxw/argo-cd that referenced this pull request Aug 9, 2023
…j#13221)

* Add proposal document for dynamic rebalancing of clusters across shards

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>

* Use ConfigMap instead of Redis based on the feedback on the proposal

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>

* address comments and add comments about not redistributing clusters for unhealthy controllers

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>

* Address comments

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>

---------

Signed-off-by: ishitasequeira <ishiseq29@gmail.com>
tesla59 pushed a commit to tesla59/argo-cd that referenced this pull request Dec 16, 2023