Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for multi-cluster with the same API Url #10897

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

denysvitali
Copy link

@denysvitali denysvitali commented Oct 11, 2022

This PR changes the way ArgoCD uses the clusters.

Rationale

For some use cases (including ours), an user might want to use the same Kubernetes API Server multiple times, with a different set of credentials.

Let me start with an example.

Given:

  • A Kubernetes Cluster with the Kubernetes API Server URL https://kubernetes.corp.example.com and three namespaces
    • dev,
    • staging
    • prod
  • Three different accounts (credentials) that are scoped to their respective namespaces:
    • acc-dev
    • acc-staging
    • acc-prod
  • An ArgoCD deployment running the latest version as of today (v2.5.3)

It is currently not possible to set-up three accounts in a way that they all use the same API Server URL, due to the fact that ArgoCD internally (and in the UI) prioritizes the API Server URL over the Cluster Name.

When viewing the Cluster details in ArgoCD UI, the UI jumps between the 3 clusters (because there is a 1s automatic refresh and the API returns the first match in a non-deterministic way):
https://argocd.corp.example.com/settings/clusters/https%3A%2F%2Fkubernetes.corp.example.com

This PR fixes all of these issues by using only the cluster name (in-cluster, cluster-dev, cluster-staging, cluster-prod) as an identifier for both the sync and the UI.

This PR closes #9606.
Comments and modifications are really appreciated! Thanks!

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

@codecov
Copy link

codecov bot commented Oct 11, 2022

Codecov Report

Base: 46.88% // Head: 46.85% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (125c30a) compared to base (981a1f1).
Patch coverage: 44.11% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10897      +/-   ##
==========================================
- Coverage   46.88%   46.85%   -0.03%     
==========================================
  Files         240      240              
  Lines       39911    39936      +25     
==========================================
+ Hits        18712    18714       +2     
- Misses      19306    19328      +22     
- Partials     1893     1894       +1     
Impacted Files Coverage Δ
cmd/argocd/commands/admin/cluster.go 0.00% <0.00%> (ø)
controller/appcontroller.go 52.33% <0.00%> (ø)
controller/cache/cache.go 19.48% <0.00%> (ø)
server/application/application.go 28.61% <0.00%> (ø)
server/application/terminal.go 15.76% <0.00%> (ø)
util/db/helmrepository.go 0.00% <0.00%> (ø)
util/settings/settings.go 48.87% <ø> (ø)
util/db/cluster.go 56.62% <32.00%> (-2.87%) ⬇️
util/argo/argo.go 61.76% <36.36%> (-0.65%) ⬇️
server/cluster/cluster.go 37.98% <50.00%> (ø)
... and 4 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@denysvitali denysvitali force-pushed the feature/add-multi-cluster-serverurl-support branch from 3fbc391 to f0101ae Compare November 30, 2022 14:20
@denysvitali
Copy link
Author

We've updated the PR and consider the feature to be completed. Can we please have a review?

Ping @crenshaw-dev @alexmt @jessesuen

Denys Vitali added 7 commits November 30, 2022 16:56
This is the first commit towards addressing argoproj#9606

Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
…ange the proxy origin

Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
@denysvitali denysvitali force-pushed the feature/add-multi-cluster-serverurl-support branch from f0101ae to c4d3fd4 Compare November 30, 2022 15:57
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
@denysvitali denysvitali force-pushed the feature/add-multi-cluster-serverurl-support branch from 76aeb31 to f711628 Compare November 30, 2022 16:15
Signed-off-by: Denys Vitali <denys.vitali@swisscom.com>
@denysvitali
Copy link
Author

@crenshaw-dev , firendly ping :)

@leptitchriss
Copy link

Thank you @denysvitali for this fix, we have the exact same problem and believe this will solve our multi-tenancy problem.

@denysvitali
Copy link
Author

@jannfis @alexmt friendly ping for a review (:

Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @denysvitali, thanks a lot for trying to tackle this issue and for your patience. There's a lot of PR backlog right now, we're very sorry about the delay.

I have a few concerns with this PR as-is, because as far as I can tell, it makes the cluster name a mandatory field. If this assumption is correct, it means that the PR will introduce a majorly breaking change, because the cluster name is currently an optional property of any cluster connection and may not be set in many configurations.

Also, what effects will this have on the cluster cache? Currently, we use the cluster's URL as the primary key for cluster caches. I have not seen a change to this logic in the PR, so I assume that this still holds true. Because many of the cluster settings are defined within the cluster cache, I have the concern that this ends up in a scenario where we have multiple clusters with the same URL but a different name, but only a single cluster cache configuration?

@@ -2,12 +2,19 @@

## Preface

The Argo CD project continuously grows, both in terms of features and community size. It gets adopted by more and more organisations which entrust Argo CD to handle their critical production workloads. Thus, we need to take great care with any changes that affect compatibility, performance, scalability, stability and security of Argo CD. For this reason, every new feature or larger enhancement must be properly designed and discussed before it gets accepted into the code base.
The Argo CD project continuously grows, both in terms of features and community size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file, have there been any changes except line breaks?

Can you please revert the changes of the line breaks?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! You're correct I've only changed the line breaks (:

@@ -2,22 +2,27 @@

In this guide, we will describe how to debug a remote ArgoCD environment with [Telepresence](https://telepresence.io/).

Telepresence allows you to connect & debug a service deployed in a remote environment and to "cherry-pick" one service to run locally, staying connected to the remote cluster. This will:
Telepresence allows you to connect & debug a service deployed in a remote environment and to "cherry-pick"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment - please revert changes, at least the line breaks.

@@ -2,11 +2,14 @@

## Run Argo CD outside of Kubernetes

During development, it might be viable to run Argo CD outside of a Kubernetes cluster. This will greatly speed up development, as you don't have to constantly build, push and install new Argo CD Docker images with your latest changes.
During development, it might be viable to run Argo CD outside a Kubernetes cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert line break changes.

@@ -850,7 +850,7 @@ func (a *ArgoCDServer) newHTTPServer(ctx context.Context, port int, grpcWebHandl
// NOTE: if a marshaller option is not supplied, grpc-gateway will default to the jsonpb from
// golang/protobuf. Which does not support types such as time.Time. gogo/protobuf does support
// time.Time, but does not support custom UnmarshalJSON() and MarshalJSON() methods. Therefore
// we use our own Marshaler
// we use our own Marshaller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this is just a difference between BE and AE, both spellings (single vs double l) are actually valid.

<DataLoader ref={loaderRef} input={server} load={(url: string) => timer(0, 1000).pipe(mergeMap(() => from(services.clusters.get(url, ''))))}>
<DataLoader ref={loaderRef} input={server} load={(name: string) => timer(0, 5000).pipe(mergeMap(() => from(services.clusters.getByName(name))))}>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change mean, cluster's name field is now mandatory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, because the Cluster API URL is not unique anymore, so we use the Cluster Name.
We could eventually make it unique without breaking compatibility by relying on Kubernetes name+namespace uniqueness by following what Traefik does: encoding the name into something like resource-name@namespace

@jannfis
Copy link
Member

jannfis commented Jan 20, 2023

@denysvitali So I've taken this PR to this weeks contributor's meeting to discuss it with fellow maintainers. While we all think this feature is worth pursuing and eventually want it implemented, we feel that it may require some more thought and design.

We appreciate the work you already put into this, but would like to kindly ask you to submit an enhancement proposal which captures all the important design decisions before we progress on the code any further. After we have agreed on how the change will look like, we'd be more than happy to see the contribution to the code made by you, and will help you bring this PR to a state where it eventually be merged.

WDYT of this?

@denysvitali
Copy link
Author

Sounds awesome! I'll do that!
Thank you for bringing it up!

@denysvitali
Copy link
Author

FYI, just created proposal 004.
You can check #12755

@yellow-straw-hat
Copy link

Hello @jannfis, are there any updates about this ?

This becomes a security concern in our organization. We need to add n clusters having the same url but with a different configuration (each project team works on a different namespace)

Many companies chose the kubernetes namespace isolation feature to manage authorization for each team. And imo argocd should be flexible enough to follow the k8s logic.

@Interaze
Copy link

Would love to see this get approved

@blakepettersson
Copy link
Member

#14255 might be of interest to everyone that's checking this PR

@denysvitali
Copy link
Author

#14255 could indeed be a solution for us because at the moment we're trying to adapt the concept of "clusters" to allow us to do exactly that - using different service accounts for different namespaces.

With that proposal, instead of tackling the problem from the cluster perspective (trust me, it's ugly, we have N+1 clusters for N namespaces and the control plane on another K8s cluster) we solve it by specifying which service accounts are used for the sync.

I second that proposall, as it seems to solve another problem that my proposal has: multiple ArgoCD clusters to represent the same Kubernetes cluster.

If everyone in this thread agrees, we can simply close this proposal in favor of #14255.
I'll still have to read the proposal in depth, but the general idea seems a better implementation of my solution :)

@Interaze
Copy link

I've found a solution for my problem in scoping namespaces to the project. If we could have something that allows users to move away from the "one credential to rule all" I'd be all for it

@YDamree
Copy link

YDamree commented Jul 12, 2023

Hello,
Is there any chance to merge this PR on the main branch some day please ?
It would be really helpful for our project :)
Thanks a lot in advance

@blakepettersson
Copy link
Member

@YDamree I'd say that the chances are quite low, TBH. A practical implementation of #14255 is more likely to be merged, and unless the spec changes significantly I'd say it's relatively straightforward to implement.

@ButtonDev
Copy link

is there any plan to merge this? i have the same issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple clusters with the same URL and different token may not be synchronized
8 participants