feat: respecting rbac for resource exclusions/inclusions proposal #13479

gdsoumya · 2023-05-06T04:43:24Z

This PR adds a proposal for a new feature that lets the Argo CD controller respect RBAC when deciding which resources to monitor, in addition to the existing resource exclusions/inclusions.


Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

codecov · 2023-05-06T04:56:17Z

Codecov Report

Patch coverage has no change and project coverage change: +0.07 🎉

Comparison is base (bbc51fb) 49.56% compared to head (1e7fbc8) 49.64%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #13479      +/-   ##
==========================================
+ Coverage   49.56%   49.64%   +0.07%     
==========================================
  Files         256      258       +2     
  Lines       43920    44192     +272     
==========================================
+ Hits        21770    21940     +170     
- Misses      19987    20091     +104     
+ Partials     2163     2161       -2

see 27 files with indirect coverage changes


crenshaw-dev · 2023-05-09T16:40:41Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+## Proposal 
+
+The configuration for this will be present in the `argocd-cm`, we will add new boolean field `resource.respectRBAC` in the
+cm which can be set to `true` to enable this feature, by default the feature is disabled.


Wouldn't this be safe to enable by default? If the controller doesn't have access to a resource, respecting RBAC will only help it avoid 403s.

We can keep it enabled by default, but that might suddenly change the behavior of the controller when a user upgrades which I thought might not be welcomed by some users.

Makes sense! I think we'd want to eventually target enabling by default. But agreed, best to move cautiously.

I would also be voting for kind of a cautious move here. Make it a feature toggle, test it extensively, and then move forward to enable it by default.

Yea I think by default for now at least this would be disabled and opt in for the user.

jannfis · 2023-05-10T00:15:49Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+cm which can be set to `true` to enable this feature, by default the feature is disabled.
+
+The feature will also modify `gitops-engine` pkg to add a `SelfSubjectAccessReview` request before adding any resource to the watch list, 
+which will make sure that argocd only monitors resources that it has access to.


I would suggest not to perform an additional request to SelfSubjectAccessReview, because it potentially doubles the number of requests required for building up the cache. On large clusters, this is problematic already as of today.

Instead, I'd like to propose evaluating the API response for a given resource during list and/or establishing the watch.

Agreed, updated the proposal to match.

@jannfis I presented the proposal in today's meeting, and there was a concern about depending on the API response: false positives might be encountered due to excessive load on the kube API server or due to env-specific proxies. So I am adding back the SelfSubjectAccessReview implementation option alongside the API response approach, with their advantages and disadvantages as discussed in the call today.

Also added a third approach recommended by @alexmt where we combine 1 and 2. Tagging others for opinion @crenshaw-dev @leoluz @jannfis
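A minimal sketch of the combined approach, for illustration only. It uses only the standard library, so `statusError` and the `accessReview` callback are hypothetical stand-ins for the API error and the `SelfSubjectAccessReview` request; this is not the gitops-engine code. Returning the review error instead of dropping the resource is also an assumption of the sketch, not something the proposal specifies.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical stand-in for an API error that carries an HTTP status.
type statusError struct{ code int }

func (e *statusError) Error() string { return fmt.Sprintf("api error: status %d", e.code) }

// isAccessDenied reports whether err looks like a forbidden/unauthorized response.
func isAccessDenied(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	return se.code == http.StatusForbidden || se.code == http.StatusUnauthorized
}

// accessReview is a hypothetical callback standing in for a
// SelfSubjectAccessReview request; it returns true when access is allowed.
type accessReview func(resource string) (bool, error)

// shouldStopWatching sketches the combined approach: the extra review call is
// made only when the list call failed with forbidden/unauthorized.
func shouldStopWatching(resource string, listErr error, review accessReview) (bool, error) {
	if !isAccessDenied(listErr) {
		return false, nil // list succeeded, or failed for an unrelated reason
	}
	allowed, err := review(resource)
	if err != nil {
		return false, err // could not confirm; keep the resource and surface the error
	}
	return !allowed, nil
}

func main() {
	deny := func(string) (bool, error) { return false, nil }
	stop, err := shouldStopWatching("secrets", &statusError{code: http.StatusForbidden}, deny)
	fmt.Println(stop, err)
}
```

With this shape, a list call that succeeds never triggers the review callback, which is the property that keeps the extra API calls limited to resources that already failed.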


anandf · 2023-05-31T14:48:19Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+3. Combine approaches 1 and 2, in this controller will check the api response for the list call, and if it receives forbidden/unauthorized it will make the `SelfSubjectAccessReview` call.
+   This approach is accurate and at the same time, only makes extra api calls if the list calls fail in the first place.
+
+In all solutions, once controller determines that it does not have access to the resource it will stop monitoring it.


What if the user adds/removes access later ?

A controller restart will be necessary

Controller automatically does full "resync" every day. So it will auto-discover RBAC change once a day

Ah yes ^^ though if you want it instantly a restart is needed.

@alexmt The watches will be re-established every 10 minutes as well, right? I think part of that is also rediscovering changes in available APIs. I guess with that, auto-discovery of RBAC changes would also happen every 10 minutes instead of only after cache expiry?

I think controller is retrying only in-progress watches every 10 minutes. So it will notice if RBAC no longer allows accessing resource but won't notice if RBAC is allowing new resources

But if you add new CRDs to the cluster, Argo CD will pick those up without the need for a restart. Is that due to an informer on the CRD API?

As for the restart, I think a cluster-cache resync (which can be triggered via Argo CD's API) should be sufficient? Maybe we can add a new functionality that Argo CD will also re-try (probe) the APIs it received a 401 for previously? In a configurable interval, which defaults to the same duration as the watch timeout (10 minutes)?

Anyway, I think this is implementation detail and could even be added later on. Just keeping this here as a potential idea.
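The re-probe idea floated above could look roughly like the following sketch. Everything here is hypothetical (the `deniedAPIs` type, its methods, and the default constant); it only illustrates tracking denied APIs and deciding when they are due for another attempt, with the interval defaulting to the 10-minute watch timeout mentioned in the thread.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// defaultProbeInterval mirrors the 10-minute watch timeout mentioned in the thread.
const defaultProbeInterval = 10 * time.Minute

// deniedAPIs is a hypothetical tracker for APIs that returned forbidden/unauthorized.
type deniedAPIs struct {
	interval   time.Duration
	lastProbed map[string]time.Time
}

// newDeniedAPIs falls back to the default interval when none is configured.
func newDeniedAPIs(interval time.Duration) *deniedAPIs {
	if interval <= 0 {
		interval = defaultProbeInterval
	}
	return &deniedAPIs{interval: interval, lastProbed: map[string]time.Time{}}
}

// markDenied records that access to api was denied at time now.
func (d *deniedAPIs) markDenied(api string, now time.Time) { d.lastProbed[api] = now }

// markAllowed forgets api once a probe shows that access has been granted.
func (d *deniedAPIs) markAllowed(api string) { delete(d.lastProbed, api) }

// dueForProbe returns, in sorted order, the denied APIs whose last probe is at
// least one interval old.
func (d *deniedAPIs) dueForProbe(now time.Time) []string {
	var due []string
	for api, t := range d.lastProbed {
		if now.Sub(t) >= d.interval {
			due = append(due, api)
		}
	}
	sort.Strings(due)
	return due
}

func main() {
	start := time.Date(2023, 6, 1, 0, 0, 0, 0, time.UTC)
	d := newDeniedAPIs(0) // 0 selects the default interval
	d.markDenied("secrets", start)
	fmt.Println(d.dueForProbe(start.Add(5 * time.Minute)))  // not yet due
	fmt.Println(d.dueForProbe(start.Add(10 * time.Minute))) // due for a new probe
}
```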

jannfis · 2023-06-13T19:07:05Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+1. Modify `gitops-engine` pkg to make a `SelfSubjectAccessReview` request before adding any resource to the watch list, in this approach we are making an extra
+   api server call to check if controller has access to the resource, this does increase the no. of kubeapi calls made but is more accurate.
+2. Modify `gitops-engine` pkg to check for forbidden/unauthorized errors when listing for resources, this is more efficient approach as the
+   no. of kubeapi calls made does not change, but there is a chance of false positives as similar errors can be returned from kubeapi server or env specific proxies in other situations


I wonder what kind of false positives that would be?

And, if there's a false-positive, chances are that SelfSubjectAccessReview will return positive, but the API server will reply with a 403 when trying to establish the watch, then again resulting in a hard error?

jannfis · 2023-06-13T19:12:20Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+For the implementation there are 3 proposals :
+
+1. Modify `gitops-engine` pkg to make a `SelfSubjectAccessReview` request before adding any resource to the watch list, in this approach we are making an extra
+   api server call to check if controller has access to the resource, this does increase the no. of kubeapi calls made but is more accurate.


Re: The number of kubeapi calls. In clusters with a lot of APIs (e.g. OpenShift comes with 200 APIs installed by default), that's already at least a burst of 200 API calls in a very short time frame. The default QPS for the client-side rate limiter in the kube API client is 50 with a burst of 100 (2 * QPS).

With an additional SelfSubjectAccessReview per API, this number would increase to a burst of 400 calls.

Things get even worse when you consider namespace-scoped mode, because the watches would be established for each of the APIs per managed namespace. Considering the 200+ APIs example, this would result in at least 400 calls per managed namespace.

These seem to be numbers that can easily break the API server.

> The default QPS for the client-side rate limiter in the kube API client is 50 with a burst of 100 (2 * QPS).

The default in kube has actually been increased to either 300 or 500. But Argo's default is currently set lower. We should set ours to the new default

Yeah, agreed.

50/100 is a very conservative number. I know that in the OpenShift client, this has been changed for a while (I think to 350 for the burst at least), and if upstream K8s has adopted a similar increase by default, we should do it as well. People can still fine-tune it using ARGOCD_K8S_CLIENT_QPS and ARGOCD_K8S_CLIENT_BURST when they want more conservative rate limits, but I guess the better user experience is to have them increased by default.

@jessesuen it seems that the default in the kube client is 5 and 10 respectively: https://pkg.go.dev/k8s.io/client-go@v0.27.3/rest#pkg-constants (https://github.com/kubernetes/client-go/blob/ebad5fbb0d96ae12547fd19468316a604bce5ccf/rest/config.go#L43-L46)
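To put the numbers in this thread side by side, here is a back-of-the-envelope sketch. It assumes a simple token-bucket model (the first `burst` requests pass immediately, the rest are released at `qps` per second); `drainTime` is a hypothetical helper, not client-go's actual limiter, and it ignores server latency.

```go
package main

import (
	"fmt"
	"time"
)

// drainTime estimates how long a token bucket with the given refill rate (qps)
// and capacity (burst) delays n back-to-back requests. In this sketch a
// non-positive qps is treated as "no limiter".
func drainTime(n int, qps float64, burst int) time.Duration {
	if n <= burst || qps <= 0 {
		return 0
	}
	secs := float64(n-burst) / qps
	return time.Duration(secs * float64(time.Second))
}

func main() {
	const apis = 200 // the OpenShift example from the thread
	fmt.Println(drainTime(apis, 50, 100))   // list calls only, QPS 50 / burst 100
	fmt.Println(drainTime(2*apis, 50, 100)) // plus one SelfSubjectAccessReview per API
	fmt.Println(drainTime(2*apis, 5, 10))   // same load with the 5 / 10 client defaults
}
```

Under this model the client-side limiter spreads the 400 calls over several seconds at 50/100, and over more than a minute at 5/10, so the limiter settings largely determine how hard the burst hits the API server.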

jannfis · 2023-06-13T19:22:31Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+3. Combine approaches 1 and 2, in this controller will check the api response for the list call, and if it receives forbidden/unauthorized it will make the `SelfSubjectAccessReview` call.
+   This approach is accurate and at the same time, only makes extra api calls if the list calls fail in the first place.


This strongly depends on the use-case of the user and what kind of permissions they are willing to give to a specific Argo CD instance. For example, if there's an instance that is only allowed to manage 10 out of 200 APIs, you would still have 190 additional calls to the SelfSubjectAccessReview.

Can we maybe make the use of SelfSubjectAccessReview configurable? Then people can have the trade-off they want: on clusters with a small number of APIs, they can configure the mechanism for the highest precision, while on clusters with a large number of APIs, they trade precision for speed.

Sure I think we can have another config option that enables the stricter check. Will update the proposal with this
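The trade-off in this thread comes down to simple counting. A small sketch, assuming one `SelfSubjectAccessReview` per API and ignoring namespace-scoped mode; `extraReviewCalls` is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// extraReviewCalls returns how many SelfSubjectAccessReview calls each approach
// adds on top of the list calls, for a cluster with total APIs of which the
// controller can access accessible.
//   approach1: review before every watch
//   approach2: rely on the list response only
//   approach3: review only after a forbidden/unauthorized list response
func extraReviewCalls(total, accessible int) (approach1, approach2, approach3 int) {
	return total, 0, total - accessible
}

func main() {
	a1, a2, a3 := extraReviewCalls(200, 10) // instance allowed to manage 10 of 200 APIs
	fmt.Println(a1, a2, a3)                 // prints "200 0 190"
}
```

Approach 3 only pays off when most list calls succeed; for a tightly restricted instance it costs almost as much as approach 1, which is the motivation for making the stricter check opt-in.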

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

jannfis · 2023-06-15T17:59:47Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+It was decided that we will go with approach 3 from the above list, but we shall provide 2 boolean configurations options to users :
+   - `resource.respectRBAC.strict` : This will perform both the checks i.e. whether the list call response is forbidden/unauthorized and if it is make the `SelfSubjectAccessReview` call to confirm.
+   - `resource.respectRBAC.normal` : This will only check whether the list call response is forbidden/unauthorized and skip `SelfSubjectAccessReview` call.
+
+NOTE: `strict` has higher priority so irrespective of the status of `normal` if strict is set to true strict mode will be used.


It's more of a nit, but could we maybe - instead of two boolean options - have one option that takes three values:

- normal (or maybe lax, because this is more the opposite of strict),
- strict
- false (or disabled, or off), being also the default if the field is not set.

Or as an alternative: Have two boolean values, resource.respectRBAC.enabled and resource.respectRBAC.strict, but in order to turn on strict, enabled has to be set to true as well.

I believe that would be less confusing and more ergonomic to users configuring Argo CD. But as I said, it's a nit, and I'm interested in other people's opinion.

Oh that's a good idea I think having a single field with 3 values is better, I would probably make it just 2 and use empty "" or missing field as disabled. Wdyt?
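A sketch of how the single-field variant discussed here might be parsed. It assumes the field keeps the name `resource.respectRBAC` and accepts `normal` and `strict`, with an empty or missing value meaning disabled; the `RespectRBACMode` type and `parseRespectRBAC` function are hypothetical names for this example, not Argo CD's settings code.

```go
package main

import (
	"fmt"
	"strings"
)

// RespectRBACMode is a hypothetical type for this sketch.
type RespectRBACMode int

const (
	RespectRBACDisabled RespectRBACMode = iota // field missing or empty
	RespectRBACNormal                          // trust forbidden/unauthorized list responses
	RespectRBACStrict                          // additionally confirm with SelfSubjectAccessReview
)

// parseRespectRBAC reads the setting from ConfigMap-style data. Unknown values
// are reported as an error and fall back to disabled.
func parseRespectRBAC(data map[string]string) (RespectRBACMode, error) {
	switch v := strings.TrimSpace(data["resource.respectRBAC"]); v {
	case "":
		return RespectRBACDisabled, nil
	case "normal":
		return RespectRBACNormal, nil
	case "strict":
		return RespectRBACStrict, nil
	default:
		return RespectRBACDisabled, fmt.Errorf("invalid resource.respectRBAC value %q", v)
	}
}

func main() {
	mode, err := parseRespectRBAC(map[string]string{"resource.respectRBAC": "strict"})
	fmt.Println(mode == RespectRBACStrict, err)
}
```

A single field makes the invalid combination from the two-boolean design (strict enabled while the feature is off) impossible to express.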

jannfis

LGTM

jannfis · 2023-06-26T17:16:45Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+- TBD
+
+reviewers:
+- TBD


Suggested change

- TBD

- @jannfis

jannfis · 2023-06-26T17:16:57Z

docs/proposals/respect-rbac-for-resource-exclusions.md

+- TBD
+
+approvers:
+- TBD


Suggested change

- TBD

- @jannfis

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

alexmt

LGTM!


feat: respecting rbac for resource exclusions/inclusions proposal

d0ec891

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

alexmt requested review from alexmt and jannfis May 8, 2023 16:23

gdsoumya mentioned this pull request May 8, 2023

Best effort controller watch on resources (instead of resource inclusions/exclusions) #13239

Closed

alexmt requested a review from jessesuen May 8, 2023 16:40

feat: added security considerations and upgrade downgade strategy

7f7b7ae

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

crenshaw-dev reviewed May 9, 2023


jannfis reviewed May 10, 2023


gdsoumya and others added 5 commits May 10, 2023 20:00

feat: updated imnplementation detail

6975db6

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

Merge branch 'master' into feat/respectRBAC

35c096d

feat: updated proposal to include both approaches of detecting resour…

0052a8a

…ce access Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

Merge branch 'feat/respectRBAC' of github.com:gdsoumya/argo-cd into f…

277079b

…eat/respectRBAC

feat: add third approach

199e5bd

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

anandf reviewed May 31, 2023


jannfis reviewed Jun 13, 2023


feat: allow users to opt in for strict mode

5ba4ace

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

jannfis reviewed Jun 15, 2023


Merge branch 'master' into feat/respectRBAC

08628b8

jannfis approved these changes Jun 26, 2023


feat: updated proposal

1e7fbc8

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

alexmt approved these changes Jul 3, 2023


alexmt merged commit d7632df into argoproj:master Jul 3, 2023

agaudreault mentioned this pull request Aug 7, 2023

Allow to sync resources even with ComparisonError #11356

Closed

suzaku mentioned this pull request Aug 15, 2023

use std lib suzaku/argo-cd#2

Closed
