
Support Restrict Secret Access #3677

Merged — 14 commits merged into kedacore:main on Dec 8, 2022
Conversation

@kevinteng525 (Contributor) commented Sep 19, 2022

Support Restrict Secret Access, refer to #3668

To remove Secret from the ClusterRole and grant only get/list/watch permissions in the KEDA namespace, set the environment variable RESTRICT_SECRET_ACCESS="true".

@JorTurFer (Member) left a comment

First, thanks for this contribution ❤️

Overall this is looking good; I have left some comments inline.
Apart from that, some other things:

  1. Please open a PR to the docs explaining the new environment variable
  2. Open a PR to the charts repo, providing a parameter to set this value

It's the first time I have seen SecretLister, so I have some questions: what are we gaining by using it instead of just checking the namespace before listing?
To reduce the required permissions, maybe we should pass the informer all the way down and use it to check ConfigMaps as well, because at the moment we are reducing the Secrets scope but still need ConfigMap permissions across the whole cluster.
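For context, here is a rough usage sketch of lookups through such namespaced listers. This is an illustration only, not the merged code: the function, the object names, and the variables (secretLister, configMapLister, kedaNamespace) are hypothetical, and it assumes the alias corev1listers "k8s.io/client-go/listers/core/v1"; the factory that would produce these listers is sketched further down the thread.

```go
// Hypothetical usage sketch (not the merged code): both listers read from an
// informer cache that only watches KEDA's own namespace, so lookups need no
// cluster-wide RBAC and do not hit the API server on every call.
func resolveFromKedaNamespace(secretLister corev1listers.SecretLister,
	configMapLister corev1listers.ConfigMapLister, kedaNamespace string) error {
	// Secret referenced by a ClusterTriggerAuthentication; the name is illustrative.
	secret, err := secretLister.Secrets(kedaNamespace).Get("my-trigger-auth-secret")
	if err != nil {
		return err // outside the KEDA namespace the object is simply not visible
	}
	// The same namespaced factory could also serve ConfigMaps, as suggested above.
	configMap, err := configMapLister.ConfigMaps(kedaNamespace).Get("my-config")
	if err != nil {
		return err
	}
	_, _ = secret, configMap
	return nil
}
```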

adapter/main.go Outdated
@@ -132,22 +135,34 @@ func (a *Adapter) makeProvider(ctx context.Context, globalHTTPTimeout time.Durat

broadcaster := record.NewBroadcaster()
recorder := broadcaster.NewRecorder(scheme, corev1.EventSource{Component: "keda-metrics-adapter"})
handler := scaling.NewScaleHandler(mgr.GetClient(), nil, scheme, globalHTTPTimeout, recorder)

kubeClientset, _ := kubernetes.NewForConfig(ctrl.GetConfigOrDie())
Member:

Do we need to create a new client here? What is the impact? Could we reuse the adapter client?

Contributor (author):

Yes. The adapter client mgr.GetClient() is managed by controller-runtime; its type is client.Client, which is different from a Clientset and cannot be used to create an informer limited to a namespace.
I also tried setting a limited namespace on the mgr, but that affects all resources managed by that manager, so if we want to restrict namespace access only for Secrets and ConfigMaps, we need to create separate informers so other resources are not impacted.
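For illustration, here is a minimal sketch of that approach: a separate client-go Clientset plus a SharedInformerFactory restricted to the KEDA namespace. The helper name and error handling are assumptions, not the merged code; the factory construction mirrors the kubeinformers.NewSharedInformerFactoryWithOptions call quoted later in this thread.

```go
package main

import (
	"time"

	kubeinformers "k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corev1listers "k8s.io/client-go/listers/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// newNamespacedSecretLister is a hypothetical helper: it builds a client-go Clientset
// (independent of the controller-runtime client) and an informer factory that only
// lists/watches objects in KEDA's own namespace.
func newNamespacedSecretLister(kedaNamespace string) (kubeinformers.SharedInformerFactory, corev1listers.SecretLister, error) {
	// Separate Clientset; the controller-runtime client.Client cannot be scoped per resource.
	kubeClientset, err := kubernetes.NewForConfig(ctrl.GetConfigOrDie())
	if err != nil {
		return nil, nil, err
	}
	// Factory restricted to the KEDA namespace; the 1h resync mirrors the code quoted below.
	factory := kubeinformers.NewSharedInformerFactoryWithOptions(
		kubeClientset, 1*time.Hour, kubeinformers.WithNamespace(kedaNamespace))
	secretLister := factory.Core().V1().Secrets().Lister()
	return factory, secretLister, nil
}
```

The factory still has to be started and synced against a context with the same lifetime as the manager, which is what the later review comments about context reuse address.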

Member:

My concern here is that with client.Client we use the same cache for every resource and have a single client for everything. Now we potentially have two clients, so we can end up requesting the same things twice, adding more load to the API server.
I'm not saying this is wrong, but it's a concern I have related to performance.
I guess we could manually check the namespace being requested when we look up Secrets and, based on that, either execute the request or return an empty response. With that approach we wouldn't need the informer and could reuse the single client we already have instead of having two.
WDYT?
CC: @kedacore/keda-contributors
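A rough sketch of that alternative, under the assumption that the single controller-runtime client is kept and lookups outside KEDA's own namespace are simply short-circuited. The function and variable names are hypothetical, and the usual imports (context, corev1 "k8s.io/api/core/v1", "k8s.io/apimachinery/pkg/types", "sigs.k8s.io/controller-runtime/pkg/client") are assumed.

```go
// Hypothetical alternative: no extra client or informer; simply refuse to read
// Secrets outside the KEDA namespace when restricted mode is enabled.
func getSecretIfAllowed(ctx context.Context, c client.Client, name, namespace, kedaNamespace string) (*corev1.Secret, error) {
	if namespace != kedaNamespace {
		// Restricted: return an "empty" result instead of issuing the request.
		return nil, nil
	}
	secret := &corev1.Secret{}
	if err := c.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, secret); err != nil {
		return nil, err
	}
	return secret, nil
}
```

As the author points out below, this alone would not shrink the RBAC footprint, because the cached client still lists and watches Secrets across all namespaces.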

Contributor (author):

Actually, I did try setting the namespace when requesting: client.Get(ctx, types.NamespacedName{Name: name, Namespace: "keda"}, secret).
However, it still requires permissions for all namespaces, since the client's cache covers all resources in all namespaces.
I also tried limiting the namespace when creating the manager, but that limits the namespace for all resources, since they share one client.
That's why I added a new clientset specifically for Secrets; later it could also be used for ConfigMaps.

Member:

I'm still not sure about using different clients for this. I'm afraid of increasing the number of requests against the API server. With the underlying client in controller-runtime we have control over the request rate, but I'm not sure about this new client.
@v-shenoy @zroubalik, WDYT?

@kevinteng525 (Contributor, author) commented Nov 20, 2022

Hello Jorge, I totally understand your concern about performance. However, this PR just introduces a namespaced informer for Secrets that only lists/watches the KEDA namespace, so there is no need to list/watch Secrets from all namespaces. It is also a cache, so it does not add load on the API server; this has also been verified in our performance tests.

Meanwhile, I have also reviewed this feature. My understanding is that it tries to cache metric values in the Metrics Server to reduce the load on the external service: if a request comes from the HPA (Kubernetes), we return from the cache if it is within the interval. If my understanding is correct, that feature will not help KEDA start up if secrets are removed from the ClusterRole as below:

- apiGroups:
  - ""
  resources:
  - external
  - pods
  - services
  verbs:
  - get
  - list
  - watch

Could you please try removing secrets from your ClusterRole as above and adding them only to a namespaced Role, then see how KEDA behaves? I think it might help you better understand what problem this PR is trying to solve.

Contributor (author):

You also mentioned that this would potentially allow multiple namespaced KEDA operators and turn the metrics server into a simple store for metric values. Do you mean we would need one KEDA operator per application if we need to retrieve metrics in the future?

Member:

Hey,
No, you don't need one operator per application, but in the near future I hope you will be able to have one operator per namespace, which basically reduces the accesses to the Kubernetes API. With the new pulling approach we are working on, the metrics server (the piece that must be cluster-global) will be a "stupid" API without access to anything, and it will pull the values from the operator(s).
In that scenario, it is the operator that has access to the upstreams, so we could deploy operators in multiple namespaces, each scoped to its own namespace, limiting access to that namespace and avoiding the issue behind this PR.
That PR makes the behaviour change: the (single) operator becomes the one that gets the metrics, transforming the metrics server into the "stupid" API. Once that PR is merged, we only need to add support for multiple namespaced operators to get namespace access isolation.

@kevinteng525 (Contributor, author) commented Nov 27, 2022

Hey Jorge,

To be honest, I don't think it's a good idea to have one operator per namespace. It will not reduce access to the Kubernetes API server; instead it will increase the load on the API server and also add huge operational and maintenance effort.

In our practice we have one namespace per application. Consider a scenario with 4000 applications, which means 4000 namespaces: would we then need to deploy 4000 KEDA operators in the future?

Some problems that 4000 operators might introduce:

  • Each KEDA operator creates one manager, which creates one client, so 4000 operators mean 4000 clients...
  • Currently we only need to manage one KEDA operator and one KEDA metrics server. In the future we would need to manage and monitor 4000 KEDA operators per cluster. We have 100+ clusters in our production environment, which means actively monitoring 400,000 KEDA operators; that would be a nightmare for us...
  • Whenever an application is created or deleted, we would also have to deploy or delete a KEDA operator, which is not how an operator normally works. Normally there is only one instance of an operator per cluster.
  • It does not remove the security risk either, since we would still manage KEDA operators in all namespaces, which means 4000 service accounts across 4000 namespaces (one SA per namespace), each able to get the Secrets in its namespace. The only difference is that previously the permission to access all Secrets was granted to one SA (via a ClusterRole), whereas in the future it would be granted to 4000 SAs (via namespaced Roles).

Please correct me if I'm wrong.

Thanks,
Kevin

Member:

I think the current implementation is okay.

(Resolved inline review threads on main.go, pkg/scaling/resolver/scale_resolvers.go, and adapter/main.go)
@JorTurFer (Member) commented Sep 27, 2022

/run-e2e
Update: You can check the progress here

@tomkerkhove (Member) left a comment

It would be good to add logging when we are trying to look up secrets but this has been disabled (please include the setting name), which means ClusterTriggerAuthentication is the only way to get secrets.
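Something along these lines, as a hedged sketch: the helper isSecretAccessRestricted and the logr-style logger variable are assumptions, and the setting name matches the snippet quoted later in this thread.

```go
// Log when a Secret lookup is skipped because restricted access is enabled, so users
// know why TriggerAuthentication secrets outside the KEDA namespace cannot be resolved.
if isSecretAccessRestricted() {
	logger.V(1).Info("Secret lookup skipped: secret access is restricted",
		"setting", "KEDA_RESTRICT_SECRET_ACCESS",
		"hint", "use ClusterTriggerAuthentication (or secrets in the KEDA namespace) instead")
}
```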

@kevinteng525 (Contributor, author)

/run-e2e

@JorTurFer (Member) commented Oct 17, 2022

/run-e2e
Update: You can check the progress here

@tomkerkhove (Member) commented Dec 2, 2022

/run-e2e
Update: You can check the progress here

@tomkerkhove (Member)

We are going to release KEDA v2.9 on Thursday. Do you think you can complete the open work by Tuesday @kevinteng525? That allows us to do a re-review on Wednesday

@kevinteng525 force-pushed the secret branch 3 times, most recently from cc2001f to 453acff on December 4, 2022 12:47
@kevinteng525 (Contributor, author)

> We are going to release KEDA v2.9 on Thursday. Do you think you can complete the open work by Tuesday @kevinteng525? That allows us to do a re-review on Wednesday

@tomkerkhove Thanks, I have resolved all conflicts; it should be ready for re-review.

@zroubalik (Member) left a comment

In general looking good, I left just a few nits. Thanks

(Resolved inline review threads on main.go and pkg/scaling/resolver/scale_resolvers.go)
adapter/main.go Outdated
@@ -132,22 +135,34 @@ func (a *Adapter) makeProvider(ctx context.Context, globalHTTPTimeout time.Durat

broadcaster := record.NewBroadcaster()
recorder := broadcaster.NewRecorder(scheme, corev1.EventSource{Component: "keda-metrics-adapter"})
handler := scaling.NewScaleHandler(mgr.GetClient(), nil, scheme, globalHTTPTimeout, recorder)

kubeClientset, _ := kubernetes.NewForConfig(ctrl.GetConfigOrDie())
Member:

I think the current implementation is okay.

(Resolved inline review thread on adapter/main.go)
@kevinteng525 force-pushed the secret branch 2 times, most recently from 99ef697 to 9b8c8f3 on December 5, 2022 13:40
@kevinteng525 (Contributor, author)

> In general looking good, I left just a few nits. Thanks

Thanks @zroubalik for the review, I have updated accordingly.

@zroubalik (Member) left a comment

There are a few nits I have found; also, the static checks are failing due to a missing newline.

@kevinteng525 Could you please add a note to Changelog? (Improvements section)

@@ -52,3 +56,27 @@ func ResolveOsEnvDuration(envName string) (*time.Duration, error) {

return nil, nil
}

func GetClusterObjectNamespace() (string, error) {
Member:

Could you please add a description here as a comment?

Contributor (author):

Sure, added.
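For reference, a helper like this typically reads the namespace the pod runs in from the service-account mount. The sketch below shows that common pattern under stated assumptions (os, fmt, and strings imports); it is not necessarily the exact merged implementation.

```go
// GetClusterObjectNamespace returns the namespace KEDA itself is deployed in,
// read from the file Kubernetes mounts into every pod's service-account volume.
func GetClusterObjectNamespace() (string, error) {
	data, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace")
	if err != nil {
		return "", fmt.Errorf("cannot read namespace from service account mount: %w", err)
	}
	return strings.TrimSpace(string(data)), nil
}
```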

main.go Outdated
@@ -223,6 +242,14 @@ func main() {
setupLog.Info(fmt.Sprintf("Go OS/Arch: %s/%s", runtime.GOOS, runtime.GOARCH))
setupLog.Info(fmt.Sprintf("Running on Kubernetes %s", kubeVersion.PrettyVersion), "version", kubeVersion.Version)

ctx := context.Background()
Member:

We shouldn't create a new context here; we should use the same one that's created on line 253: if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {

Contributor (author):

Updated, thanks!
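In other words, something like this minimal fragment: the variable names kubeInformerFactory and setupLog follow the code quoted in this thread, while the surrounding structure is assumed rather than taken from the merged diff.

```go
// Create the signal-handler context once and share it, so the namespaced informer
// factory and the manager have the same lifetime.
ctx := ctrl.SetupSignalHandler()

// Start the informers scoped to KEDA's namespace using the shared context.
kubeInformerFactory.Start(ctx.Done())

if err := mgr.Start(ctx); err != nil {
	setupLog.Error(err, "problem running manager")
	os.Exit(1)
}
```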

setupLog.Error(err, "Unable to get cluster object namespace")
os.Exit(1)
}
kubeInformerFactory := kubeinformers.NewSharedInformerFactoryWithOptions(kubeClientset, 1*time.Hour, kubeinformers.WithNamespace(objectNamespace))
@zroubalik (Member) commented Dec 5, 2022

Can we add a short comment to these lines explaining why the kubeInformerFactory and the Secret informer are needed, ideally with a link to the KEDA issue about this feature, for future reference? Thanks!

Contributor (author):

Sure, added.

logger.Error(err, "Unable to get cluster object namespace")
return nil, nil, err
}
kubeInformerFactory := kubeinformers.NewSharedInformerFactoryWithOptions(kubeClientset, 1*time.Hour, kubeinformers.WithNamespace(objectNamespace))
@zroubalik (Member) commented Dec 5, 2022

Can we add a short comment to these lines explaining why the kubeInformerFactory and the Secret informer are needed, ideally with a link to the KEDA issue about this feature, for future reference? Thanks!

Contributor (author):

Sure, added.

// GetRestrictSecretAccess retrieves the value of the environment variable KEDA_RESTRICT_SECRET_ACCESS
func GetRestrictSecretAccess() string {
return os.Getenv(RestrictSecretAccessEnvVar)
}
Member:

A newline is missing here.

Contributor (author):

Thanks, fixed!

@zroubalik (Member) commented Dec 5, 2022

/run-e2e aws*
Update: You can check the progress here

Support Restrict Secret Access, refer to kedacore#3668

Signed-off-by: kevin <tengkang@msn.com>
Fix the test

Signed-off-by: kevin <tengkang@msn.com>
Signed-off-by: kevin <tengkang@msn.com>
Signed-off-by: kevin <tengkang@msn.com>
Update isSecretAcessRestricted function, comments and Env Variable to make it clearer semantically

Signed-off-by: kevin <tengkang@msn.com>
fixing redeclaring during merge

Signed-off-by: kevin <tengkang@msn.com>
sort imports

Signed-off-by: kevin <tengkang@msn.com>
Add logging if KEDA_RESTRICT_SECRET_ACCESS=true which means ClusterTriggerAuthentication is the only way to get secrets

Signed-off-by: kevin <tengkang@msn.com>
fix UT

Signed-off-by: kevin <tengkang@msn.com>
fix UT

Signed-off-by: kevin <tengkang@msn.com>
fix static checks

Signed-off-by: kevin <tengkang@msn.com>
Enhance based on comments

Signed-off-by: kevin <tengkang@msn.com>
gofmt

Signed-off-by: kevin <tengkang@msn.com>
Add changelog & comments

Signed-off-by: kevin <tengkang@msn.com>
@zroubalik (Member) commented Dec 6, 2022

/run-e2e
Update: You can check the progress here

@zroubalik (Member) left a comment

LGTM

great job on this!
Let's merge this once PRs on docs and charts are completed

@zroubalik (Member)

> LGTM
>
> great job on this! Let's merge this once PRs on docs and charts are completed

@kevinteng525 do you have any update on this? We would like to do the release soon.

@kevinteng525 (Contributor, author)

> LGTM
>
> great job on this! Let's merge this once PRs on docs and charts are completed
>
> @kevinteng525 do you have any update on this? We would like to do the release soon.

Yes, I have updated all the PRs; please review again.

@zroubalik zroubalik merged commit f21a7db into kedacore:main Dec 8, 2022