
clusterapi: refresh kubeconfig bearer tokens for management and workload kubeconfigs dynamically #5951

Conversation

@cnmcavoy (Contributor)

What type of PR is this?

/kind bug

What this PR does / why we need it:

Continuously update the bearer token provided by a kubeconfig file for the clusterapi provider.

Which issue(s) this PR fixes:

Fixes #4784

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added labels on Jul 13, 2023: kind/bug (Categorizes issue or PR as related to a bug), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), size/M (Denotes a PR that changes 30-99 lines, ignoring generated files).
@cnmcavoy (Contributor, Author)

/retest

@k8s-ci-robot (Contributor)

@cnmcavoy: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cnmcavoy force-pushed the cmcavoy/clusterapi-refresh-kubeconfig-bearertokens branch from 911efe7 to 17e36cf on July 17, 2023
@cnmcavoy force-pushed the cmcavoy/clusterapi-refresh-kubeconfig-bearertokens branch from 17e36cf to 2e67057 on August 1, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cnmcavoy
Once this PR has been reviewed and has the lgtm label, please assign bigdarkclown for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elmiko (Contributor) left a comment

thanks for the patch @cnmcavoy, i am a little out of my depth with some of this token wrapping stuff. i wonder if we might bug @stlaz or @liggitt to give us a comment on this?

@elmiko (Contributor) commented Aug 30, 2023

i'm still keen to see this get included, @cnmcavoy have you done some local testing of the refresh?

@cnmcavoy (Contributor, Author) commented Sep 6, 2023

i'm still keen to see this get included, @cnmcavoy have you done some local testing of the refresh?

Indeed, we have been using a fork of the CA with this patch included for a month. Initially this PR only covered the kubeconfig in the clusterapi provider package, but that led us to discover there is a third kubeconfig in use for leader election. I updated the PR earlier to address that issue, and we haven't discovered anything new since.

@elmiko (Contributor) commented Sep 6, 2023

thanks for the update @cnmcavoy, i'm inclined to merge this based on your comments but i will bring up the PR at the CAPI meeting today just to see if we can get an extra review or two. i will plan to merge this on friday if there is no further activity.

@jackfrancis (Contributor)

/assign

@jackfrancis (Contributor)

@cnmcavoy code seems sane to me, is there an appropriate slot in the Cluster API provider documentation that we can use to describe this new feature? I'd be happy to build a cluster-autoscaler image from this branch and run through that test flow and report back (which would have the benefit of testing both the functionality and the documentation 😀).

@cnmcavoy force-pushed the cmcavoy/clusterapi-refresh-kubeconfig-bearertokens branch from 2e67057 to 148c69f on September 21, 2023
@cnmcavoy (Contributor, Author)

@jackfrancis sure! I added a note after the section describing the various configuration modes between management and workload clusters.

@cnmcavoy requested a review from elmiko on October 16, 2023
@enj (Member) commented Nov 1, 2023

Why is this not using something like HTTPWrappersForConfig which invokes NewBearerAuthWithRefreshRoundTripper to handle this flow? You should not be writing any custom code to handle automatic token refresh when client-go already does it for you.
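
For readers following along, here is a minimal sketch (not the PR's code; the helper name and token-file path are illustrative) of the client-go flow @enj is pointing at: when a transport.Config carries a BearerTokenFile, HTTPWrappersForConfig wraps the base transport in NewBearerAuthWithRefreshRoundTripper, which rereads the token from disk as the cached value ages out.

package sketch

import (
	"net/http"

	"k8s.io/client-go/transport"
)

// refreshingTransport is a hypothetical helper showing the built-in flow.
func refreshingTransport(tokenFile string, base http.RoundTripper) (http.RoundTripper, error) {
	cfg := &transport.Config{
		// With BearerTokenFile set, client-go wires up its own
		// refreshing bearer-auth round tripper; no custom code needed.
		BearerTokenFile: tokenFile,
	}
	return transport.HTTPWrappersForConfig(cfg, base)
}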

Comment on lines +143 to +147
kubeConfig, err := kube_client_cmd.BuildConfigFromFlags("", rt.kubeconfigPath)
if err != nil {
return nil, fmt.Errorf("cannot build kube cluster config: %w", err)
}
rt.bearerToken = kubeConfig.BearerToken
@enj (Member):

Loading the entire kubeconfig but only honoring the bearer token changing doesn't really make sense. The bearer token should be in a distinct file that is reloaded, the rest of the contents should remain static.

@cnmcavoy (Contributor, Author) commented Nov 1, 2023:

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/clusterapi#autoscaler-running-anywhere-with-separate-kubeconfigs-for-management-and-workload-clusters

Cluster Autoscaler can be configured in a mode where it runs in cluster "1", the cluster and pods it needs to scale and manage are in cluster "3", and the cloud configuration and node groups to scale are in cluster "2". In this scenario there is no service account bearer token except for cluster "1", because these are separate clusters.

The file we are loading is created by the CAPA awscluster controller, carries only a bearer token for credentials, and is shaped like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0t...
    server: https://XXXXX.us-east-2.eks.amazonaws.com
  name: cluster-name
contexts:
- context:
    cluster: cluster-3
    user: cluster-3-capi-admin
  name: cluster-3-capi-admin@cluster-3
current-context: cluster-3-capi-admin@cluster-3
kind: Config
preferences: {}
users:
- name: cluster-3-capi-admin
  user:
    token: k8s-aws-v1...

Are you suggesting that CAPA should encode a second secret value with just the token contents above, and have the Cluster Autoscaler reload that file, to simulate the behavior of a file-mounted service account bearer token? Or something else?

@enj (Member):

Right. If you change the YAML to:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0t...
    server: https://XXXXX.us-east-2.eks.amazonaws.com
  name: cluster-name
contexts:
- context:
    cluster: cluster-3
    user: cluster-3-capi-admin
  name: cluster-3-capi-admin@cluster-3
current-context: cluster-3-capi-admin@cluster-3
kind: Config
preferences: {}
users:
- name: cluster-3-capi-admin
  user:
    tokenFile: location_of_file

And have the contents of location_of_file be:

k8s-aws-v1...

Then client-go will handle the contents of location_of_file changing for you.
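
To make the suggestion concrete, a hedged sketch of the consuming side (the kubeconfig path is illustrative): clientcmd surfaces a tokenFile user entry as rest.Config.BearerTokenFile, and clients built from that config reread the file on their own as the cached token ages out.

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// A `tokenFile:` user field lands in config.BearerTokenFile rather
	// than a static config.BearerToken.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/workload.kubeconfig")
	if err != nil {
		panic(err)
	}
	fmt.Println("token file:", config.BearerTokenFile)

	// Clients built from this config pick up rotated tokens without
	// any custom round tripper.
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	_ = client
}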

return rt.rt.RoundTrip(req)
}

var _ http.RoundTripper = &bearerAuthRoundTripper{}
@enj (Member):

You need to implement utilnet.RoundTripperWrapper
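
For context, a hedged sketch of what implementing that interface looks like; the struct name mirrors the PR's bearerAuthRoundTripper, but the fields shown here are assumptions:

package sketch

import (
	"net/http"

	utilnet "k8s.io/apimachinery/pkg/util/net"
)

type bearerAuthRoundTripper struct {
	bearerToken string
	rt          http.RoundTripper
}

func (r *bearerAuthRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone before mutating: a RoundTripper must not modify the caller's request.
	req = req.Clone(req.Context())
	req.Header.Set("Authorization", "Bearer "+r.bearerToken)
	return r.rt.RoundTrip(req)
}

// WrappedRoundTripper exposes the underlying transport, satisfying
// utilnet.RoundTripperWrapper so chain-walking utilities can unwrap it.
func (r *bearerAuthRoundTripper) WrappedRoundTripper() http.RoundTripper { return r.rt }

var _ utilnet.RoundTripperWrapper = &bearerAuthRoundTripper{}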

@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RichardoC

Is this still needed given the Kubernetes client-go library has been updated to a version that should fix this?

@elmiko (Contributor) commented Jan 3, 2024

@RichardoC that link to the aws docs is only about eks clusters though, is there a link to an issue in client-go that might talk about this?

the capi provider doesn't necessarily use eks, so we would still want a refresh here (as i understand it).

@RichardoC

This functionality is upstream k8s behaviour, I just couldn't find the link earlier: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#auto-generated-legacy-serviceaccount-token-clean-up
My best guess is that client-go was fixed in kubernetes/client-go@66e83da#diff-a6baf0c4cd67228defd6ac03faa4c9922f5cf4dd2b27672509043b33e3d881f7R289 (specifically the NewBearerAuthWithRefreshRoundTripper codepath), so this shouldn't be needed?

@cnmcavoy (Contributor, Author) commented Jan 3, 2024

@RichardoC I think there is some confusion here: this change was about kubeconfigs with an embedded service account token expiring. That differs from the built-in client-go handling of service account tokens, where you are correct that client-go does automatically refresh. So this is still an active issue for CAPI users with EKS clusters.

However, I went to a SIG-APIMachinery meeting last year to discuss better approaches to handling this, and they suggested changes downstream, in how the CAPI project manages its kubeconfigs, to avoid this type of issue entirely. So I'm pursuing that over in kubernetes-sigs/cluster-api-provider-aws#4648

If all goes well, there will likely need to be some small autoscaler changes as a result, but only to its helm chart / docs. I'll open a new PR for that work.

@cnmcavoy closed this Jan 3, 2024
@elmiko (Contributor) commented Jan 3, 2024

@cnmcavoy thanks for the update and continuing to follow this issue. happy to hear about the status, and thanks for closing this.

Labels: area/cluster-autoscaler, cncf-cla: yes, kind/bug, needs-rebase, size/M
Development

Successfully merging this pull request may close these issues.

cluster-autoscaler caches workload cluster kubeconfig
6 participants