
Cluster Autoscaler not working on the new AL2023 EKS optimised AMI #6963

Closed
ashishrajora0808 opened this issue Jun 23, 2024 · 9 comments
Labels
area/cluster-autoscaler, kind/bug

Comments


ashishrajora0808 commented Jun 23, 2024

Which component are you using?: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0

cluster-autoscaler

What version of the component are you using?:

Component version: v1.29.0

What k8s version are you using (kubectl version)?: 1.30


What environment is this in?: AWS

What did you expect to happen?: On trialling the Amazon Linux 2023 EKS optimised AMI, I expected things to just work, since the EKS worker nodes have all the permissions required for the Cluster Autoscaler to talk to the Auto Scaling groups.

What happened instead?:
I am getting errors on startup of the Cluster Autoscaler that point to some sort of credentials or networking issue.

How to reproduce it (as minimally and precisely as possible):

  • Build the EKS cluster with the AL2023 EKS optimised AMI amazon/amazon-eks-node-al2023-x86_64-standard-1.27-v20240615.
  • Install Cluster Autoscaler v1.29.0 via Helm (a sketch of the install command follows the log output below).
  • The logs should show the following:

I0621 15:22:13.971945 1 aws_manager.go:79] AWS SDK Version: 1.48.7
I0621 15:22:13.972068 1 auto_scaling_groups.go:396] Regenerating instance to ASG map for ASG names: []
I0621 15:22:13.972083 1 auto_scaling_groups.go:403] Regenerating instance to ASG map for ASG tags: map[k8s.io/cluster-autoscaler/enabled: k8s.io/cluster-autoscaler/qa-ore-blue:]
E0621 15:24:14.262752 1 aws_manager.go:128] Failed to regenerate ASG cache: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
F0621 15:24:14.262782 1 aws_cloud_provider.go:460] Failed to create AWS Manager: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
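
For reference, step 2 was done roughly like this. A minimal sketch using the kubernetes/autoscaler Helm chart; the release name and values are assumptions inferred from the logs above (the qa-ore-blue cluster name and us-west-2 region), not the exact command from my setup:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
# Cluster name and region below are inferred from the log output above.
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=qa-ore-blue \
  --set awsRegion=us-west-2 \
  --set image.tag=v1.29.0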

Anything else we need to know?: I updated the AWS VPC CNI plugin as part of the investigation, but it did not help:

amazon-k8s-cni-init:v1.18.2

The Cluster Autoscaler service account on the new EKS AL2023 AMI looks like it is not loading any secrets. Not sure if this is the cause:

Name:                cluster-autoscaler-aws-cluster-autoscaler
Namespace:           kube-system
Labels:              app.kubernetes.io/instance=cluster-autoscaler
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-cluster-autoscaler
                     app.kubernetes.io/version=1.29.0
                     helm.sh/chart=cluster-autoscaler-9.35.0
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::123456789987:role/*-eks-worker-role-ore
                     meta.helm.sh/release-name: cluster-autoscaler
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:
Mountable secrets:
Tokens:
Events:
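
(For reference, that output comes from describing the chart's service account; the command below assumes the default names created by the Helm chart, which match the output above.)

kubectl -n kube-system describe serviceaccount cluster-autoscaler-aws-cluster-autoscaler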

ashishrajora0808 added the kind/bug label on Jun 23, 2024
@adrianmoisey
Member

/area cluster-autoscaler

@Arulaln-AR

Hi @adrianmoisey and @ashishrajora0808,

Do you have any solution for the above issue?

@ashishrajora0808
Author

I haven't found any resolution to this. I am working with AWS support, but no luck yet. Are you facing the same issue @Arulaln-AR?

@Arulaln-AR

@ashishrajora0808, yes, I am working on the same issue with AWS support too.

But I do know another working solution, which requires following the article below. It amounts to attaching an IAM role to the service account via an annotation and referencing that service account from the pod.

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

But I would rather not follow that approach either.
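
For anyone who does want that route, a minimal sketch of the IRSA approach from the article. The role name is a placeholder I made up, and the account ID is the (redacted) one from the service account output above:

# ServiceAccount annotated with an IAM role for service accounts (IRSA).
# The role ARN is a placeholder for illustration only.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789987:role/cluster-autoscaler-irsa-role

The pod then points at it via serviceAccountName: cluster-autoscaler, which the Helm chart wires up for you.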

@Arulaln-AR

@ashishrajora0808, if you don't want to follow the other solution provided above, simply change the node group AMI type from "AL2023_x86_64_STANDARD" back to "AL2_x86_64".
When I did that, it worked.

@ashishrajora0808
Author

@Arulaln-AR Thanks, but that just reverts to the AL2 AMI. I want to use the new AL2023 AMI, as AL2 goes end of life next year.

@ashishrajora0808
Author

@Arulaln-AR The issue seems to be due to the NodeConfig; the block


apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://example.com
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 10.100.0.0/16

needs the CIDR 10.100.* (the cluster's service CIDR) hardcoded in my case, as I was passing the VPC CIDR in it instead.
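
For context, on AL2023 this NodeConfig is what nodeadm reads from the launch template user data, embedded as a MIME multi-part document. A sketch of the corrected user data, keeping the placeholder values from above and hardcoding the service CIDR:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://example.com
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 10.100.0.0/16   # the cluster's Kubernetes service CIDR, not the VPC CIDR

--BOUNDARY--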

@ashishrajora0808
Author

Not autoscaler related, so closing the case.

@Arulaln-AR

@ashishrajora0808, I heard back from AWS support. It is because of the instance metadata (IMDSv2) token hop limit: it is set to 1 on AL2023, whereas on AL2 it was set to 2.

We need to customize the launch template to make it work.
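
For anyone hitting the same thing, a sketch of that fix; the launch template and instance IDs are placeholders. Raising the IMDSv2 hop limit to 2 lets pods on the node reach the instance metadata service through the extra network hop, so the autoscaler can pick up the node role credentials again:

# Add a new launch template version with a hop limit of 2.
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version '$Latest' \
  --launch-template-data '{"MetadataOptions":{"HttpTokens":"required","HttpPutResponseHopLimit":2}}'

# Verify the setting on a running node.
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions'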
