
cluster autoscaler doesn't apply eks-managed-ng's taint #5902

Closed

0xF0D0 opened this issue Jun 28, 2023 · 1 comment
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments


0xF0D0 commented Jun 28, 2023

Which component are you using?: registry.k8s.io/autoscaling/cluster-autoscaler

What version of the component are you using?:

Component version: v1.26.3

What k8s version are you using (kubectl version)?:

kubectl version Output

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-02-22T13:32:21Z", GoVersion:"go1.20.1", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.5-eks-c12679a", GitCommit:"c03cecf98904742cce2e1183f87194102cc9dad9", GitTreeState:"clean", BuildDate:"2023-05-22T20:29:55Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
An EKS k8s cluster with 5 EKS managed node groups, each of which has taints.
When I deploy a pod without a matching toleration, CA should not scale up any node group.

However, it scales up a node group even though the pod does not tolerate its taints, and when the new node joins, the pod still cannot be scheduled on it, so CA scales that node group up again until it reaches max capacity 🤯
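For illustration, a rough sketch of the kind of pod I mean (pod/container names are hypothetical; the Service=K6 taint matches the CA log output further down):

# Minimal pod with no tolerations; it cannot tolerate
# Service=K6:NoSchedule (or any of the other node groups' taints),
# so CA should find no node group worth scaling up for it.
apiVersion: v1
kind: Pod
metadata:
  name: no-toleration-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
  # note: no `tolerations` field anywhere in the spec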

What did you expect to happen?:
It should not scale up the node group.

What happened instead?:

How to reproduce it (as minimally and precisely as possible):

Use an EKS managed nodegroup with a taint, but without the taint tag (k8s.io/cluster-autoscaler/node-template/taint) on the ASG. Then try to schedule a pod without a toleration.
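Roughly, a nodegroup like this reproduces it (an eksctl-style sketch; instance type and region are hypothetical), as long as nothing writes a k8s.io/cluster-autoscaler/node-template/taint/* tag onto the underlying ASG:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: xxxx          # cluster name redacted, as below
  region: us-east-1   # hypothetical
managedNodeGroups:
  - name: m5a_large_k6
    instanceType: m5a.large
    taints:
      - key: Service
        value: K6
        effect: NoSchedule
    # the taint exists on the EKS managed nodegroup only;
    # the ASG itself carries no node-template taint tag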

Anything else we need to know?:

This is my current argument setup (cluster-name mangled)

- ./cluster-autoscaler
- --v=5
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/xxxx
- --scale-down-unneeded-time=2m
- --scale-down-delay-after-add=2m

Looking at the logs, CA recognizes that the node groups are EKS-managed and picks up their taints correctly, but misbehaves when simulating scheduling.

I0628 11:49:32.426164       1 managed_nodegroup_cache.go:124] Current ManagedNodegroup cache: [{name:m5a_large_k6 clusterName:xxxx taints:[{Key:Service Value:K6 Effect:NO_SCHEDULE TimeAdded:<nil>}] labels:map[amiType:CUSTOM capacityType:ON_DEMAND eks.amazonaws.com/nodegroup:m5a_large_k6 k8sVersion:1.26]}, ...., ]

One interesting thing: when I add the taint tag (k8s.io/cluster-autoscaler/node-template/taint) on the ASG itself, it behaves as it's supposed to 🤔
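If I have the format right (per the cluster-autoscaler AWS cloud provider docs, the full tag key includes the taint key and the value is <value>:<effect>), the workaround tag on the ASG looks something like:

# Tag on the underlying auto scaling group, not on the EKS nodegroup:
Key: k8s.io/cluster-autoscaler/node-template/taint/Service
Value: "K6:NoSchedule"
PropagateAtLaunch: true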

@abstrask

Not sure why this is closed, but we've encountered the same issue, and believe we have identified the root cause (see #6481).

My colleague, @wcarlsen, and I believe we have a fix for this. Follow PR #6482 if you're interested.
