
Unable to scale AWS autoscaling array up to satisfy pod EBS requirement in particular zone (NoVolumeZoneConflict) #1431

Closed
garo opened this issue Nov 22, 2018 · 9 comments



garo commented Nov 22, 2018

Running Kubernetes v1.10.3-eks in Amazon EKS. The cluster has one AWS autoscaling array with three different availability zones / subnets defined. At the time of the problem the cluster had two nodes, one in us-east-1a and one in us-east-1c.

There is a pod with a PVC attached, which is backed by an EBS PV in us-east-1d. Because there isn't any node running in us-east-1d, the pod cannot start.

The problem is that cluster-autoscaler isn't able to scale up the autoscaling array so that a new worker would appear in us-east-1d to satisfy the zone requirement. Manually increasing the autoscaling group size does give a new node in the correct availability zone.

Cluster-autoscaler is installed with Helm: chart version cluster-autoscaler-0.7.0, app version 1.2.2.
Installation command:

helm install --name cluster-autoscaler --namespace kube-system \
   --version 0.7.0 \ # 0.7.0 is meant for kubernetes 1.10.x, see https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
   --set autoDiscovery.clusterName=$CLUSTER_NAME \
   --set awsRegion=$AWS_REGION \
   --set sslCertPath=/etc/kubernetes/pki/ca.crt \
   --set rbac.create=true \
   --set podAnnotations."iam\.amazonaws\.com/role"=$CLUSTER_NAME-eks-worker-node \
   stable/cluster-autoscaler

Cluster autoscaler error log:

static_autoscaler.go:114] Starting main loop
utils.go:456] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
static_autoscaler.go:263] Filtering out schedulables
static_autoscaler.go:273] No schedulable pods
scale_up.go:59] Pod monitoring/prometheus-mon-prometheus-operator-prometheus-0 is unschedulable
scale_up.go:92] Upcoming 0 nodes
scale_up.go:152] Scale-up predicate failed: NoVolumeZoneConflict predicate mismatch, cannot put monitoring/prometheus-mon-prometheus-operator-prometheus-0 on template-node-for-cluster-generic-nodes-4423088653825289861, reason: node(s) had no available volume zone
scale_up.go:181] No pod can fit to cluster-generic-nodes
scale_up.go:186] No expansion options
static_autoscaler.go:322] Calculating unneeded nodes
factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"monitoring", Name:"prometheus-mon-prometheus-operator-prometheus-0", UID:"9473cd20-ee27-11e8-83f9-0e13418086b6", APIVersion:"v1", ResourceVersion:"5808447", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
scale_down.go:175] Scale-down calculation: ignoring 2 nodes, that were unremovable in the last 5m0s
static_autoscaler.go:352] Scale down status: unneededOnly=true lastScaleUpTime=2018-11-22 07:26:02.736718705 +0000 UTC lastScaleDownDeleteTime=2018-11-22 07:26:02.736719107 +0000 UTC lastScaleDownFailTime=2018-11-22 07:26:02.736719507 +0000 UTC schedulablePodsPresent=false isDeleteInProgress=false
Contributor

aleksandra-malinowska commented Nov 22, 2018

scale_up.go:152] Scale-up predicate failed: NoVolumeZoneConflict predicate mismatch, cannot put monitoring/prometheus-mon-prometheus-operator-prometheus-0 on template-node-for-cluster-generic-nodes-4423088653825289861, reason: node(s) had no available volume zone

It's a bit of a guessing game since I've no idea what you mean by 'autoscaling array', but this sounds as if you have only 1 regional node group, with nodes in 3 different zones.

This isn't supported, as Cluster Autoscaler assumes all nodes in a node group are identical with respect to all scheduling properties (resources, labels, taints, zone). If you want to use zone-related scheduling features (like volume topological scheduling), go for 3 zonal node groups. More details are in the FAQ.
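
For illustration, a per-zone setup could look roughly like this, with one ASG per availability zone and each one registered as its own node group (the ASG names and size bounds here are made up):

# one --nodes entry per zonal ASG, in min:max:asg-name form
./cluster-autoscaler \
   --cloud-provider=aws \
   --nodes=1:10:eks-workers-us-east-1a \
   --nodes=1:10:eks-workers-us-east-1c \
   --nodes=1:10:eks-workers-us-east-1d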

Author

garo commented Nov 22, 2018

Thank you for the response.

Yes, I have a single autoscaling array which creates nodes in all three availability zones.

Apparently Volume Topological Scheduling was introduced in 1.11, and as I'm running on EKS I'm stuck on 1.10. Am I right that on 1.10 there isn't any way to make this work?

@aleksandra-malinowska
Contributor

Even with this feature, you'll still need to have separate node groups in each zone.

Cluster Autoscaler must be able to predict accurately what kind of node it will create. In your case, you want a node in the same zone as the PV. If you use a regional node group, which will create the node in a random zone, you'll get random behavior.
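
For reference, the zone an EBS-backed PV lives in is recorded as a label on the PV object (failure-domain.beta.kubernetes.io/zone on 1.10), and that is what the NoVolumeZoneConflict predicate compares against the node's zone label. A quick way to see it (the PV name is a placeholder):

kubectl get pv <pv-name> --show-labels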

Author

garo commented Nov 23, 2018

Thank you for all the feedback.

garo closed this as completed Nov 23, 2018

frederiksf commented Jan 24, 2019

Even with this feature, you'll still need to have separate node groups in each zone.

Cluster Autoscaler must be able to predict accurately what kind of node it will create. In your case, you want a node in the same zone as PV. If you use a regional node group which will create the node in a random zone, you'll get random behavior.

@aleksandra-malinowska
Question: How will CA find out which zone a specific ASG's node group belongs to? Will it read the ASG's zones via the AWS API ("DescribeAutoScalingGroups")?


jfoy commented Feb 12, 2019

@frederiksf Not sure about the scale-from-zero case, but per kubernetes-retired/contrib#1552 (comment), CA asks the cloudprovider for a sample Node from the NodePool, and asserts that any capacity it adds in that NodePool will have exactly the same characteristics as the sample Node.
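
If you want to check which zones a given ASG actually spans (and therefore which zone a node created from it could land in), you can query it directly with the AWS CLI; the group name below is just an example:

aws autoscaling describe-auto-scaling-groups \
   --auto-scaling-group-names eks-workers-us-east-1d \
   --query 'AutoScalingGroups[0].AvailabilityZones'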

@abdennour

I have EKS 1.17 and cluster-autoscaler chart v7.0.0, and the issue still persists.
Is there any timeline for a fix?

@dprateek1991

I have a setup with multiple ASGs, each with the ap-southeast-1a and ap-southeast-1b AZs attached.

In my case, we have an EC2 instance running in the ap-southeast-1a zone, and the Persistent Volume gets attached to the node perfectly fine since the EBS volume is in ap-southeast-1a itself.

However, I have another EBS volume in the ap-southeast-1b zone, and in this case Cluster Autoscaler is not scaling up to add a node in ap-southeast-1b. What can be wrong here? Ideally it should scale up, add a node in the 1b zone and attach the volume to it.

I get this error --> pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 Insufficient nvidia.com/gpu, 1 node(s) had volume node affinity conflict

EKS Cluster Version - 1.17

@edijsdrezovs

@dprateek1991

If you're using Persistent Volumes, your deployment needs to run in the same AZ as the EBS volume; otherwise pod scheduling can fail if the pod is scheduled in a different AZ and cannot reach the EBS volume. To overcome this, either use a single-AZ ASG for this use case, or an ASG per AZ while enabling --balance-similar-node-groups (see the sketch below).
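
As a rough sketch of the second option with the stable/cluster-autoscaler chart (the release name is assumed, and this relies on the chart's extraArgs value passing flags through to the container):

helm upgrade cluster-autoscaler stable/cluster-autoscaler \
   --reuse-values \
   --set extraArgs.balance-similar-node-groups=true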

At creation time, the ASG will have the AZRebalance process enabled, which means it will actively work to balance the number of instances between AZs, and possibly terminate instances. If your applications could be impacted by sudden termination, you can either suspend the AZRebalance process (see below), or use a tool for automatic draining upon ASG scale-in such as the k8s-node-drainer. The AWS Node Termination Handler will also support this use case in the future.
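
Suspending AZRebalance on a given group is a one-liner with the AWS CLI (the group name here is a placeholder):

aws autoscaling suspend-processes \
   --auto-scaling-group-name eks-workers-ap-southeast-1a \
   --scaling-processes AZRebalance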
