
Max Node Provision Time + Priority Expander + Node Unavailability #3490

Closed
nitrag opened this issue Sep 4, 2020 · 18 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@nitrag
Contributor

nitrag commented Sep 4, 2020

I have an issue where we did not scale up because spot instances were not available.

AWS EKS: 1.15
Cluster-Autoscaler: 1.15.7
Expander: priority

I have configured --max-node-provision-time="5m0s". The expected behavior would be that after 5 minutes the node group is marked as unavailable and the next highest priority group is chosen. The logs below show this setting being ignored. The default value for --max-node-provision-time is 15 minutes, but I see cluster-autoscaler recognizing the node group as unhealthy at 10 minutes. Also, why was the spot group marked unhealthy, yet the priority expander still chose it later on? What is the purpose of the i-placeholder instances?
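
For context, our priority expander is driven by a ConfigMap along these lines. This is a minimal sketch: the scores (40 for the spot group, 10 for on-demand) match the priority expander log lines further down, but the regex patterns here are illustrative rather than copied from our manifest.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    40:
      - .*-spot.*   # prefer the spot node group (illustrative pattern)
    10:
      - .*          # fall back to any other (on-demand) group

With this, the expander should pick the spot group whenever it is healthy and only fall through to on-demand when the spot group is backed off.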

Here are the logs:

--max-node-provision-time="5m0s"  <<confirmed in first 50 lines of pod startup>>

...

2020-09-02T11:22:40.246-04:00
{"log":"I0902 15:22:40.246183 1 scale_up.go:699] Scale-up: setting group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot size to 1

<< nothing 5 minutes later 11:27:xx >>

2020-09-02T11:32:08.458-04:00
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
...<< continuous logs starting 11:32:08 through 11:33:20 >>

2020-09-02T11:33:10.351-04:00
{"log":"W0902 15:33:10.351097 1 clusterstate.go:612] Readiness for node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot not found

<< 10 minutes later?? >>

2020-09-02T11:33:10.353-04:00
{"log":"W0902 15:33:10.353075 1 scale_up.go:326] Node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot is not ready for scaleup - unhealthy


2020-09-02T11:33:10.833-04:00
{"log":"I0902 15:33:10.831490 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0


2020-09-02T11:35:25.090-04:00
{"log":"I0902 15:35:25.086715 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:35:56.010-04:00
{"log":"I0902 15:35:55.995416 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:35:56.505-04:00
{"log":"I0902 15:35:56.505028 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:36:57.919-04:00
{"log":"I0902 15:36:57.917399 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:38:00.028-04:00
{"log":"I0902 15:38:00.025189 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:38:20.823-04:00
{"log":"I0902 15:38:20.823195 1 utils.go:494] Removing unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:20.823-04:00
{"log":"I0902 15:38:20.823334 1 auto_scaling_groups.go:267] instance i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0 is detected as a placeholder, decreasing ASG requested size instead of deleting instance

2020-09-02T11:38:21.011-04:00
{"log":"I0902 15:38:20.823348 1 auto_scaling_groups.go:221] Setting asg pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot size to 0

2020-09-02T11:38:26.107-04:00
{"log":"I0902 15:38:21.289446 1 event.go:258] Event(v1.ObjectReference{Kind:\"ConfigMap\", Namespace:\"kube-system\", Name:\"cluster-autoscaler-status\", UID:\"b055dfc9-ed07-44eb-b3de-df4798d0716b\", APIVersion:\"v1\", ResourceVersion:\"121379234\", FieldPath:\"\"}): type: 'Normal' reason: 'DeleteUnregistered' Removed unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0\n

2020-09-02T11:38:31.559-04:00
{"log":"W0902 15:38:31.558973 1 clusterstate.go:577] Nodegroup is nil for aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:31.559-04:00
{"log":"I0902 15:38:31.559051 1 utils.go:494] Removing unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:31.559-04:00
{"log":"W0902 15:38:31.559062 1 utils.go:501] No node group for node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0


2020-09-02T11:38:31.562-04:00
{"log":"W0902 15:38:31.561377 1 priority.go:91] Priority expander: node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge found with score 10


2020-09-02T11:38:31.562-04:00
{"log":"W0902 15:38:31.561392 1 priority.go:91] Priority expander: node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot found with score 40


2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561415 1 scale_up.go:427] Best option to resize: pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot


2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561427 1 scale_up.go:431] Estimated 1 nodes needed in pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot

2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561446 1 scale_up.go:533] Final scale-up plan: [{pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot 0->1

<< at this point EC2 spot succeeds in finding an instance (we got lucky), but in our use case we can't take longer than 5 minutes to scale up a node and must fall back to non-spot (the lower priority node group) >>
@daganida88

I encountered the same issue.
Did you resolve this??

@nitrag
Contributor Author

nitrag commented Sep 24, 2020

@daganida88 No, unfortunately. But thanks for confirming it’s not just me. Maybe now this will get some attention.

@kavicse87

Any status on this? I'm facing the same issue.

@4pits

4pits commented Oct 28, 2020

Well for me it worked.

See sample logs:

I1028 05:44:57.187621 1 static_autoscaler.go:303] Filtering out schedulables
I1028 05:44:57.187676 1 static_autoscaler.go:320] No schedulable pods
I1028 05:44:57.187691 1 scale_up.go:263] Pod default/monte-carlo-pi-service-75b6457d79-gx6wf is unschedulable
I1028 05:44:57.187727 1 scale_up.go:300] Upcoming 0 nodes

I1028 05:45:07.367756 1 scale_up.go:263] Pod default/monte-carlo-pi-service-75b6457d79-gx6wf is unschedulable
I1028 05:45:07.367869 1 scale_up.go:300] Upcoming 1 nodes

And then after 5 mins

W1028 05:50:19.239644 1 scale_up.go:325] Node group eksctl-eksworkshop-eksctl-nodegroup-Special-NodeGroup-M6YPOMZGXQAF is not ready for scaleup - backoff

And then it starts a node from the next group in the priority expander list:

I1028 05:52:20.714525 1 scale_up.go:300] Upcoming 1 nodes
I1028 05:52:20.714546 1 scale_up.go:338] Skipping node group eks-72bab455-6bfe-d81f-55d8-8596f99d207d - max size reached
I1028 05:52:20.714616 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Large-NodeGroup-1RNXIXEQXPVYJ
I1028 05:52:20.714665 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Medium-NodeGroup-1MKUB7WL0ZSTV
I1028 05:52:20.714718 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Small-NodeGroup-1P7ZJ9QGOF34V
W1028 05:52:20.714727 1 scale_up.go:325] Node group eksctl-eksworkshop-eksctl-nodegroup-Special-NodeGroup-M6YPOMZGXQAF is not ready for scaleup - backoff

I1028 05:53:11.047588 1 utils.go:147] Pod monte-carlo-pi-service-75b6457d79-gx6wf marked as unschedulable can be scheduled on ip-192-168-74-245.us-west-2.compute.internal. Ignoring in scale up.
I1028 05:53:11.047608 1 static_autoscaler.go:317] Schedulable pods present
I1028 05:53:11.047623 1 static_autoscaler.go:328] No unschedulable pods
I1028 05:53:11.047635 1 static_autoscaler.go:375] Calculating unneeded node

Now my Pod is scheduled.

@nitrag
Contributor Author

nitrag commented Oct 28, 2020

@4pits what version of EKS and cluster-autoscaler are you using?

How did you get scaleup backoff to be 5 minutes? I believe it is 15 minutes by default. Did you change max-node-provision-time?

@MaciekPytel
Contributor

Scale-up backoff always starts at 5 minutes and grows exponentially up to 30 minutes [1].

Regarding i-placeholder instances: CA operates on individual nodes, not just a node count. When it scales up, it tracks status and timeouts for each node individually, and placeholder instances are needed to allow that. You can find more details in the discussions on #2008.

[1] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/clusterstate/clusterstate.go#L54

@nitrag
Contributor Author

nitrag commented Oct 29, 2020

@MaciekPytel

I don't see any configuration flags that would allow us to customize these values here.

How would I accomplish this: I have two priority groups. If the first priority group fails to scale up within 5 minutes, give up on it, mark it as failed, fall back to the next highest priority group, and don't attempt the failed group again for 2 hours. Will I have to edit the source and build manually?

@MaciekPytel
Contributor

I think so, as you say those values are consts and can't currently be changed via flags.

@4pits

4pits commented Oct 30, 2020

Yes @nitrag
I used Kubernetes version 1.17 and eks version 0.30
--max-node-provision-time=5m0s
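
For anyone else setting this up, the flag is just a container argument on the cluster-autoscaler Deployment. A trimmed sketch with only the flags discussed in this thread (the rest of the manifest omitted):

# illustrative excerpt of a cluster-autoscaler Deployment pod spec
spec:
  containers:
    - name: cluster-autoscaler
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --expander=priority
        - --max-node-provision-time=5m0s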

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tarun-asthana

@nitrag I faced the same issue. Were you also facing a spot interruption in your fleet?

@nitrag
Contributor Author

nitrag commented May 19, 2022

We haven't seen this since 1.19+ and even before then it was rare.

@tarun-asthana

@nitrag I am using v1.18.2 on Kubernetes 1.18. It was a spot interruption, which I confirmed with AWS Support. Shouldn't cluster-autoscaler avoid using an ASG that it has marked unhealthy? In my case it was still using it.

@tarun-asthana

Created a separate bug for the same - #4900

@charlesritchea

Heyyy just happened again on the same cluster as OP on 1.28 🎉
