
Unschedulable: 0/2 nodes are available: 2 Insufficient cpu. #706

Closed
SinaChavoshi opened this issue Jan 18, 2019 · 4 comments

@SinaChavoshi (Contributor)

The error message for a resource shortage is misleading. In this instance I was requesting 20 CPUs, while my cluster has a maximum of 10.

https://screenshot.googleplex.com/yvuBNnYqn0p

@hongye-sun (Contributor)

This is a message from k8s. It says that you have two nodes in your cluster, but neither of them can satisfy the requested CPUs.

@paveldournov (Contributor)

@SinaChavoshi - please scale out the cluster by adding more nodes to the pool, or set up auto-scaling for the node pool.
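As a sketch of both suggestions, assuming a GKE cluster (the cluster, pool, and zone names below are placeholders, not values from this deployment):

```shell
# Hypothetical names -- substitute your own cluster, node pool, and zone.
# Scale out by resizing the existing node pool:
gcloud container clusters resize kubeflow-cluster \
  --node-pool default-pool --num-nodes 4 --zone us-central1-a

# Or enable autoscaling on the node pool so GKE adds nodes on demand:
gcloud container clusters update kubeflow-cluster \
  --enable-autoscaling --min-nodes 2 --max-nodes 10 \
  --node-pool default-pool --zone us-central1-a
```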

@jlewi - should we consider enabling auto-scaling automatically in the 1-click deploy app?

@jlewi (Contributor)

jlewi commented Jan 22, 2019

Autoscaling should already be enabled:
https://github.com/kubeflow/kubeflow/blob/797bcb7407a589bacc35b9624120f51f36a83468/deployment/gke/deployment_manager_configs/cluster-kubeflow.yaml#L46

My hunch is @SinaChavoshi is requesting a single pod with 20 CPUs. The current default VM type has 8 CPUs:
https://github.com/kubeflow/kubeflow/blob/797bcb7407a589bacc35b9624120f51f36a83468/deployment/gke/deployment_manager_configs/cluster.jinja#L118

This was by design to try to fit into the free tier.
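To illustrate the hunch: a single pod whose request exceeds the capacity of any one node is unschedulable no matter how many nodes the autoscaler adds, because a pod cannot be split across nodes. A minimal sketch (pod name and namespace are hypothetical):

```shell
# Illustrative only: this pod requests 20 CPUs, which can never fit on an
# 8-CPU node, so scaling out the same node pool will not help.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: big-cpu-pod   # hypothetical name
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "20"     # exceeds the allocatable CPU of an 8-CPU node
EOF
```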

There are two options here:

  1. Add a node pool with larger VM types

  2. Modify the node auto-provisioning settings to provision larger nodes.
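The two options above might look roughly like this on GKE (a sketch with placeholder names, machine types, and limits; this is not the deployment's actual configuration):

```shell
# Option 1: add a node pool with larger VM types (names are placeholders):
gcloud container node-pools create large-pool \
  --cluster kubeflow-cluster --machine-type n1-standard-32 \
  --num-nodes 1 --zone us-central1-a

# Option 2: raise the node auto-provisioning resource ceiling so GKE may
# create larger nodes on its own (limits are illustrative):
gcloud container clusters update kubeflow-cluster \
  --enable-autoprovisioning --max-cpu 64 --max-memory 256 \
  --zone us-central1-a
```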

It looks like auto-provisioning should be enabled:
https://github.com/kubeflow/kubeflow/blob/797bcb7407a589bacc35b9624120f51f36a83468/deployment/gke/deployment_manager_configs/cluster.jinja#L96

And should be able to provision larger nodes.

@SinaChavoshi Which version of Kubeflow are you using? I think we only turned on auto-provisioning in 0.4.

If you look at the K8s events for the resource that isn't getting scheduled (e.g. the pod) it should provide more information about why autoscaling didn't kick in.
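Inspecting those events could look like this (pod name and namespace are placeholders):

```shell
# Show the scheduling events attached to the stuck pod:
kubectl describe pod my-pipeline-pod -n kubeflow

# Or list recent namespace events in order, which includes autoscaler
# decisions such as "pod didn't trigger scale-up":
kubectl get events -n kubeflow --sort-by=.lastTimestamp
```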

@vicaire (Contributor)

vicaire commented Mar 26, 2019

Resolving since there does not seem to be any next action to take. Please re-open if something else is needed.

@vicaire vicaire closed this as completed Mar 26, 2019
Linchin pushed a commit to Linchin/pipelines that referenced this issue Apr 11, 2023
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this issue Mar 11, 2024
…r. (kubeflow#703) (kubeflow#706)

* Upgrade Tekton to 0.27 for pipelineloop controller. (kubeflow#703)

* Update pipelinelooprun.go