This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Kubernetes - Run the scheduler with correct KUBE_MAX_PD_VOLS Env Variable #186

Closed
khenidak opened this issue Jan 16, 2017 · 4 comments

@khenidak
Contributor

When KUBE_MAX_PD_VOLS is not set, the scheduler falls back to 16 as the maximum number of PDs (persistent disks) allowed per agent.
Check: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go#L39
and
https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go#L208

This means clusters running bigger VMs (e.g. Standard_DS14_v2, which accepts 32 data disks) will fail to schedule more than 16 pods that have PDs on an agent, even though the VM could attach more disks.
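To make the fallback concrete, here is a minimal Python sketch of the behavior described above. It is not the scheduler's actual code (which is Go, at the links above); the function name `max_pd_vols` and the handling of invalid or non-positive values are assumptions made for illustration.

```python
import os

# Fallback value described in this issue.
DEFAULT_MAX_PD_VOLS = 16


def max_pd_vols(environ=os.environ):
    """Return the per-agent PD limit, using the default when the
    KUBE_MAX_PD_VOLS variable is missing or unusable (hypothetical helper)."""
    raw = environ.get("KUBE_MAX_PD_VOLS", "")
    if not raw:
        return DEFAULT_MAX_PD_VOLS
    try:
        parsed = int(raw)
    except ValueError:
        # Assumption: an unparsable value is ignored in favor of the default.
        return DEFAULT_MAX_PD_VOLS
    # Assumption: zero or negative values are also treated as invalid.
    return parsed if parsed > 0 else DEFAULT_MAX_PD_VOLS
```

With the variable unset, a Standard_DS14_v2 agent would be capped at 16 PDs under this logic, even though the VM itself accepts 32 data disks.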

Note:
The scheduler currently applies this limit uniformly across all agents and is not designed for non-uniform (mixed) agent types. During cluster provisioning we can set it to either MIN(allowed data disks per agent), which results in capacity loss on the larger VMs, or MAX(allowed data disks per agent), which results in seemingly random errors when pods land on smaller VMs that cannot attach more disks. We will have to accept one of these until k8s takes node type (from the cloud provider) into consideration during scheduling decisions.
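The MIN/MAX trade-off can be sketched as a small provisioning-time calculation. This is only an illustration: the `MAX_DATA_DISKS` table and the `cluster_max_pd_vols` helper are hypothetical, and only the Standard_DS14_v2 figure comes from this issue; the second entry is an assumed value for a smaller VM size.

```python
# Illustrative per-VM-size data disk limits.
MAX_DATA_DISKS = {
    "Standard_DS14_v2": 32,  # figure quoted in this issue
    "Standard_DS2_v2": 8,    # assumption, for illustration only
}


def cluster_max_pd_vols(agent_vm_sizes, strategy="min"):
    """Pick one cluster-wide value for KUBE_MAX_PD_VOLS (hypothetical helper).

    "min" is safe for every agent but wastes capacity on larger VMs;
    "max" uses full capacity on the largest VMs but can over-schedule
    PD-backed pods onto smaller ones.
    """
    limits = [MAX_DATA_DISKS[size] for size in agent_vm_sizes]
    if not limits:
        raise ValueError("at least one agent VM size is required")
    if strategy == "min":
        return min(limits)
    if strategy == "max":
        return max(limits)
    raise ValueError(f"unknown strategy: {strategy!r}")


pools = ["Standard_DS14_v2", "Standard_DS2_v2"]
print(cluster_max_pd_vols(pools, "min"))  # 8
print(cluster_max_pd_vols(pools, "max"))  # 32
```

In this mixed cluster, "min" leaves 24 disk slots unused on each DS14_v2 agent, while "max" lets the scheduler place up to 32 PD-backed pods on an agent assumed to attach only 8.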

@anhowe
Contributor

anhowe commented Jan 24, 2017

Thanks @khenidak for this bug report. I have set this as P1, and we will watch to see if other customers hit this.

@anhowe
Contributor

anhowe commented May 1, 2017

@khenidak is there a flag to set this? Each VM size can support a different number of disks.

@khenidak
Contributor Author

khenidak commented May 3, 2017

No. I have it on my list to modify the scheduler to accommodate placement according to VM size and the current count of attached disks.
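A per-node check of the kind described here might look roughly like the sketch below. It is not an existing scheduler predicate: the `node_can_attach` function, the `DISK_LIMITS` table, and the smaller VM's limit are all assumptions made for illustration.

```python
# Illustrative per-VM-size limits; only Standard_DS14_v2 comes from this issue.
DISK_LIMITS = {
    "Standard_DS14_v2": 32,
    "Standard_DS2_v2": 8,  # assumption, for illustration only
}


def node_can_attach(vm_size, attached_disks, requested_disks, limits=DISK_LIMITS):
    """Return True if the node can take the pod's disks (hypothetical check).

    The decision uses the node's own VM size and its current attached-disk
    count instead of one cluster-wide maximum.
    """
    return attached_disks + requested_disks <= limits[vm_size]


# A large agent with 20 disks attached can still take 2 more...
print(node_can_attach("Standard_DS14_v2", 20, 2))  # True
# ...while a small agent that is already full cannot.
print(node_can_attach("Standard_DS2_v2", 8, 1))    # False
```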

Also, my understanding is that OpenShift reattaches 48 disks upon startup (I could be wrong). @jimzim can add more info.

@stale

stale bot commented Mar 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.

@stale stale bot added the stale label Mar 9, 2019
@stale stale bot closed this as completed Mar 16, 2019