This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Allow setting CPU pinning per worker pool #1337

Closed
surajssd opened this issue Jan 20, 2021 · 3 comments · Fixed by #1406

@surajssd
Member

Provide a way for the user to specify the CPU management policy per worker pool kubelet.

Implementation-wise, this can be done via the kubelet service file per worker pool or the kubelet configuration file per worker pool.

As an extension of this work, we also need to fix #311 and #49.
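For context, cpuManagerPolicy is a standard field in the upstream KubeletConfiguration type, so a per-pool knob would ultimately just flip this value in each pool's kubelet config. A minimal sketch, assuming the policy is plumbed through the per-pool kubelet configuration file (hypothetical placement, not Lokomotive's actual rendered config):

```yaml
# Sketch: per-worker-pool KubeletConfiguration fragment (hypothetical placement).
# Only cpuManagerPolicy would differ between pools; everything else stays unchanged.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static   # upstream default is "none"
```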

@surajssd added the area/kubernetes and kind/enhancement labels Jan 20, 2021
@surajssd added the proposed/next-sprint label Feb 10, 2021
@iaguis removed the proposed/next-sprint label Feb 10, 2021
@knrt10
Member

knrt10 commented Feb 24, 2021

Looking into this

@knrt10 self-assigned this Mar 1, 2021
knrt10 added a commit that referenced this issue Mar 1, 2021
This adds a flag to worker pools, and also adds a knob for the user to configure
it.

closes: #1337
Signed-off-by: knrt10 <kautilya@kinvolk.io>
knrt10 added a commit that referenced this issue Mar 1, 2021
This adds a flag to worker pools, and also adds a knob for the user to configure
it.

closes: #1337
Signed-off-by: knrt10 <kautilya@kinvolk.io>
@surajssd added the proposed/next-sprint label Mar 3, 2021
@iaguis removed the proposed/next-sprint label Mar 3, 2021
@surajssd added the proposed/next-sprint label Jun 23, 2021
@iaguis removed the proposed/next-sprint label Jun 23, 2021
@surajssd self-assigned this Jul 8, 2021
@surajssd
Member Author

It turns out that a standalone cpuManagerPolicy in the kubelet config won't help. This is because when we set cpuManagerPolicy: static, the kubelet expects some CPUs to be set aside for Kubernetes components on the workers (kube-proxy, kubelet, Calico, etc.) via kubeReserved, and some CPUs for system services (docker, systemd, sshd, etc.) via systemReserved.

Now we need to figure out the bare minimum of CPUs needed to keep the worker running; the rest can be given to workloads for exclusive assignment.

An explanation of the static policy can be found in the Kubernetes CPU management policies documentation.
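A minimal sketch of what the static policy needs in order to initialize, with placeholder reservation sizes (not the values Lokomotive ends up shipping):

```yaml
# Sketch: cpuManagerPolicy "static" only initializes if some CPU is reserved
# for the node itself via kubeReserved and/or systemReserved.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
kubeReserved:
  cpu: "1000m"     # placeholder: kubelet, kube-proxy, Calico, ...
systemReserved:
  cpu: "1000m"     # placeholder: docker, systemd, sshd, ...
# Without any CPU reservation the kubelet refuses to start the CPU manager
# (see the error in the next comment).
```

The CPUs covered by these reservations stay in the shared pool and are never handed out for exclusive assignment.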

@surajssd
Member Author

The kubelet fails to start with this error:

Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: I0714 10:29:39.139047   70695 container_manager_linux.go:283] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:static ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: I0714 10:29:39.139070   70695 topology_manager.go:120] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: I0714 10:29:39.139082   70695 container_manager_linux.go:314] "Initializing Topology Manager" policy="none" scope="container"
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: I0714 10:29:39.139091   70695 container_manager_linux.go:319] "Creating device plugin manager" devicePluginEnabled=true
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: I0714 10:29:39.139224   70695 cpu_manager.go:158] "Detected CPU topology" topology=&{NumCPUs:48 NumCores:24 NumSockets:1 CPUDetails:map[0:{NUMANodeID:0 SocketID:0 CoreID:0} 1:{NUMANodeID:1 SocketID:0 CoreID:1} 2:{NUMANodeID:2 SocketID:0 CoreID:2} 3:{NUMANodeID:3 SocketID:0 CoreID:3} 4:{NUMANodeID:0 SocketID:0 CoreID:4} 5:{NUMANodeID:1 SocketID:0 CoreID:5} 6:{NUMANodeID:2 SocketID:0 CoreID:6} 7:{NUMANodeID:3 SocketID:0 CoreID:7} 8:{NUMANodeID:0 SocketID:0 CoreID:8} 9:{NUMANodeID:1 SocketID:0 CoreID:9} 10:{NUMANodeID:2 SocketID:0 CoreID:10} 11:{NUMANodeID:3 SocketID:0 CoreID:11} 12:{NUMANodeID:0 SocketID:0 CoreID:12} 13:{NUMANodeID:1 SocketID:0 CoreID:13} 14:{NUMANodeID:2 SocketID:0 CoreID:14} 15:{NUMANodeID:3 SocketID:0 CoreID:15} 16:{NUMANodeID:0 SocketID:0 CoreID:16} 17:{NUMANodeID:1 SocketID:0 CoreID:17} 18:{NUMANodeID:2 SocketID:0 CoreID:18} 19:{NUMANodeID:3 SocketID:0 CoreID:19} 20:{NUMANodeID:0 SocketID:0 CoreID:20} 21:{NUMANodeID:1 SocketID:0 CoreID:21} 22:{NUMANodeID:2 SocketID:0 CoreID:22} 23:{NUMANodeID:3 SocketID:0 CoreID:23} 24:{NUMANodeID:0 SocketID:0 CoreID:0} 25:{NUMANodeID:1 SocketID:0 CoreID:1} 26:{NUMANodeID:2 SocketID:0 CoreID:2} 27:{NUMANodeID:3 SocketID:0 CoreID:3} 28:{NUMANodeID:0 SocketID:0 CoreID:4} 29:{NUMANodeID:1 SocketID:0 CoreID:5} 30:{NUMANodeID:2 SocketID:0 CoreID:6} 31:{NUMANodeID:3 SocketID:0 CoreID:7} 32:{NUMANodeID:0 SocketID:0 CoreID:8} 33:{NUMANodeID:1 SocketID:0 CoreID:9} 34:{NUMANodeID:2 SocketID:0 CoreID:10} 35:{NUMANodeID:3 SocketID:0 CoreID:11} 36:{NUMANodeID:0 SocketID:0 CoreID:12} 37:{NUMANodeID:1 SocketID:0 CoreID:13} 38:{NUMANodeID:2 SocketID:0 CoreID:14} 39:{NUMANodeID:3 SocketID:0 CoreID:15} 40:{NUMANodeID:0 SocketID:0 CoreID:16} 41:{NUMANodeID:1 SocketID:0 CoreID:17} 42:{NUMANodeID:2 SocketID:0 CoreID:18} 43:{NUMANodeID:3 SocketID:0 CoreID:19} 44:{NUMANodeID:0 SocketID:0 CoreID:20} 45:{NUMANodeID:1 SocketID:0 CoreID:21} 46:{NUMANodeID:2 SocketID:0 CoreID:22} 47:{NUMANodeID:3 SocketID:0 CoreID:23}]}
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: E0714 10:29:39.139240   70695 container_manager_linux.go:342] "Failed to initialize cpu manager" err="[cpumanager] unable to determine reserved CPU resources for static policy"
Jul 14 10:29:39 suraj-em-cluster-static-worker-0 docker[70714]: E0714 10:29:39.139254   70695 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: [cpumanager] unable to determine reserved CPU resources for static policy"

surajssd added a commit that referenced this issue Jul 14, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 2 cores for each of them only when
`cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd added a commit that referenced this issue Jul 28, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 2 cores for each of them only when
`cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd added a commit that referenced this issue Aug 6, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 2 cores for each of them only when
`cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
@iaguis added this to the v0.9.0 milestone Aug 12, 2021
surajssd added a commit that referenced this issue Aug 13, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 300m cores for kube-reserved-cpu and 1500m cores for
system-reserved-cpu when `cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd added a commit that referenced this issue Aug 24, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 300m cores for kube-reserved-cpu and 1500m cores for
system-reserved-cpu when `cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd added a commit that referenced this issue Aug 25, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 300m cores for kube-reserved-cpu and 1500m cores for
system-reserved-cpu when `cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd added a commit that referenced this issue Sep 2, 2021
This allows a user to choose the cpu manager policy on a worker pool.
Possible values are: `none` and `static`.

To make this work, kubelet also needs a static allocation of system
reserved and kubernetes reserved CPUs to be defined. So this commit also
adds the default of 300m cores for kube-reserved-cpu and 1500m cores for
system-reserved-cpu when `cpu_manager_policy` is set.

closes: #1337

Signed-off-by: knrt10 <kautilya@kinvolk.io>
Co-authored-by: Suraj Deshmukh <suraj@kinvolk.io>
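Following the final commit message above, the rendered kubelet configuration for a pool with `cpu_manager_policy = "static"` would presumably end up along these lines (a sketch of the described defaults, not the exact file Lokomotive generates):

```yaml
# Sketch based on the commit message defaults: 300m kube-reserved CPU and
# 1500m system-reserved CPU, applied only when cpu_manager_policy is set.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
kubeReserved:
  cpu: "300m"
systemReserved:
  cpu: "1500m"
# On the 48-CPU worker from the log above: 300m + 1500m = 1800m of total
# reservation, which the static policy rounds up to 2 whole CPUs kept in the
# shared pool, leaving 46 CPUs available for exclusive assignment.
```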