
support for MicroShift/OpenShift 4.15+ through helm chart, static manifest and time-slicing manual tests #702

Conversation

@arthur-r-oliveira commented May 10, 2024

Adding deployments/static/nvidia-device-plugin-privileged-with-service-account-and-time-slicing.yml, with a time-slicing configuration inspired by the GPU Operator (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html), adapted here for MicroShift support.

@arthur-r-oliveira (Author) commented May 10, 2024

Testing with an Azure VM:

Tested MicroShift release:

[azureuser@microshift02 k8s-device-plugin]$ microshift version
MicroShift Version: 4.15.9
Base OCP Version: 4.15.9
[azureuser@microshift02 k8s-device-plugin]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 9.3 (Plow)
[azureuser@microshift02 ~]$ lscpu |grep CPU\(s
CPU(s):                             4
On-line CPU(s) list:                0-3
NUMA node0 CPU(s):                  0-3
[azureuser@microshift02 ~]$ free -m
               total        used        free      shared  buff/cache   available
Mem:           27808        2772       22317          57        3202       25035
Swap:              0           0           0
[azureuser@microshift02 ~]$ lspci 
0001:00:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
07af:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
[azureuser@microshift02 ~]$ nvidia-smi -l
Fri May 10 11:41:37 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000001:00:00.0 Off |                    0 |
| N/A   46C    P0             35W /   70W |      70MiB /  15360MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    175931      C   /cuda-samples/vectorAdd                        36MiB |
|    0   N/A  N/A    175933      C   /cuda-samples/vectorAdd                        14MiB |
|    0   N/A  N/A    175936      C   /cuda-samples/vectorAdd                        10MiB |
|    0   N/A  N/A    175940      C   /cuda-samples/vectorAdd                        10MiB |
+-----------------------------------------------------------------------------------------+

Manifest file:

[azureuser@microshift02 k8s-device-plugin]$ sudo cat /etc/microshift/manifests/nvidia-device-plugin.yml 
# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# == ABOUT THIS STATIC DEPLOYMENT
#
# The pod needs to run with privileged security context to be able to
# mount a hostPath volume. To avoid granting permissions too broadly,
# it is safer to deploy the pod in its own namespace with its own service
# account.
#
# This static deployment is aimed at OpenShift and creates:
#
#   - A namespace with the labels for the PodSecurity admission webhook
#   - A role that grants the read nodes resources and use the privileged
#     security context constraints (SCC) in Microshift.
#   - A service account that the pod will use to run in the namespace.
#   - A role binding to link the cluster role and the service account in
#     the scope of the namespace.
#   - The device plugin daemon set with the privileged security context
#     and the service account.
#
# The other attributes of the device plugin daemon set are unchanged from
# the standard static deployment definition with compatibility with CPU
# Manager.
#
# To deploy it in Microshift, simply put this file in
# /etc/microshift/manifests and add the file to the list of resources in
# your Kustomization resource.
#
# See Microshift documentation for more details on the automated
# deployment of resources:
# https://access.redhat.com/documentation/en-us/red_hat_build_of_microshift/
#

---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/enforce: privileged
  name: nvidia-device-plugin

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nvidia-device-plugin
  namespace: nvidia-device-plugin
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nvidia-device-plugin
  namespace: nvidia-device-plugin
---
apiVersion: v1
data:
  nvidia-plugin-configs: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: nvidia-device-plugin
    app.kubernetes.io/version: 0.15.0
  name: nvidia-device-plugin-configs
  namespace: nvidia-device-plugin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nvidia-device-plugin
  namespace: nvidia-device-plugin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nvidia-device-plugin
subjects:
  - kind: ServiceAccount
    name: nvidia-device-plugin
    namespace: nvidia-device-plugin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: nvidia-device-plugin
    app.kubernetes.io/version: 0.15.0
  name: nvidia-device-plugin-clusterrole
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nvidia-device-plugin-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nvidia-device-plugin-clusterrole
subjects:
- kind: ServiceAccount
  name: nvidia-device-plugin
  namespace: nvidia-device-plugin
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: nvidia-device-plugin
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      containers:
      - command:
        - config-manager
        env:
        - name: ONESHOT
          value: "false"
        - name: KUBECONFIG
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: NODE_LABEL
          value: nvidia.com/device-plugin.config
        - name: CONFIG_FILE_SRCDIR
          value: /available-configs
        - name: CONFIG_FILE_DST
          value: /config/config.yaml
        - name: DEFAULT_CONFIG
          value: nvidia-plugin-configs
        - name: FALLBACK_STRATEGIES
          value: named,single
        - name: SEND_SIGNAL
          value: "true"
        - name: SIGNAL
          value: "1"
        - name: PROCESS_TO_SIGNAL
          value: nvidia-device-plugin
        image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-sidecar
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /available-configs
          name: available-configs
        - mountPath: /config
          name: config
      - command:
        - nvidia-device-plugin
        env:
        - name: MPS_ROOT
          value: /run/nvidia/mps
        - name: CONFIG_FILE
          value: /config/config.yaml
        - name: DEFAULT_CONFIG
          value: nvidia-plugin-configs
        - name: NVIDIA_MIG_MONITOR_DEVICES
          value: all
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-ctr
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
        - mountPath: /dev/shm
          name: mps-shm
        - mountPath: /mps
          name: mps-root
        - mountPath: /var/run/cdi
          name: cdi-root
        - mountPath: /available-configs
          name: available-configs
        - mountPath: /config
          name: config
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - config-manager
        env:
        - name: ONESHOT
          value: "true"
        - name: KUBECONFIG
          value: ""
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: NODE_LABEL
          value: nvidia.com/device-plugin.config
        - name: CONFIG_FILE_SRCDIR
          value: /available-configs
        - name: DEFAULT_CONFIG
          value: nvidia-plugin-configs
        - name: CONFIG_FILE_DST
          value: /config/config.yaml
        - name: FALLBACK_STRATEGIES
          value: named,single
        - name: SEND_SIGNAL
          value: "false"
        - name: SIGNAL
          value: ""
        - name: PROCESS_TO_SIGNAL
          value: ""
        volumeMounts:
          - name: available-configs
            mountPath: /available-configs
          - name: config
            mountPath: /config
        image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-init
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        privileged: true
      serviceAccount: nvidia-device-plugin
      serviceAccountName: nvidia-device-plugin
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: device-plugin
      - hostPath:
          path: /run/nvidia/mps
          type: DirectoryOrCreate
        name: mps-root
      - hostPath:
          path: /run/nvidia/mps/shm
          type: ""
        name: mps-shm
      - hostPath:
          path: /var/run/cdi
          type: DirectoryOrCreate
        name: cdi-root
      - configMap:
          defaultMode: 420
          name: nvidia-device-plugin-configs
        name: available-configs
      - emptyDir: {}
        name: config

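As noted in the comments at the top of the manifest, deploying this on MicroShift means placing the file in /etc/microshift/manifests and listing it in the kustomization used by MicroShift's automated manifest loading. A minimal sketch of that kustomization (assuming the manifest above is saved as nvidia-device-plugin.yml; adjust the path and file name to your setup):

# /etc/microshift/manifests/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - nvidia-device-plugin.yml
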
Verification pod:

[azureuser@microshift02 ~]$ cat time-v.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-slicing-verification
  namespace: nvidia-device-plugin
  labels:
    app: time-slicing-verification
spec:
  replicas: 5
  selector:
    matchLabels:
      app: time-slicing-verification
  template:
    metadata:
      labels:
        app: time-slicing-verification
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      hostPID: true
      serviceAccountName: nvidia-device-plugin
      containers:
        - name: cuda-sample-vector-add
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: "RuntimeDefault"
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
          command: ["/bin/bash", "-c", "--"]
          args:
            - while true; do /cuda-samples/vectorAdd; done
          resources:
            limits:
              nvidia.com/gpu: 1

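Assuming the file above is saved as time-v.yml, it can be applied with, for example:

oc apply -f time-v.yml
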
Testing:

[azureuser@microshift02 ~]$ oc get node -o json | jq -r '.items[0].status.capacity'
{
  "cpu": "4",
  "ephemeral-storage": "8128Mi",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "28476128Ki",
  "nvidia.com/gpu": "4",
  "pods": "250"
}
[azureuser@microshift02 ~]$ oc get pods -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nvidia-device-plugin-daemonset-nrvw7        2/2     Running   0          10m   10.42.0.18   microshift02   <none>           <none>
time-slicing-verification-697c64dd8-6dw27   1/1     Running   0          10m   10.42.0.19   microshift02   <none>           <none>
time-slicing-verification-697c64dd8-9mgst   1/1     Running   0          10m   10.42.0.21   microshift02   <none>           <none>
time-slicing-verification-697c64dd8-xnvsb   1/1     Running   0          10m   10.42.0.22   microshift02   <none>           <none>
time-slicing-verification-697c64dd8-z5bn7   0/1     Pending   0          10m   <none>       <none>         <none>           <none>
time-slicing-verification-697c64dd8-zbqjb   1/1     Running   0          10m   10.42.0.20   microshift02   <none>           <none>
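One of the five replicas remains Pending, which is expected: with replicas: 4 in the time-slicing configuration, the single Tesla T4 is advertised as four nvidia.com/gpu resources, so only four pods requesting one GPU each can be scheduled on the node.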

arthur-r-oliveira changed the title from "adding deployments/static/nvidia-device-plugin-privileged-with-servic…" to "adding deployments/static manifest with time-slicing support for MicroShift" on May 10, 2024
@elezar (Member) commented May 13, 2024

@arthur-r-oliveira thanks for the contribution. The static deployments exist mostly for legacy purposes and are not something we test regularly. The recommended mechanism for deploying the plugin is the provided Helm charts.

Would you be able to provide documentation on how to deploy the device plugin using Helm for your target use case instead? If there is functionality missing that prevents this, we can address any shortcomings.

@arthur-r-oliveira (Author)

@elezar thanks for the heads-up! I'll give it another try with the Helm charts and get back to you shortly.

arthur-r-oliveira changed the title from "adding deployments/static manifest with time-slicing support for MicroShift" to "sample values and static manifest for time-slicing support with MicroShift" on May 26, 2024
arthur-r-oliveira changed the title from "sample values and static manifest for time-slicing support with MicroShift" to "sample values for helm charts and static manifest for time-slicing support with MicroShift" on May 26, 2024
arthur-r-oliveira changed the title from "sample values for helm charts and static manifest for time-slicing support with MicroShift" to "support for MicroShift/OpenShift 4.15+ through helm chart, static manifest and time-slicing manual tests" on May 26, 2024
@arthur-r-oliveira (Author) commented May 27, 2024

@elezar I've closed this PR since it has accumulated too much noise, but I will open a follow-up PR with small changes to deployments/helm/nvidia-device-plugin/templates/role-binding.yml and deployments/helm/nvidia-device-plugin/templates/role.yml.

As you can see, the original Helm chart does not have the appropriate Pod Security configuration in place to allow running nvidia-device-plugin on MicroShift 4.15+/Kubernetes 1.28+.

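The config file passed via --set-file config.map.config=/tmp/dp-example-config0.yaml in the helm commands below is not included in this log; assuming it mirrors the time-slicing ConfigMap used in the static manifest above, its contents would look roughly like:

version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
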
[root@lenovo-p620-01 k8s-device-plugin]# microshift version
MicroShift Version: 4.15.13
Base OCP Version: 4.15.13
[root@lenovo-p620-01 k8s-device-plugin]# oc get nodes
NAME                                        STATUS   ROLES                         AGE   VERSION
lenovo-p620-01.khw.eng.bos2.dc.redhat.com   Ready    control-plane,master,worker   10d   v1.28.9
[root@lenovo-p620-01 k8s-device-plugin]# 

[root@lenovo-p620-01 k8s-device-plugin]# git branch -a
* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/feature/microshift_timeslicing
  remotes/origin/main
[root@lenovo-p620-01 k8s-device-plugin]#  helm upgrade -i nvdp deployments/helm/nvidia-device-plugin/     --version=0.15.0     --namespace nvidia-device-plugin     --create-namespace     --set-file config.map.config=/tmp/dp-example-config0.yaml
Release "nvdp" does not exist. Installing it now.
W0527 10:11:17.712709  924228 warnings.go:70] would violate PodSecurity "restricted:v1.24": privileged (containers "mps-control-daemon-mounts", "mps-control-daemon-ctr" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "mps-root", "mps-shm" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0527 10:11:17.715342  924228 warnings.go:70] would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.capabilities.drop=["ALL"]; containers "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must not include "SYS_ADMIN" in securityContext.capabilities.add), restricted volume types (volumes "device-plugin", "mps-root", "mps-shm", "cdi-root" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
NAME: nvdp
LAST DEPLOYED: Mon May 27 10:11:17 2024
NAMESPACE: nvidia-device-plugin
STATUS: deployed
REVISION: 1
TEST SUITE: None

With the fixes:

[root@lenovo-p620-01 k8s-device-plugin]# git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   deployments/helm/nvidia-device-plugin/templates/role-binding.yml
	modified:   deployments/helm/nvidia-device-plugin/templates/role.yml

no changes added to commit (use "git add" and/or "git commit -a")
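The exact template changes will be in the follow-up PR; as a rough sketch, the kind of rule these modifications add to templates/role.yml mirrors the SCC rule from the static manifest above, granting the plugin's service account use of the privileged SCC (the chart's rendered resource names are not shown here):

  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use
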
[root@lenovo-p620-01 k8s-device-plugin]#  helm upgrade -i nvdp deployments/helm/nvidia-device-plugin/     --version=0.15.0     --namespace nvidia-device-plugin     --create-namespace     --set-file config.map.config=/tmp/dp-example-config0.yaml
Release "nvdp" does not exist. Installing it now.
NAME: nvdp
LAST DEPLOYED: Mon May 27 10:28:45 2024
NAMESPACE: nvidia-device-plugin
STATUS: deployed
REVISION: 1
TEST SUITE: None
[root@lenovo-p620-01 k8s-device-plugin]# oc get pods 
NAME                              READY   STATUS    RESTARTS   AGE
nvdp-nvidia-device-plugin-92wbp   2/2     Running   0          3s
[root@lenovo-p620-01 k8s-device-plugin]# 

More to follow.
