This repository has been archived by the owner on Sep 19, 2022. It is now read-only.
Hi Team,
I am trying to run a Kubernetes pod with multiple GPUs in the same pod, but I can't find any resources on how to do this. Everything I find assumes 1 pod = 1 GPU, which isn't what I want. I want to be able to spin up two 4-GPU pods (8 GPUs total), or other combinations.
It seems this has been asked before in #219 and #331, but there are no solid answers there.
The YAML file I have based my testing on is from this tutorial: https://towardsdatascience.com/pytorch-distributed-on-kubernetes-71ed8b50a7ee
I have changed part of it to reflect using 2 GPUs in 1 pod.
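To be concrete, the relevant part of the change looks roughly like the sketch below (names and image are placeholders; the key piece is the `nvidia.com/gpu` extended resource, which assumes the NVIDIA device plugin is installed on the cluster):

```yaml
# Sketch of the pod spec, assuming the NVIDIA device plugin.
# GPUs are requested via the extended resource in limits;
# fractional values are not allowed.
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-test   # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: pytorch/pytorch:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2   # both GPUs exposed to this one container
```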
I am seeing behaviour similar to #219: when I spin this up, only 1 GPU gets used by the test code, even though I requested 2.
Any assistance or pointing in the right direction on this would be great. Thanks!