Updated examples/pytorch to disable istio sidecar injection (kubeflow#2004)

* Removed istio annotations from examples. Added comments to examples/pytorch/README.md

Signed-off-by: Juan Diego Colmenares Fernandez <cielocfd@gmail.com>

* Update demo.yaml

Removed pull policy

Signed-off-by: Juan Diego Colmenares Fernandez <cielocfd@gmail.com>

---------

Signed-off-by: Juan Diego Colmenares Fernandez <cielocfd@gmail.com>
Signed-off-by: deepanker13 <deepanker.gupta@nutanix.com>
jdcfd authored and deepanker13 committed Apr 8, 2024
1 parent b4f240a commit 194065f
Showing 5 changed files with 10 additions and 21 deletions.
11 changes: 10 additions & 1 deletion examples/pytorch/README.md
@@ -1,5 +1,5 @@
## Installation & deployment tips
-1. You need to configure your node to utilize GPU. In order this can be done the following way:
+1. You need to configure your node to utilize GPU. This can be done the following way:
* Install [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker)
* Connect to your MasterNode and set nvidia as the default run in `/etc/docker/daemon.json`:
```
@@ -28,3 +28,12 @@
3. Building image. Each example has prebuilt images that are stored on google cloud resources (GCR). If you want to create your own image we recommend using dockerhub. Each example has its own Dockerfile that we strongly advise to use. To build your custom image follow instruction on [TechRepublic](https://www.techrepublic.com/article/how-to-create-a-docker-image-and-push-it-to-docker-hub/).
4. To deploy your job we recommend using official [kubeflow documentation](https://www.kubeflow.org/docs/guides/components/pytorch/). Each example has example yaml files for two versions of apis. Feel free to modify them, e.g. image or number of GPUs.
**Note**: PyTorch job doesn’t work in a user namespace by default because of Istio [automatic sidecar injection](https://istio.io/v1.3/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). In order to get it running, it needs annotation sidecar.istio.io/inject: "false" to disable it for either PyTorch pods or namespace. For example:
```yaml
template:
  metadata:
    annotations:
      sidecar.istio.io/inject: "false"
```
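For context, here is a minimal sketch of where that annotation sits in a complete PyTorchJob manifest. It assumes the `kubeflow.org/v1` API and reuses the `pytorch-cpu:py3.8` image and the `torchrun demo.py` command visible in `cpu-demo/demo.yaml`; the job name is a placeholder, and any further arguments the real example passes are not shown. The annotation belongs to each replica's pod template, so Master and Worker each carry their own copy.

```yaml
# Illustrative manifest only; names marked "placeholder" are not from this commit.
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-sidecar-disabled   # placeholder job name
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            # Tells Istio not to inject its sidecar into this pod.
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch-cpu:py3.8
              command:
                - "torchrun"
                - "demo.py"
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            # Repeated here because each replica type has its own pod template.
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: pytorch
              image: pytorch-cpu:py3.8
              command:
                - "torchrun"
                - "demo.py"
```

If injection should be switched off for an entire namespace instead, Istio's own control for that is the `istio-injection` namespace label rather than a per-pod annotation; whether that is appropriate depends on what else runs in the namespace.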
2 changes: 0 additions & 2 deletions examples/pytorch/cpu-demo/demo.yaml
@@ -13,7 +13,6 @@ spec:
           containers:
             - name: pytorch
               image: pytorch-cpu:py3.8
-              imagePullPolicy: Always
               command:
                 - "torchrun"
                 - "demo.py"
@@ -25,7 +24,6 @@ spec:
           containers:
             - name: pytorch
               image: pytorch-cpu:py3.8
-              imagePullPolicy: Always
               command:
                 - "torchrun"
                 - "demo.py"
6 changes: 0 additions & 6 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_gloo.yaml
@@ -8,9 +8,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch
@@ -24,9 +21,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch
6 changes: 0 additions & 6 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_mpi.yaml
@@ -8,9 +8,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch
@@ -24,9 +21,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch
6 changes: 0 additions & 6 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_nccl.yaml
@@ -8,9 +8,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch
@@ -23,9 +20,6 @@ spec:
       replicas: 1
       restartPolicy: OnFailure
       template:
-        metadata:
-          annotations:
-            sidecar.istio.io/inject: "false"
         spec:
           containers:
             - name: pytorch