
PyTorch and MPI Operator pulls hardcoded initContainer #1696

Closed

MuhammadZeeshan34 opened this issue Dec 2, 2022 · 2 comments

@MuhammadZeeshan34

Problem
When running any PyTorch or MPI operator job on an on-prem cluster, the worker pods' init containers try to pull a hardcoded default image: alpine:3.12 for PyTorch and mpioperator/kubectl-delivery for MPI. Because these jobs run on-prem, they have no access to pull these images from public registries.

In the previous version of the PyTorch operator, there was an option to override the default image used for the init container.

The override also can't be set once in the core training-operator deployment, since different operators require different initContainer images.

Is there any way to override this image for specific operators? The affected worker pods report:

state:
  waiting:
    message: Back-off pulling image "alpine:3.10"
    reason: ImagePullBackOff

In previous versions, when each operator had its own deployment, we were overriding it like this:

spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-operator
      app.kubernetes.io/component: pytorch
      app.kubernetes.io/name: pytorch-operator
      kustomize.component: pytorch-operator
      name: pytorch-operator
  template:
    metadata:
      labels:
        app: pytorch-operator
        app.kubernetes.io/component: pytorch
        app.kubernetes.io/name: pytorch-operator
        kustomize.component: pytorch-operator
        name: pytorch-operator
    spec:
      containers:
      - command:
        - /pytorch-operator.v1
        - --alsologtostderr
        - -v=1
        - --monitoring-port=8443
        - --enable-gang-scheduling=true
        - --init-container-image=<custom-docker-repo>alpine:3.10
@andreyvelich
Member

Thank you for raising this @MuhammadZeeshan34!

For the PyTorchJob and MPIJob you can still use the PyTorchInitContainerImage and MPIKubectlDeliveryImage flags on the Training Operator deployment to modify the InitContainer image.
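
For reference, a minimal sketch of how those overrides could be wired into the training-operator Deployment args. The flag spellings --pytorch-init-container-image and --mpi-kubectl-delivery-image, the container name, and the mirrored image paths under <custom-docker-repo> are assumptions; verify the exact flag names against your operator version's --help before applying:

spec:
  template:
    spec:
      containers:
      - name: training-operator
        args:
        # Assumed flag: image injected as the init container for PyTorchJob worker pods
        - --pytorch-init-container-image=<custom-docker-repo>/alpine:3.10
        # Assumed flag: image injected as the kubectl-delivery init container for MPIJob launchers
        - --mpi-kubectl-delivery-image=<custom-docker-repo>/kubectl-delivery:latest

With the mirrored images supplied this way, newly created PyTorchJob and MPIJob pods should reference the private registry instead of the public defaults.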

@MuhammadZeeshan34
Author

Thanks @andreyvelich. That resolved the problem.
