How can I add the PriorityClass to the TFJob? #1466

Closed
Wenshiqi222 opened this issue Nov 15, 2021 · 3 comments

@Wenshiqi222

Since I want to implement gang scheduling with the Scheduler Framework, I need to ensure that all pods belonging to the same PodGroup have the same priority and that different PodGroups have different priorities. I can set the PodGroup in the "Labels" field, but since "PriorityClass" belongs to the Pod.Spec field, how can I add the PriorityClass to the PodGroup or to a specific TFJob?
Thanks for your help~

@gaocegege
Member

/cc @zw0610

@zw0610
Member

zw0610 commented Nov 15, 2021

The spec you define for each ReplicaType is actually a PodTemplateSpec, whose pod spec contains a field called priorityClassName.

A simple example, shown here with an MPIJob, could be the one below. Please note that in this example, different priority classes are assigned to different ReplicaTypes:

apiVersion: kubeflow.org/v1
kind: MPIJob
metadata:
  name: tensorflow-mnist-elastic
spec:
  slotsPerWorker: 1
  cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          priorityClassName: preserved-pc
          containers:
          - image: horovod/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.5.0-py3.7-cpu
            name: mpi-launcher
            command:
            - horovodrun
            args:
            - -np
            - "2"
            - --min-np
            - "1"
            - --max-np
            - "3"
            - --host-discovery-script
            - /etc/mpi/discover_hosts.sh
            - python
            - /examples/elastic/tensorflow2_mnist_elastic.py
            resources:
              requests:
                cpu: 1
                memory: 2Gi
              limits:
                cpu: 1
                memory: 2Gi
    Worker:
      replicas: 2
      template:
        spec:
          priorityClassName: preemptible-pc
          containers:
          - image: horovod/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.5.0-py3.7-cpu
            name: mpi-worker
            resources:
              requests:
                cpu: 2
                memory: 4Gi
              limits:
                cpu: 2
                memory: 4Gi
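
For a TFJob, the same field sits under each entry of tfReplicaSpecs. Below is a minimal sketch for the gang-scheduling case from the question, where every pod of one job shares a single priority. The PriorityClass name `tfjob-group-a`, its value, the job name, and the image are hypothetical placeholders, not something taken from this thread.

```yaml
# Hypothetical PriorityClass; the name and value are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tfjob-group-a
value: 1000
globalDefault: false
description: "Shared priority for all pods of one TFJob (illustrative)."
---
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: tfjob-with-priority          # hypothetical job name
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          # Same class on every replica type, so all pods share one priority.
          priorityClassName: tfjob-group-a
          containers:
          - name: tensorflow         # TFJob expects this container name
            image: example.com/my-tf-training:latest   # placeholder image
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          priorityClassName: tfjob-group-a
          containers:
          - name: tensorflow
            image: example.com/my-tf-training:latest   # placeholder image
```

A second job that should be scheduled at a different priority would reference its own PriorityClass with a different value.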

@Wenshiqi222
Author

Got it, thanks a lot for your help~
