Replace the plain pod workers with Indexed Job #613
Comments
Is this something that is happening in the training-operator too?
Yes, the training-operator plans to migrate to Indexed Job as well: kubeflow/training-operator#1718. However, we (training-operator) haven't decided yet which to work on first: using mpi-operator as a library or migrating to Indexed Job.
Ah, in the training-operator, the last piece to migrate to the indexed job is
This is great. Good to know that elastic semantics can be maintained.
I am ok with the timeline.
Part-of: #373
Currently, the mpi-operator manages the workers as plain pods. However, this management mechanism closely resembles the Kubernetes batch/Job controller, so it reinvents the wheel, although I understand that batch/Job previously lacked some of the features needed to replace the plain pods.
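One thing an Indexed Job provides that the plain pod workers currently get by hand is a stable identity per worker: each pod's hostname is `<job-name>-<index>`, and when the pod template sets `subdomain` to a headless Service, the pod is resolvable as `<hostname>.<service>.<namespace>.svc`. The sketch below (a minimal illustration, not mpi-operator code) shows how an OpenMPI-style hostfile could be derived from that naming. The function `workerHostfile` and the job, Service, and namespace names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// workerHostfile is a hypothetical helper that builds an OpenMPI-style
// hostfile ("<host> slots=<n>" per line) for the worker pods of an Indexed
// Job. It assumes each pod's hostname is "<jobName>-<index>" and that the pod
// template sets subdomain to the headless Service named by `service`.
func workerHostfile(jobName, service, namespace string, workers, slots int) string {
	var b strings.Builder
	for i := 0; i < workers; i++ {
		// Completion indexes run from 0 to workers-1.
		fmt.Fprintf(&b, "%s-%d.%s.%s.svc slots=%d\n", jobName, i, service, namespace, slots)
	}
	return b.String()
}

func main() {
	// Hypothetical names: Job "pi-worker", headless Service "pi", namespace "default".
	fmt.Print(workerHostfile("pi-worker", "pi", "default", 2, 1))
}
```

Because the hostnames are deterministic, the launcher could compute the hostfile from the replica count alone instead of watching individual pods.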
Because Indexed Jobs support elasticity (Elastic Indexed Jobs) by default since Kubernetes v1.27, we can still support MPIJob with elastic semantics, such as those Horovod relies on, even if we replace the plain pod management with an Indexed Job.
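For Elastic Indexed Jobs, Kubernetes lets you scale by mutating `.spec.parallelism` and `.spec.completions` together so that they stay equal, and on scale-down it removes the pods with the highest indexes. The sketch below models that rule as a plain validation function; `workerJob`, `validateScale`, and `removedIndexes` are hypothetical names for illustration, not the real `batch/v1` API or apiserver validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// workerJob is a hypothetical, trimmed-down stand-in for the batch/v1 Job
// spec fields that matter for elastic scaling.
type workerJob struct {
	Indexed     bool  // completionMode: Indexed
	Completions int32 // spec.completions
	Parallelism int32 // spec.parallelism
}

// validateScale mimics the Elastic Indexed Job rule: completions may change
// only for Indexed Jobs, and the new parallelism must equal the new completions.
func validateScale(oldJob, newJob workerJob) error {
	if oldJob.Completions == newJob.Completions {
		return nil // completions unchanged; no elastic update to check
	}
	if !newJob.Indexed {
		return errors.New("completions can only be changed for Indexed Jobs")
	}
	if newJob.Parallelism != newJob.Completions {
		return errors.New("parallelism and completions must be updated together and stay equal")
	}
	return nil
}

// removedIndexes lists the completion indexes whose pods go away on
// scale-down (the highest indexes); it returns nil when scaling up.
func removedIndexes(oldJob, newJob workerJob) []int32 {
	var removed []int32
	for i := newJob.Completions; i < oldJob.Completions; i++ {
		removed = append(removed, i)
	}
	return removed
}

func main() {
	old := workerJob{Indexed: true, Completions: 4, Parallelism: 4}
	shrunk := workerJob{Indexed: true, Completions: 2, Parallelism: 2}
	fmt.Println(validateScale(old, shrunk), removedIndexes(old, shrunk))
	mismatched := workerJob{Indexed: true, Completions: 2, Parallelism: 4}
	fmt.Println(validateScale(old, mismatched) != nil)
}
```

Removing the highest indexes first keeps the surviving workers' hostnames unchanged, which is what an elastic MPI launcher would need when shrinking its host list.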
So, I propose replacing the plain pod workers with an Indexed Job once support for Kubernetes v1.26 is dropped (EoL: 2024-02-28).
Let me know what you think. @alculquicondor @terrytangyuan