-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support koordinator gang scheduler plugin #1746
Comments
@Syulin7 Thanks for creating this issue. Generally, It sounds good to me. OTOH, I have some questions.
|
@tenzen-y Thanks for the review.
|
@Syulin7 I appreciate your clear explanation. I designed the podgroup control for the scheduler-plugins so that we could set a secondary scheduler name in the kubeflow/common repo. So you can use koordinator integration by setting Also, as I can see from kubeflow/common#209, the whole of the codes are almost similar to the codes for the coscheduling plugin. Hence I would suggest we only modify Before: training-operator/cmd/training-operator.v1/main.go Lines 122 to 124 in d640175
After: } else if strings.EqualFold(gangSchedulerName, string(common.GangSchedulerSchedulerPlugins)) ||
strings.EqualFold(gangSchedulerName, string(common.GangSchedulerKoordScheduler)) {
gangSchedulingSetupFunc = common.GenSchedulerPluginsSetupFunc(mgr.GetClient())
} |
@tenzen-y Thanks for your suggestion. My viewpoint is that there is no need to specify the scheduler name(koord-scheduler) in CustomJob resources (similar to the implementation of Volcano and make it more user-friendly), which results in the whole of the codes are almost similar to the codes for the coscheduling plugin. the only difference is:
Perhaps in the future, when we need to support the Koordinator new Gang feature, the implementation of the koordinator podgroup control will be separated out, and there will not be many duplicated codes. And now I plan to modify it according to your suggestion. WDYT |
@Syulin7 I agree with that. However, I would like to avoid duplicated codes since that makes the training-operator chaotic. So, I would suggest using the embedded struct and override the method to reduce duplicated codes like my review comments in kubeflow/common#209. |
Agree, I will modify it according to your suggestion. |
@tenzen-y I have updated the PR, PTAL. By the way, the Slack channel in the Readme is no longer available, how can I join the Slack channel? |
@Syulin7 |
so we didn't support koordinator gang-scheduler now? |
@lowang-bh No, we support koordinator gang-scheduler. |
The koordinator gang-scheduler uses kubernetes-sigs/scheduler-plugins PodGroup. That scheduler doesn't have a custom PodGroup resource. |
/kind feature
Training Operator now supports many gang schedulers(volcano, scheduler-plugins), and now we can easily add koordinator gang scheduler.
Koordinator gang scheduler use PodGroup defined in scheduler-plugins/coscheduling, and will be compatible with the PodGroup.
About Koordinator: https://github.com/koordinator-sh/koordinator
Koordinator gang scheduler: https://koordinator.sh/docs/user-manuals/gang-scheduling
So I would like to support koordinator gang scheduler plugin.
@tenzen-y @johnugeorge WDYT?
The text was updated successfully, but these errors were encountered: