Make scheduler-plugins the default gang scheduler. #1747
@tenzen-y @johnugeorge PTAL, Related: kubeflow/common#209
/hold
LGTM
cmd/training-operator.v1/main.go
Outdated
@@ -73,7 +73,8 @@ func main() {
  flag.StringVar(&leaderElectionID, "leader-election-id", "1ca428e5.training-operator.kubeflow.org", "The ID for leader election.")
  flag.Var(&enabledSchemes, "enable-scheme", "Enable scheme(s) as --enable-scheme=tfjob --enable-scheme=pytorchjob, case insensitive."+
  " Now supporting TFJob, PyTorchJob, MXNetJob, XGBoostJob, PaddleJob. By default, all supported schemes will be enabled.")
- flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none")
+ flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none."+
+ " Now supporting node, volcano, scheduler-plugins, koord-scheduler.")
Suggested change:
- " Now supporting node, volcano, scheduler-plugins, koord-scheduler.")
+ " Now supporting none, volcano, scheduler-plugins, koord-scheduler.")
Done
67452d5 to e08164d
@tenzen-y Are there any blockers here?
@johnugeorge We need to cut a release on the common repository.
@tenzen-y created 0.4.7 release https://github.com/kubeflow/common/releases/tag/v0.4.7
@johnugeorge Thank you! @Syulin7 Can you update this PR with a new common library version?
/assign
@Syulin7 Thank you for the updates!
Mostly, LGTM. I left a comment for a nit.
cmd/training-operator.v1/main.go
Outdated
@@ -73,7 +73,8 @@ func main() {
  flag.StringVar(&leaderElectionID, "leader-election-id", "1ca428e5.training-operator.kubeflow.org", "The ID for leader election.")
  flag.Var(&enabledSchemes, "enable-scheme", "Enable scheme(s) as --enable-scheme=tfjob --enable-scheme=pytorchjob, case insensitive."+
  " Now supporting TFJob, PyTorchJob, MXNetJob, XGBoostJob, PaddleJob. By default, all supported schemes will be enabled.")
- flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none")
+ flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "", "The scheduler to gang-schedule kubeflow jobs."+
+ " Now supporting volcano, default-scheduler, scheduler-plugins, koord-scheduler.")
Suggested change:
- " Now supporting volcano, default-scheduler, scheduler-plugins, koord-scheduler.")
+ " Now Supporting volcano and scheduler-plugins. Note: If you set another scheduler name, the training-operator assumes it's the scheduler-plugins.")
Done
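For readers following this thread, the behaviour described in the suggested help text can be sketched roughly as below. This is a minimal illustration, not the training-operator's actual code: `gangSchedulerKind`, `resolveGangScheduler`, and `parseGangSchedulerName` are hypothetical names, and treating an empty flag value as "gang scheduling disabled" is an assumption based on the new empty default. Only the flag name, its default, and the help text come from the diff above.

```go
package main

import (
	"flag"
	"fmt"
)

// gangSchedulerKind is a hypothetical label for the integration the operator would pick.
type gangSchedulerKind string

const (
	kindNone             gangSchedulerKind = "none" // assumption: empty flag disables gang scheduling
	kindVolcano          gangSchedulerKind = "volcano"
	kindSchedulerPlugins gangSchedulerKind = "scheduler-plugins"
)

// resolveGangScheduler maps the flag value to an integration.
// Any name other than "" or "volcano" is treated as scheduler-plugins,
// mirroring the note in the suggested help text.
func resolveGangScheduler(name string) gangSchedulerKind {
	switch name {
	case "":
		return kindNone
	case "volcano":
		return kindVolcano
	default:
		return kindSchedulerPlugins
	}
}

// parseGangSchedulerName parses args with an isolated FlagSet so the sketch
// does not touch the process-wide flag.CommandLine.
func parseGangSchedulerName(args []string) (string, error) {
	fs := flag.NewFlagSet("training-operator", flag.ContinueOnError)
	var name string
	fs.StringVar(&name, "gang-scheduler-name", "", "The scheduler to gang-schedule kubeflow jobs."+
		" Now Supporting volcano and scheduler-plugins. Note: If you set another scheduler name, the training-operator assumes it's the scheduler-plugins.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	examples := [][]string{
		{},
		{"--gang-scheduler-name=volcano"},
		{"--gang-scheduler-name=scheduler-plugins"},
		{"--gang-scheduler-name=koord-scheduler"},
	}
	for _, args := range examples {
		name, err := parseGangSchedulerName(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("flag value %q resolves to %s\n", name, resolveGangScheduler(name))
	}
}
```

Under these assumptions, a name such as koord-scheduler falls through to the scheduler-plugins branch, which is why the help text no longer lists it as a separate option.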
Signed-off-by: Syulin7 <735122171@qq.com>
@Syulin7 Great! Thank you for the awesome contribution!
/lgtm
/assign @johnugeorge
/hold cancel
Thanks @Syulin7. We need to update the docs (https://www.kubeflow.org/docs/components/training/job-scheduling/#running-jobs-with-gang-scheduling) regarding this. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: johnugeorge, Syulin7
* Support for k8s v1.25 in CI
* Support for k8s v1.25 in CI
* Change k8s api to v1.25
* Upgrade golangci-lint version
* Add changelog
* Update CHANGELOG.md
* Update Changelog
* Merge common repo
* Avoid to depend on local env when installing the code-generators (#1810)
  Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Make scheduler-plugins the default gang scheduler. (#1747)
  Signed-off-by: Syulin7 <735122171@qq.com>
* Fix tests
* Fix merge conflicts
* Fix CI issues
* Fix CI issues
* Fix review comments
* Add contributors in Readme file
* Fix review comments
* Fix review comments
---------
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Syulin7 <735122171@qq.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Co-authored-by: yu lin <37265556+Syulin7@users.noreply.github.com>
What this PR does / why we need it:
The Training Operator already supports several gang schedulers (volcano, scheduler-plugins), so the koordinator gang scheduler can now be added easily.
Related: #1746
Reviewers can check the koordinator gang-scheduling feature with koordinator using the following steps:
Which issue(s) this PR fixes (optional, in "Fixes #<issue number>, #<issue number>, ..." format, will close the issue(s) when PR gets merged):
Fixes #1746
Checklist: