[Feature] Multi-components workload scheduling #5115

Open
RainbowMango opened this issue Jul 1, 2024 · 0 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

Member

RainbowMango commented Jul 1, 2024

What is a multi-component workload
Kubernetes workloads such as Deployment, StatefulSet, and Pod each consist of a single component (one pod template) with one or more replicas.
AI training and big-data workloads usually consist of more than one component, and each component may have multiple replicas.

What would you like to be added:
Provide fine-grained support for workloads that usually consist of two or more components, such as:

  • Kubeflow Training/Big-data workloads
    • TFJob (TensorFlow Job), which may consist of multiple components like PS, Worker, Chief, Master, and Evaluator.
    • PyTorchJob, which may consist of two components, like Master and Worker.
    • MXJob, which may consist of multiple components, like Scheduler, Server, Worker, TunerTracker, TunerServer, and Tuner.
    • XGBoostJob, which may consist of two components, like Master and Worker.
    • MPIJob, which may consist of two components, like Launcher and Worker.
    • PaddleJob, which may consist of two components, like Master and Worker.
    • SparkApplication, which may consist of two components, like driver and executor.
  • FlinkDeployment, which may consist of two components, like jobManager and taskManager.
  • Volcano Job, which may consist of multiple tasks.
  • etc.
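
To make the distinction between single-component and multi-component workloads concrete, here is a minimal sketch in Python. The `Component` and `Workload` types are hypothetical and are not part of Karmada or any operator API, and the replica counts are made up for illustration. Only the kind and component names (TFJob with PS, Worker, and Chief) come from the list above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Component:
    """One pod template plus its replica count (hypothetical type)."""
    name: str
    replicas: int


@dataclass(frozen=True)
class Workload:
    """A workload made of one or more components (hypothetical type)."""
    kind: str
    components: Tuple[Component, ...]

    def is_multi_component(self) -> bool:
        # More than one pod template means the workload is multi-component.
        return len(self.components) > 1

    def total_replicas(self) -> int:
        # Sum of replicas across all components.
        return sum(c.replicas for c in self.components)


# A Deployment has exactly one component; the name "main" is an assumption.
deployment = Workload("Deployment", (Component("main", 3),))

# A TFJob may combine several components; replica counts are illustrative.
tfjob = Workload(
    "TFJob",
    (Component("PS", 2), Component("Worker", 4), Component("Chief", 1)),
)

print(deployment.is_multi_component(), deployment.total_replicas())
print(tfjob.is_multi_component(), tfjob.total_replicas())
```

In this model a scheduler that only looks at a single replica count would see the TFJob as one number, losing the per-component breakdown that the request above asks to preserve.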

User story

Proposal
Work in progress: #5085
Why is this needed:

RainbowMango added the kind/feature label Jul 1, 2024
Projects
Status: Planned In Release 1.11