[proposal] Support for Mid tier #1361

Closed
11 tasks done
hormes opened this issue Jun 7, 2023 · 1 comment

Comments

@hormes
Member

hormes commented Jun 7, 2023

The resource model chapter introduced the Mid tier, which is intended for long-running jobs. Such jobs are typical of big data computing, real-time computing, and AI training workloads. This issue tracks the work related to Mid tier support.

Supporting the Mid tier on top of the current framework requires the following work:

  • Long-period peak prediction. To avoid excessive resource overhead (including extra third-party dependencies), the peak prediction is best performed on the node side, i.e. in koordlet.
    • a histogram statistics tool
    • collect metrics and store them in the histogram
    • checkpoint the histogram and restore it
    • a peak prediction algorithm
  • The slo-controller needs to be aware of this prediction, at least at the priority band level, to calculate the amount of schedulable resources.
    • NodeMetric API
    • report the prediction metrics
  • The koord-scheduler needs to be aware of the amount of schedulable Mid resources, similar to how Batch resources are exposed as extended resources.
    • API for node allocatable resources
    • calculate the total schedulable resources and update the extended resources
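The node-side items in the first bullet (histogram, metric collection, checkpoint/restore, peak prediction) can be illustrated with a small sketch. It assumes an exponentially decaying histogram, loosely in the style of the Kubernetes Vertical Pod Autoscaler recommender, where the predicted peak is a high percentile of recent usage. `DecayingHistogram`, `RestoreHistogram`, the JSON checkpoint format, and the bucket/half-life parameters are all hypothetical and are not koordlet's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// DecayingHistogram is a hypothetical fixed-width-bucket histogram whose
// sample weights grow exponentially with sample time, so recent usage
// counts more than old usage when predicting a peak.
type DecayingHistogram struct {
	BucketSize float64   `json:"bucketSize"` // width of one bucket, e.g. CPU millicores
	Weights    []float64 `json:"weights"`    // accumulated weight per bucket
	HalfLife   float64   `json:"halfLife"`   // seconds after which an old sample counts half as much
	RefTime    float64   `json:"refTime"`    // reference timestamp (seconds) for the decay factor
}

// NewDecayingHistogram creates an empty histogram with numBuckets buckets.
func NewDecayingHistogram(numBuckets int, bucketSize, halfLife float64) *DecayingHistogram {
	return &DecayingHistogram{BucketSize: bucketSize, Weights: make([]float64, numBuckets), HalfLife: halfLife}
}

// AddSample records a usage value observed at time t (seconds). A sample that
// is one half-life newer gets twice the weight. A long-running version would
// periodically move RefTime forward and rescale Weights to avoid overflow.
func (h *DecayingHistogram) AddSample(value, t float64) {
	idx := int(value / h.BucketSize)
	if idx < 0 {
		idx = 0
	}
	if idx >= len(h.Weights) {
		idx = len(h.Weights) - 1 // clamp outliers into the last bucket
	}
	h.Weights[idx] += math.Exp2((t - h.RefTime) / h.HalfLife)
}

// Percentile returns the upper bound of the first bucket at which the
// cumulative weight reaches fraction p (0 < p <= 1) of the total weight.
func (h *DecayingHistogram) Percentile(p float64) float64 {
	total := 0.0
	for _, w := range h.Weights {
		total += w
	}
	if total == 0 {
		return 0
	}
	acc := 0.0
	for i, w := range h.Weights {
		acc += w
		if acc >= p*total {
			return float64(i+1) * h.BucketSize
		}
	}
	return float64(len(h.Weights)) * h.BucketSize
}

// Checkpoint serializes the histogram so it can survive an agent restart.
func (h *DecayingHistogram) Checkpoint() ([]byte, error) {
	return json.Marshal(h)
}

// RestoreHistogram rebuilds a histogram from a checkpoint.
func RestoreHistogram(data []byte) (*DecayingHistogram, error) {
	h := &DecayingHistogram{}
	if err := json.Unmarshal(data, h); err != nil {
		return nil, err
	}
	return h, nil
}

func main() {
	// 100 buckets of 100 millicores each, with a 12-hour half-life.
	h := NewDecayingHistogram(100, 100, 12*3600)
	for i := 0; i < 90; i++ {
		h.AddSample(500, float64(i*60)) // steady usage around 500m
	}
	for i := 0; i < 10; i++ {
		h.AddSample(2000, float64(i*60)) // occasional 2000m spikes
	}
	data, err := h.Checkpoint()
	if err != nil {
		panic(err)
	}
	restored, err := RestoreHistogram(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.Percentile(0.5), restored.Percentile(0.95)) // 600 2100
}
```

In this sketch the median lands in the steady-usage bucket while the 95th percentile captures the spikes, which is the kind of long-period peak a Mid tier calculation would need to respect. Keeping the histogram fixed-size and serializing it locally avoids any external time-series dependency, in line with the node-side goal above.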
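The slo-controller and koord-scheduler items above boil down to turning a node's peak prediction into a schedulable amount. The sketch below assumes Mid capacity is the part of Prod-band requests that the prediction says will stay unused, capped at a fixed percentage of node allocatable. This formula, the cap, and the names `NodeSnapshot` and `midAllocatable` are hypothetical illustrations, not the calculation specified in this issue.

```go
package main

import "fmt"

// NodeSnapshot is a hypothetical view of one node that a controller could
// assemble from the node's allocatable resources and its reported
// long-period peak prediction. All values are CPU millicores.
type NodeSnapshot struct {
	AllocatableMilliCPU int64 // node allocatable CPU
	ProdRequestMilliCPU int64 // sum of requests of Prod-band pods
	ProdPeakMilliCPU    int64 // predicted long-period peak usage of Prod-band pods
}

// midAllocatable estimates schedulable Mid CPU as the portion of Prod requests
// that the prediction says will stay unused, capped at capPercent percent of
// node allocatable so a bad prediction cannot over-commit the whole node.
func midAllocatable(n NodeSnapshot, capPercent int64) int64 {
	reclaimable := n.ProdRequestMilliCPU - n.ProdPeakMilliCPU
	if reclaimable < 0 {
		reclaimable = 0 // Prod is predicted to use everything it requested
	}
	limit := n.AllocatableMilliCPU * capPercent / 100
	if reclaimable > limit {
		return limit
	}
	return reclaimable
}

func main() {
	n := NodeSnapshot{AllocatableMilliCPU: 32000, ProdRequestMilliCPU: 24000, ProdPeakMilliCPU: 10000}
	// 14000m is reclaimable, but the cap is 30% of 32000m = 9600m.
	fmt.Println(midAllocatable(n, 30)) // 9600
}
```

The resulting value would then be published on the Node as an extended resource, analogous to how Batch resources are exposed, so the koord-scheduler can account for it like any other allocatable resource.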
hormes added the kind/proposal label Jun 7, 2023
hormes changed the title from [proposal] Mid tier to [proposal] Support for Mid tier in resource model Jun 7, 2023
hormes changed the title from [proposal] Support for Mid tier in resource model to [proposal] Support for Mid tier Jun 7, 2023
@saintube
Member

saintube commented Jun 7, 2023

/area koord-scheduler
/area koord-manager
/area koordlet

hormes added this to the v1.3 milestone Jun 7, 2023
saintube mentioned this issue Jun 12, 2023
jasonliu747 changed the milestone from v1.3 to v1.4 Jun 13, 2023
hormes changed the milestone from v1.4 back to v1.3 Jun 26, 2023

4 participants