Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal load-aware scheduling plugin #135

Merged

Conversation

eahydra
Copy link
Member

@eahydra eahydra commented May 13, 2022

Signed-off-by: Tao Li joseph.t.lee@outlook.com

Ⅰ. Describe what this PR does

Although Koordinator provides the co-location mechanism to improve the resource utilization of the cluster and reduce costs, it does not yet have the ability to control the utilization level of the cluster dimension. This proposal defines a scheduling plugin to help Koordinator achieve this capability.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

@eahydra
Copy link
Member Author

eahydra commented May 13, 2022

design for #95

/assign @hormes @allwmh @saintube

@hormes
Copy link
Member

hormes commented May 13, 2022

Ref #95

@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch from bc3b337 to 63b447f Compare May 13, 2022 08:33
@codecov-commenter
Copy link

codecov-commenter commented May 13, 2022

Codecov Report

Merging #135 (dd9a29c) into main (c722334) will decrease coverage by 0.92%.
The diff coverage is n/a.

❗ Current head dd9a29c differs from pull request most recent head ab5e1af. Consider uploading reports for the commit ab5e1af to get more accurate results

@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
- Coverage   56.38%   55.45%   -0.93%     
==========================================
  Files          78       83       +5     
  Lines        7366     7787     +421     
==========================================
+ Hits         4153     4318     +165     
- Misses       2840     3086     +246     
- Partials      373      383      +10     
Flag Coverage Δ
unittests 55.45% <ø> (-0.93%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...controller/noderesource/noderesource_controller.go 0.00% <0.00%> (ø)
...ontroller/noderesource/nodemetric_event_handler.go 0.00% <0.00%> (ø)
...controller/noderesource/configmap_event_handler.go 0.00% <0.00%> (ø)
...slo-controller/noderesource/resource_calculator.go 100.00% <0.00%> (ø)
pkg/slo-controller/noderesource/noderesource.go 0.00% <0.00%> (ø)
pkg/koordlet/resmanager/resmanager.go 56.69% <0.00%> (+2.67%) ⬆️
pkg/koordlet/resmanager/memory_evict.go 72.07% <0.00%> (+59.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c722334...ab5e1af. Read the comment docs.

@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch from 63b447f to 71950e4 Compare May 16, 2022 03:42
@eahydra
Copy link
Member Author

eahydra commented May 16, 2022

@jasonliu747 @saintube I updated the proposal with extend NodeMetricSpec to support user-defined reporting period and aggregation period.

@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch 3 times, most recently from 6bc8087 to dd9a29c Compare May 16, 2022 09:06
Copy link
Member

@jasonliu747 jasonliu747 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall. However, I think some variables can have a better naming. We can figure it out later. Others have any suggestions?
/lgtm

@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch from dd9a29c to fb0624a Compare May 17, 2022 02:33
@koordinator-bot koordinator-bot bot removed the lgtm label May 17, 2022
@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch 2 times, most recently from 8e0951d to b078f4d Compare May 17, 2022 03:47
@hormes
Copy link
Member

hormes commented May 17, 2022

Please add a record in the section of unsolved problems to support portraits of different periods. For example, long-running pods need to be scheduled with long-period profiling, while short-period pods should be scheduled with short-period profiling.

@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch from b078f4d to ab5e1af Compare May 17, 2022 07:47
Signed-off-by: Tao Li <joseph.t.lee@outlook.com>
@eahydra eahydra force-pushed the add_loadbalance_scheduling_design branch from ab5e1af to 82d6eae Compare May 17, 2022 08:15
@eahydra
Copy link
Member Author

eahydra commented May 17, 2022

Please add a record in the section of unsolved problems to support portraits of different periods. For example, long-running pods need to be scheduled with long-period profiling, while short-period pods should be scheduled with short-period profiling.

Updated. @hormes

@eahydra eahydra requested a review from jasonliu747 May 17, 2022 12:31
@eahydra eahydra assigned eahydra and unassigned allwmh, saintube and jasonliu747 May 17, 2022
@eahydra eahydra added this to the v0.4 milestone May 17, 2022
@saintube
Copy link
Member

/lgtm

@hormes
Copy link
Member

hormes commented May 17, 2022

/approve

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 0cf1616 into koordinator-sh:main May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants