koord-descheduler: enhance LowNodeLoad scorer #2092
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2092      +/-   ##
==========================================
+ Coverage   68.66%   68.71%   +0.04%
==========================================
  Files         430      430
  Lines       40053    40085      +32
==========================================
+ Hits        27503    27543      +40
+ Misses      10185    10184       -1
+ Partials     2365     2358       -7
Flags with carried forward coverage won't be shown. |
/assign @eahydra |
673f651 to bd7d556
/assign @FillZpp |
/assign @ZiMengSheng |
/assign @songtao98 |
@saintube Hi, the tide process seems to be stuck. What information do I need to add to move it forward? |
Agreed, eviction based solely on the maximum usage of pods will indeed lead to stability problems. The simplest scenario: after the pod with the largest usage is evicted, few nodes can satisfy its resource requirements, so the pod stays pending. Even if it can be scheduled onto some node, a single-node hotspot is more likely to be triggered there when several pods peak at the same time. Therefore, I agree that this should be addressed by evicting pods whose usage is just enough to meet demand, rather than simply picking the pod with the maximum usage. |
@ZiMengSheng Hi, the tide process seems to be stuck. What information do I need to add to move it forward? |
No more other things to do for now, just need a review and approval.
@LY-today I'm reviewing this, and sorry for the delay. Please wait a little bit; I'll do my best to review it ASAP. |
thank you for your reply |
Yes, we seem to have the same problem scenario |
The basic logic looks good to me. However, let's figure out how to cooperate with PR #2066. The score function needs to be discussed.
pkg/descheduler/framework/plugins/loadaware/utilization_util.go
@LY-today Please check and resolve my review comment here: #2092 (comment). To be short, When we are doing this, the function I suggest you add an arg If any further problems, feel free to comment me. :) |
Also, remember to check and modify the related code carefully. I'll re-review this PR after your changes. As for the failing lint and unit-test checks, I'm fixing them in #2103. Finally, remember to rebase your PR and squash it down to a single commit. |
Received. I am making it compatible now. |
f3c0fbd to a1b3e3c
@songtao98 make test passes locally, but the CI run is failing. Can it be re-run manually? |
15ccbc4 to f0c0b72
I noticed that was caused by lint check. You can run |
pkg/descheduler/framework/plugins/loadaware/utilization_util.go
@LY-today There is a typo in your PR's title. Consider changing it to: koord-descheduler: enhance LowNodeLoad scorer |
done |
Signed-off-by: LY-today <724102053@qq.com>
aad46d9 to dd39981
@songtao98 Everything has been updated and CI now passes. Please take a look. |
/lgtm |
@songtao98 done |
@songtao98 What other steps are needed to move this PR toward merging? |
An approved label is needed. Maintainers will do a final check soon, and if there are no other problems, this PR will be approved and auto-merged. |
Received, thank you for your hard work. We hope to speed up the progress and solve the online problems of our cluster. |
@songtao98 Hello, are there any plans to advance this PR today? |
PTAL @hormes @ZiMengSheng |
/lgtm |
/approve |
@eahydra Please check out this PR and let's work together to solve the problem |
@hormes Please check out this PR and let's work together to solve the problem |
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hormes, ZiMengSheng |
Ⅰ. Describe what this PR does
What is your proposal:
When the descheduler decides which pods to evict in order to bring a high-load node's utilization back to a reasonable level, can it first compare each pod's actual usage against the amount of resources that needs to be evicted?
Why is this needed:
This reduces how often a single node re-enters the eviction logic and how many pods are evicted from it, which indirectly improves the accuracy of the eviction results and the stability of the business while still resolving hotspots.
Is there a suggested solution, if so, please add it:
Eviction priority selector: podNowResourceUsages >= nodeNowResourceUsages - nodeHighResourceThresholds * nodeResourceAllocatable
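The selector above can be illustrated with a minimal Go sketch for a single resource. This is not the code from pkg/descheduler/framework/plugins/loadaware/utilization_util.go: the names podUsage, evictionTarget and sortForEviction are hypothetical, the threshold is assumed to be a percentage of allocatable, and ordering the sufficient pods from smallest to largest is an assumption based on the discussion about evicting pods whose usage is "just enough".

```go
package main

import (
	"fmt"
	"sort"
)

// podUsage is a hypothetical, simplified view of one pod's measured usage
// of a single resource (for example CPU in millicores).
type podUsage struct {
	Name  string
	Usage int64
}

// evictionTarget returns how much usage has to leave the node so that it
// drops back to the high threshold:
//   nodeUsage - highThresholdPercent% * allocatable
// The result is clamped at zero for nodes that are already below it.
func evictionTarget(nodeUsage, allocatable, highThresholdPercent int64) int64 {
	target := nodeUsage - highThresholdPercent*allocatable/100
	if target < 0 {
		return 0
	}
	return target
}

// sortForEviction returns a copy of pods ordered by eviction priority.
// Pods whose usage alone covers the target come first, smallest first
// (assumption: prefer the pod that is "just enough"). The remaining pods
// follow from largest to smallest usage.
func sortForEviction(pods []podUsage, target int64) []podUsage {
	sorted := append([]podUsage(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		iOK := sorted[i].Usage >= target
		jOK := sorted[j].Usage >= target
		if iOK != jOK {
			return iOK
		}
		if iOK {
			return sorted[i].Usage < sorted[j].Usage
		}
		return sorted[i].Usage > sorted[j].Usage
	})
	return sorted
}

func main() {
	// Node uses 9000m of 10000m allocatable with an 80% high threshold,
	// so 1000m must be evicted.
	target := evictionTarget(9000, 10000, 80)
	pods := []podUsage{{"a", 3000}, {"b", 1200}, {"c", 400}, {"d", 800}}
	fmt.Println("target:", target)
	for _, p := range sortForEviction(pods, target) {
		fmt.Println(p.Name, p.Usage)
	}
}
```

With these example numbers, pod b (1200m) is preferred over pod a (3000m): evicting b alone already brings the node back under the threshold, and b is easier to reschedule than the largest pod.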
Ⅱ. Does this pull request fix one issue?
Issue link: #1975
Ⅲ. Describe how to verify it
I added unit tests
Ⅳ. Special notes for reviews
none
V. Checklist
✅ I have written necessary docs and comments
✅ I have added necessary unit tests and integration tests
✅ All checks passed in make test