Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: add new pod estimate with loadaware plugin #1992

Merged
merged 2 commits into from
Jun 12, 2024

Conversation

zwForrest
Copy link
Contributor

Ⅰ. Describe what this PR does

When scheduling a new pod, it is necessary to consider the resource utilization of the new pod with the loadaware plugin, to prevent the node utilization from becoming too high after the new pod is scheduled.

This can prevent nodes with excessive utilization entering score phase.

The resource utilization for the new pod here is calculated using the original estimated method.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

Copy link

codecov bot commented Apr 8, 2024

Codecov Report

Attention: Patch coverage is 78.26087% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 68.75%. Comparing base (2dc8735) to head (8943b2d).
Report is 44 commits behind head on main.

Files Patch % Lines
pkg/scheduler/plugins/loadaware/load_aware.go 78.26% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1992      +/-   ##
==========================================
+ Coverage   67.50%   68.75%   +1.25%     
==========================================
  Files         421      423       +2     
  Lines       47100    39174    -7926     
==========================================
- Hits        31795    26936    -4859     
+ Misses      12990     9909    -3081     
- Partials     2315     2329      +14     
Flag Coverage Δ
unittests 68.75% <78.26%> (+1.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zwForrest zwForrest force-pushed the feat/updateloadaware branch 2 times, most recently from 84026c9 to 8e7711f Compare April 8, 2024 13:02
@hormes
Copy link
Member

hormes commented Apr 28, 2024

What is the status of this issue?

@zwForrest
Copy link
Contributor Author

What is the status of this issue?

The corresponding changes have been updated.

pkg/scheduler/plugins/loadaware/load_aware.go Outdated Show resolved Hide resolved
pkg/scheduler/plugins/loadaware/load_aware.go Outdated Show resolved Hide resolved
pkg/scheduler/plugins/loadaware/load_aware.go Outdated Show resolved Hide resolved
@eahydra
Copy link
Member

eahydra commented May 7, 2024

I'm sorry I've been away from X for a while.

I reviewed this PR and reviewed the implementation of the loadaware plugin. I noticed that there are many areas that need to be improved in the code I implemented in the past. Especially in this PR, you can see that the logic of filter and score are very similar. We should calculate the usage first, and then decide whether to score or filter. When calculating usage, we can decide whether to use estimated.

These codes should be wrapped as a function that can be used in filter and score stage.

prodPod := extension.GetPodPriorityClassWithDefault(pod) == extension.PriorityProd && p.args.ScoreAccordingProdUsage
podMetrics := buildPodMetricMap(p.podLister, nodeMetric, prodPod)
estimatedUsed, err := p.estimator.EstimatePod(pod)
if err != nil {
return 0, nil
}
assignedPodEstimatedUsed, estimatedPods := p.estimatedAssignedPodUsed(nodeName, nodeMetric, podMetrics, prodPod)
for resourceName, value := range assignedPodEstimatedUsed {
estimatedUsed[resourceName] += value
}
podActualUsages, estimatedPodActualUsages := sumPodUsages(podMetrics, estimatedPods)
if prodPod {
for resourceName, quantity := range podActualUsages {
estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
}
} else {
if nodeMetric.Status.NodeMetric != nil {
var nodeUsage *slov1alpha1.ResourceMap
if scoreWithAggregation(p.args.Aggregated) {
nodeUsage = getTargetAggregatedUsage(nodeMetric, &p.args.Aggregated.ScoreAggregatedDuration, p.args.Aggregated.ScoreAggregationType)
} else {
nodeUsage = &nodeMetric.Status.NodeMetric.NodeUsage
}
if nodeUsage != nil {
for resourceName, quantity := range nodeUsage.ResourceList {
if q := estimatedPodActualUsages[resourceName]; !q.IsZero() {
quantity = quantity.DeepCopy()
if quantity.Cmp(q) >= 0 {
quantity.Sub(q)
}
}
estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
}
}
}
}

@zwForrest
Copy link
Contributor Author

I'm sorry I've been away from X for a while.

I reviewed this PR and reviewed the implementation of the loadaware plugin. I noticed that there are many areas that need to be improved in the code I implemented in the past. Especially in this PR, you can see that the logic of filter and score are very similar. We should calculate the usage first, and then decide whether to score or filter. When calculating usage, we can decide whether to use estimated.

These codes should be wrapped as a function that can be used in filter and score stage.

prodPod := extension.GetPodPriorityClassWithDefault(pod) == extension.PriorityProd && p.args.ScoreAccordingProdUsage
podMetrics := buildPodMetricMap(p.podLister, nodeMetric, prodPod)
estimatedUsed, err := p.estimator.EstimatePod(pod)
if err != nil {
return 0, nil
}
assignedPodEstimatedUsed, estimatedPods := p.estimatedAssignedPodUsed(nodeName, nodeMetric, podMetrics, prodPod)
for resourceName, value := range assignedPodEstimatedUsed {
estimatedUsed[resourceName] += value
}
podActualUsages, estimatedPodActualUsages := sumPodUsages(podMetrics, estimatedPods)
if prodPod {
for resourceName, quantity := range podActualUsages {
estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
}
} else {
if nodeMetric.Status.NodeMetric != nil {
var nodeUsage *slov1alpha1.ResourceMap
if scoreWithAggregation(p.args.Aggregated) {
nodeUsage = getTargetAggregatedUsage(nodeMetric, &p.args.Aggregated.ScoreAggregatedDuration, p.args.Aggregated.ScoreAggregationType)
} else {
nodeUsage = &nodeMetric.Status.NodeMetric.NodeUsage
}
if nodeUsage != nil {
for resourceName, quantity := range nodeUsage.ResourceList {
if q := estimatedPodActualUsages[resourceName]; !q.IsZero() {
quantity = quantity.DeepCopy()
if quantity.Cmp(q) >= 0 {
quantity.Sub(q)
}
}
estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
}
}
}
}

Yes, I noticed this issue while making the changes. I will further modify this part.

@saintube
Copy link
Member

@zwForrest Thanks for your contributions! Please resolve the conflicts and we will continue the review.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

Attention: Patch coverage is 67.85714% with 18 lines in your changes missing coverage. Please review.

Project coverage is 68.62%. Comparing base (b208bdf) to head (f3be24c).
Report is 2 commits behind head on main.

Files Patch % Lines
pkg/scheduler/plugins/loadaware/load_aware.go 67.85% 11 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1992      +/-   ##
==========================================
+ Coverage   68.61%   68.62%   +0.01%     
==========================================
  Files         430      430              
  Lines       39668    39652      -16     
==========================================
- Hits        27218    27212       -6     
+ Misses      10095    10088       -7     
+ Partials     2355     2352       -3     
Flag Coverage Δ
unittests 68.62% <67.85%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zwForrest
Copy link
Contributor Author

zwForrest commented May 27, 2024

@zwForrest Thanks for your contributions! Please resolve the conflicts and we will continue the review.

Done, The ci error is caused by other code. @saintube

@saintube
Copy link
Member

saintube commented Jun 7, 2024

image
@zwForrest Please check your logic to pass the UTs.

Signed-off-by: zwForrest <756495135@qq.com>
@zwForrest
Copy link
Contributor Author

zwForrest commented Jun 11, 2024

@zwForrest Please check your logic to pass the UTs.

@saintube Done.

Copy link
Member

@saintube saintube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
PTAL /cc @ZiMengSheng

@ZiMengSheng
Copy link
Contributor

/approve

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ZiMengSheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 0d57e7e into koordinator-sh:main Jun 12, 2024
22 checks passed
@zwForrest zwForrest deleted the feat/updateloadaware branch July 10, 2024 07:47
yangfeiyu20102011 pushed a commit to yangfeiyu20102011/koordinator that referenced this pull request Jul 29, 2024
…#1992)

Signed-off-by: zwForrest <756495135@qq.com>
Co-authored-by: zengwang1 <zengwang1@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants