
Support load aware scheduling #159

Merged

Conversation

@eahydra eahydra commented May 19, 2022

Ⅰ. Describe what this PR does

The scheduling plugin filters abnormal nodes and scores them according
to resource usage. The plugin extends the Filter/Score/Reserve/Unreserve
extension points defined in the Kubernetes scheduling framework.

FYI: docs/proposals/scheduling/20220510-load-aware-scheduling.md
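
For readers unfamiliar with the scheduler framework, the sketch below shows roughly what wiring a plugin into these four extension points looks like. It is a minimal illustration against the upstream framework interfaces, with placeholder method bodies standing in for the actual filtering and scoring logic in this PR:

```go
package loadaware

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const Name = "LoadAwareScheduling"

// Plugin implements the Filter/Score/Reserve/Unreserve extension points.
type Plugin struct{}

var (
	_ framework.FilterPlugin  = &Plugin{}
	_ framework.ScorePlugin   = &Plugin{}
	_ framework.ReservePlugin = &Plugin{}
)

func (p *Plugin) Name() string { return Name }

// Filter rejects nodes whose NodeMetric has expired or whose utilization
// already exceeds the configured usageThresholds.
func (p *Plugin) Filter(ctx context.Context, state *framework.CycleState, pod *corev1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	return nil // placeholder: the real checks live in this PR
}

// Score ranks the remaining nodes by estimated free capacity.
func (p *Plugin) Score(ctx context.Context, state *framework.CycleState, pod *corev1.Pod, nodeName string) (int64, *framework.Status) {
	return 0, nil // placeholder
}

func (p *Plugin) ScoreExtensions() framework.ScoreExtensions { return nil }

// Reserve/Unreserve maintain an assign cache of pods scheduled between
// NodeMetric reports, so their usage is estimated before metrics catch up.
func (p *Plugin) Reserve(ctx context.Context, state *framework.CycleState, pod *corev1.Pod, nodeName string) *framework.Status {
	return nil // placeholder
}

func (p *Plugin) Unreserve(ctx context.Context, state *framework.CycleState, pod *corev1.Pod, nodeName string) {
}
```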

Ⅱ. Does this pull request fix one issue?

Fix #95

Ⅲ. Describe how to verify it

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: LoadAwareSchedulingArgs
      filterExpiredNodeMetrics: true
      nodeMetricExpirationSeconds: 300
      resourceWeights:
        cpu: 2
        memory: 1
      usageThresholds:
        cpu: 75
        memory: 85
      estimatedScalingFactors:
        cpu: 80
        memory: 70
    name: LoadAwareScheduling
  plugins:
    filter:
      enabled:
        - name: LoadAwareScheduling
          weight: 0
    reserve:
      enabled:
        - name: LoadAwareScheduling
          weight: 0
    score:
      enabled:
        - name: LoadAwareScheduling
          weight: 1000
  schedulerName: koord-scheduler
```
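
As a rough illustration of how the configuration above feeds into scoring, the sketch below implements a weighted least-used scorer: each resource contributes the fraction of capacity still free, weighted by resourceWeights. This is an assumption-laden sketch, not the plugin's exact formula (see the proposal document for that):

```go
package loadaware

import corev1 "k8s.io/api/core/v1"

// maxNodeScore mirrors framework.MaxNodeScore (100).
const maxNodeScore int64 = 100

// loadAwareScore combines per-resource free capacity into one weighted
// score. Quantities are raw integer units (e.g. milli-CPU, bytes).
func loadAwareScore(usage, allocatable, weights map[corev1.ResourceName]int64) int64 {
	var weightedSum, weightSum int64
	for resource, weight := range weights {
		alloc := allocatable[resource]
		if alloc <= 0 || weight <= 0 {
			continue
		}
		used := usage[resource]
		if used > alloc {
			used = alloc
		}
		// Score the fraction of the resource that is still free.
		weightedSum += weight * maxNodeScore * (alloc - used) / alloc
		weightSum += weight
	}
	if weightSum == 0 {
		return 0
	}
	return weightedSum / weightSum
}
```

With the weights above (cpu: 2, memory: 1), a node at 50% CPU and 30% memory utilization would score (2×50 + 1×70) / 3 ≈ 56 under this sketch.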

Ⅳ. Special notes for reviews

@eahydra eahydra force-pushed the support_load_aware_scheduling branch 15 times, most recently from cfcc279 to a6aece2 Compare May 20, 2022 08:41
@eahydra eahydra marked this pull request as ready for review May 20, 2022 08:55
@eahydra eahydra requested a review from saintube May 20, 2022 09:28
@eahydra eahydra force-pushed the support_load_aware_scheduling branch from a6aece2 to a166f15 Compare May 20, 2022 09:35
```
@@ -1,3 +1,4 @@
reviewers:
- eahydra
```

Reviewer (Member): alphabetical order

hack/update-scheduler-codegen.sh (resolved review thread)
@eahydra eahydra requested a review from jasonliu747 May 22, 2022 10:22

@jasonliu747 jasonliu747 left a comment


I haven't gone through the core logic of the load-aware plugin implementation yet; here is what I've got for you so far ;)

cmd/koord-scheduler/app/options/options.go (outdated review thread, resolved)
cmd/koord-scheduler/app/server.go (outdated review thread, resolved)
pkg/scheduler/apis/config/doc.go (outdated review thread, resolved)
pkg/scheduler/apis/config/types_pluginargs.go (two outdated review threads, resolved)
```go
var (
	defaultNodeMetricExpirationSeconds int64 = 180
```

Reviewer (Member): Can you add some comments to briefly tell us about the story behind these default values?
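
For illustration, such comments might look like the sketch below. The reporting-interval rationale and the second default are assumptions invented for this example, not taken from the PR:

```go
var (
	// When filterExpiredNodeMetrics is enabled, a NodeMetric older than
	// this is considered expired and the node is filtered out. 180s
	// tolerates a few missed reports, assuming koordlet reports roughly
	// once per minute (an assumption made for this example).
	defaultNodeMetricExpirationSeconds int64 = 180

	// Hypothetical companion default: weight CPU and memory equally
	// unless resourceWeights overrides it.
	defaultResourceWeights = map[corev1.ResourceName]int64{
		corev1.ResourceCPU:    1,
		corev1.ResourceMemory: 1,
	}
)
```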

pkg/util/pod.go (outdated review thread, resolved)
@jasonliu747 jasonliu747 self-assigned this May 23, 2022
pkg/util/pod.go (outdated review thread, resolved)

```go
	var estimatedUsed int64
	switch resourceName {
	case corev1.ResourceCPU:
```

Reviewer (Member): What if it is a BestEffort Pod?

eahydra (Member, Author): A BestEffort Pod uses extended resources, so its usage can be calculated directly with Value().
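
To make that distinction concrete, here is a hypothetical sketch: regular pods declare CPU in cores, so MilliValue() is needed, while a BestEffort pod managed by koordinator requests extended resources whose integer Value() is already in the final unit. The batch-cpu resource name below is illustrative; the real names are defined in koordinator's extension API:

```go
package loadaware

import corev1 "k8s.io/api/core/v1"

// batchCPU is a hypothetical extended-resource name used for illustration.
const batchCPU corev1.ResourceName = "kubernetes.io/batch-cpu"

// estimatedCPUMilli returns an estimated CPU request in milli-cores.
func estimatedCPUMilli(requests corev1.ResourceList) int64 {
	// Regular pods: the CPU quantity is in cores, so convert to milli.
	if cpu, ok := requests[corev1.ResourceCPU]; ok {
		return cpu.MilliValue()
	}
	// BestEffort pods with extended batch resources: the integer value
	// is already in the final unit, so Value() suffices.
	if cpu, ok := requests[batchCPU]; ok {
		return cpu.Value()
	}
	return 0
}
```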

@eahydra eahydra force-pushed the support_load_aware_scheduling branch from a166f15 to 2e22a86 Compare May 24, 2022 01:28
@codecov-commenter commented May 24, 2022

Codecov Report

Merging #159 (d57735e) into main (cb90038) will increase coverage by 0.38%.
The diff coverage is 71.02%.

❗ Current head d57735e differs from pull request most recent head 6cb8fdf. Consider uploading reports for the commit 6cb8fdf to get more accurate results

```
@@            Coverage Diff             @@
##             main     #159      +/-   ##
==========================================
+ Coverage   57.82%   58.21%   +0.38%     
==========================================
  Files          91       93       +2     
  Lines        8159     8404     +245     
==========================================
+ Hits         4718     4892     +174     
- Misses       3029     3081      +52     
- Partials      412      431      +19     
```
| Flag | Coverage Δ |
|---|---|
| unittests | 58.21% <71.02%> (+0.38%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| pkg/util/pod.go | 9.85% <0.00%> (-0.15%) ⬇️ |
| pkg/scheduler/plugins/loadaware/pod_assign_cache.go | 71.18% <71.18%> (ø) |
| pkg/scheduler/plugins/loadaware/load_aware.go | 71.73% <71.73%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb90038...6cb8fdf.

@eahydra eahydra force-pushed the support_load_aware_scheduling branch 2 times, most recently from 5817d17 to 637f78f Compare May 24, 2022 08:34
@eahydra (Member, Author) commented May 24, 2022

Support user-defined node resource utilization thresholds.

@eahydra eahydra force-pushed the support_load_aware_scheduling branch from 637f78f to d57735e Compare May 24, 2022 08:39
```go
func (p *Plugin) estimatedAssignedPodUsage(nodeName string, nodeMetric *slov1alpha1.NodeMetric) map[corev1.ResourceName]int64 {
	estimatedUsed := make(map[corev1.ResourceName]int64)
	nodeMetricReportInterval := getNodeMetricReportInterval(nodeMetric)
	p.podAssignCache.lock.RLock()
```

Reviewer (Member): Is it possible to reduce the lock granularity, since there is no correlation among different nodes?

eahydra (Member, Author): Fine-grained locking is not being considered for now.

Reviewer (Member): ok
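
For reference, the finer-grained alternative discussed above could shard the assign cache per node, so that estimating usage for different nodes contends only on per-node locks. A hypothetical sketch, not the merged implementation:

```go
package loadaware

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// nodeAssignCache holds pods assigned to one node that are not yet
// reflected in its NodeMetric, guarded by a per-node lock.
type nodeAssignCache struct {
	lock sync.RWMutex
	pods map[types.UID]*corev1.Pod
}

// shardedAssignCache locks globally only to look up the shard; work on
// different nodes then proceeds under independent per-node locks.
type shardedAssignCache struct {
	lock  sync.RWMutex // guards only the nodes map
	nodes map[string]*nodeAssignCache
}

func (c *shardedAssignCache) shard(nodeName string) *nodeAssignCache {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return c.nodes[nodeName]
}
```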

pkg/scheduler/plugins/loadaware/load_aware.go (resolved review thread)
@eahydra eahydra force-pushed the support_load_aware_scheduling branch from d57735e to 6cb8fdf Compare May 25, 2022 13:07
@eahydra eahydra requested a review from saintube May 25, 2022 13:38
@saintube (Member) commented:

/lgtm
others PTAL

@hormes (Member) commented May 26, 2022

/approve

@koordinator-bot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 6ef657a into koordinator-sh:main May 26, 2022
Successfully merging this pull request may close these issues.

[proposal] scheduler support node load balancing