This repository has been archived by the owner on May 25, 2023. It is now read-only.

All jobs are pending when some jobs set resources and others do not. #409

Closed
chenyangxueHDU opened this issue Oct 10, 2018 · 5 comments · Fixed by #433
Labels
area/policy, kind/bug, priority/important-soon, sig/scheduling
Milestone
v0.3

Comments

@chenyangxueHDU
Contributor

I created 4 jobs with the same group-name: ps-0, ps-1, worker-0, and worker-1. Only the worker jobs had a resources section.

- apiVersion: batch/v1
  kind: Job
  metadata:
    name: cyx2-worker-0
    annotations:
      scheduling.k8s.io/group-name: cyx2
  spec:
    template:
      spec:
        containers:
        - resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              cpu: "1"
              memory: 1Gi
- apiVersion: batch/v1
  kind: Job
  metadata:
    name: cyx2-ps-0
    annotations:
      scheduling.k8s.io/group-name: cyx2
  spec:
    template:
      spec:
        containers:

I found that all jobs were pending, so I checked the log.

I1009 21:42:36.472045   21605 allocate.go:42] Enter Allocate ...
I1009 21:42:36.472224   21605 allocate.go:118] Binding Task <mind-automl/cyx2-worker-0-2mr7q> to node <192.168.47.52>
I1009 21:42:36.472399   21605 allocate.go:118] Binding Task <mind-automl/cyx2-worker-1-hdz8r> to node <192.168.47.52>
I1009 21:42:36.472426   21605 allocate.go:72] Queue <mind-automl> is overused, ignore it.
I1009 21:42:36.472431   21605 allocate.go:155] Leaving Allocate ..

I found that the queue was overused and only the worker tasks were bound, but I had not set any queue. I found the key logic in kube-batch/pkg/scheduler/plugins/proportion/proportion.go:

			// Calculates the deserved of each Queue.
			attr.deserved.Add(remaining.Clone().Multi(float64(attr.weight) / float64(totalWeight)))
			if !attr.deserved.LessEqual(attr.request) {
				attr.deserved = helpers.Min(attr.deserved, attr.request)
				meet[attr.queueID] = struct{}{}
			}

This means a queue can only use the resources it requests: deserved is capped at request, and because the BestEffort ps pods request nothing, the queue looks fully used as soon as the worker pods are bound. After I set resources on the ps jobs, all jobs were running. I think the tutorial should remind users to set resources.

			if !attr.deserved.LessEqual(attr.request) {
				attr.deserved = helpers.Min(attr.deserved, attr.request)
				meet[attr.queueID] = struct{}{}
			}

Or we can change this.
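
To make the effect concrete with made-up numbers: the sketch below uses a single scalar resource instead of kube-batch's Resource type, and assumes the queue is treated as overused once its allocation reaches its deserved amount, which is what the log above suggests. It is only an illustration, not the plugin's actual code.

	package main

	import "fmt"

	func min(a, b float64) float64 {
		if a < b {
			return a
		}
		return b
	}

	func main() {
		share := 8.0   // fair share of the queue from its weight
		request := 2.0 // total requests in the queue; the BestEffort ps pods add nothing
		deserved := min(share, request) // deserved is capped at what was requested

		allocated := 2.0 // after the two worker pods are bound
		overused := allocated >= deserved
		fmt.Println(deserved, overused) // 2 true -> the queue is skipped, ps pods stay pending
	}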

@k82cn
Contributor

k82cn commented Oct 10, 2018

We should fix it :)

In the scheduler, we ignore BestEffort resources and leave them to ResourceQuota.

@k82cn k82cn added the area/policy, kind/bug, priority/important-soon, sig/scheduling labels Oct 10, 2018
@k82cn k82cn added this to the v0.3 milestone Oct 10, 2018
@chenyangxueHDU
Contributor Author

chenyangxueHDU commented Oct 10, 2018

We should fix it :)

In the scheduler, we ignore BestEffort resources and leave them to ResourceQuota.

Yes. I think we should bind BestEffort tasks before other QoS tasks in kube-batch.

@k82cn
Contributor

k82cn commented Oct 11, 2018

I think we should bind BestEffort tasks before other QoS tasks in kube-batch.

Hmm, yes, we need to handle BestEffort separately :) There are two options in my mind:

  1. Handle BestEffort after Burstable/Guaranteed, so we do not need to re-schedule it when pod affinity/anti-affinity is ready.
  2. Handle BestEffort in another goroutine, but that will make pod anti-affinity complex :(

Anyway, both need node priority support to avoid dispatching all BestEffort pods to one node. I prefer option 1 for now :)

@chenyangxueHDU
Contributor Author

chenyangxueHDU commented Oct 11, 2018

I think we should bind BestEffort tasks before other QoS tasks in kube-batch.

Hmm, yes, we need to handle BestEffort separately :) There are two options in my mind:

  1. Handle BestEffort after Burstable/Guaranteed, so we do not need to re-schedule it when pod affinity/anti-affinity is ready.
  2. Handle BestEffort in another goroutine, but that will make pod anti-affinity complex :(

Anyway, both need node priority support to avoid dispatching all BestEffort pods to one node. I prefer option 1 for now :)

If we handle BestEffort after Burstable/Guaranteed, we need to change the overused logic, because if we handle Burstable/Guaranteed first, the queue will already be overused.

To avoid this, I will handle BestEffort before Burstable/Guaranteed. To do this, I think we can add a compareQoS check in taskOrderFn, like this:

// make BestEffort > Burstable/Guarantee
func compareQoS(l, r *v1.Pod) int {}

But it will ignore some cases of Priority, because I will add this check before the Priority comparison in taskOrderFn:

	taskOrderFn := func(l interface{}, r interface{}) int {
		lv := l.(*api.TaskInfo)
		rv := r.(*api.TaskInfo)

		// Compare QoS first, before comparing Priority.
		if res := compareQoS(lv.Pod, rv.Pod); res != 0 {
			return res
		}

		glog.V(3).Infof("Priority TaskOrder: <%v/%v> priority is %v, <%v/%v> priority is %v",
			lv.Namespace, lv.Name, lv.Priority, rv.Namespace, rv.Name, rv.Priority)

		if lv.Priority == rv.Priority {
			return 0
		}

		if lv.Priority > rv.Priority {
			return -1
		}

		return 1
	}
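
For illustration, compareQoS might look roughly like the sketch below. It assumes the usual Kubernetes QoS rule that a pod is BestEffort when none of its containers set any requests or limits; isBestEffort and the package name are made up for this sketch and are not existing kube-batch code.

	package qos // hypothetical location, for illustration only

	import (
		v1 "k8s.io/api/core/v1"
	)

	// isBestEffort reports whether the pod sets no resource requests or limits
	// in any container, i.e. it would be classified as BestEffort
	// (init containers are ignored to keep the sketch short).
	func isBestEffort(pod *v1.Pod) bool {
		for _, c := range pod.Spec.Containers {
			if len(c.Resources.Requests) != 0 || len(c.Resources.Limits) != 0 {
				return false
			}
		}
		return true
	}

	// compareQoS orders BestEffort pods ahead of Burstable/Guaranteed pods:
	// -1 means l is scheduled first, 1 means r is scheduled first, 0 means no preference.
	func compareQoS(l, r *v1.Pod) int {
		lBE, rBE := isBestEffort(l), isBestEffort(r)
		if lBE == rBE {
			return 0
		}
		if lBE {
			return -1
		}
		return 1
	}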

If you agree, I will make the PR for it.

@k82cn
Contributor

k82cn commented Oct 11, 2018

If we handle BestEffort after Burstable/Guaranteed, we need to change the overused logic, because if we handle Burstable/Guaranteed first, the queue will already be overused.

I'm thinking of a new action, named backfill, to handle such cases. We do not consider pod count right now, so we do not need to check the Queue's overused state in backfill for BestEffort.

We may also use this action to reuse resources that were allocated but not bound because of gang-scheduling/co-scheduling.
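
To make the idea concrete, here is a toy sketch of what such a backfill pass could do. All types and helpers below are made up for illustration and are not kube-batch's actual framework API: after the normal allocate pass, walk the remaining zero-request (BestEffort) tasks and bind each one to any node that passes the predicates, without consulting the queue's overused state.

	package main

	import "fmt"

	type Task struct {
		Name       string
		RequestCPU int // 0 for BestEffort in this toy model
		Bound      bool
	}

	type Node struct{ Name string }

	// predicatesOK is a stand-in for the real predicate checks (taints, affinity, ...).
	func predicatesOK(t *Task, n *Node) bool { return true }

	// backfill binds pending zero-request tasks; the queue's overused state is
	// deliberately not checked here.
	func backfill(tasks []*Task, nodes []*Node) {
		for _, t := range tasks {
			if t.Bound || t.RequestCPU != 0 {
				continue // only pending BestEffort tasks are backfilled
			}
			for _, n := range nodes {
				if predicatesOK(t, n) {
					t.Bound = true
					fmt.Printf("backfill: bound %s to %s\n", t.Name, n.Name)
					break
				}
			}
		}
	}

	func main() {
		tasks := []*Task{
			{Name: "cyx2-worker-0", RequestCPU: 1, Bound: true}, // already placed by allocate
			{Name: "cyx2-ps-0"},                                 // BestEffort, still pending
		}
		nodes := []*Node{{Name: "192.168.47.52"}}
		backfill(tasks, nodes)
	}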

I'm OK to take your fix as a quick fix, as it may take some time to rework the structure of Job for BestEffort :)
