
Preemption not working with proportion plugin when queue is full #1772

Closed
Robert-Christensen-visa opened this issue Sep 30, 2021 · 13 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@Robert-Christensen-visa

What happened:

Low-priority jobs will not be preempted by pending high-priority jobs when the proportion plugin is used and the queue is full.

What you expected to happen:

If a high-priority job is submitted to a queue whose resources are fully used, I would expect Volcano to terminate a running low-priority job to make room for the high-priority job. When the limiting factor is the cluster's capacity, a pending high-priority job does preempt a low-priority one. When the limiting factor is the queue's capability, I expect the same behavior but do not see it.

How to reproduce it (as minimally and precisely as possible):

volcano-scheduler.conf

actions: "enqueue, allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority
- plugins:
  - name: predicates
  - name: proportion

I create a single queue and two priority classes:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: prod-queue
spec:
  weight: 1
  reclaimable: True
  capability:
    cpu: 4000m
    memory: 4G
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1

The queue is limited to 4 CPUs and 4G of memory. I am running locally on a machine with 12 CPUs, so the queue limit is below the machine's capacity. To reproduce the issue, it is important that the queue capability is less than the cluster's total capacity.

I submit enough jobs with low-priority to fill the capacity of the queue. After those jobs are running I submit several jobs with high-priority. The high-priority jobs will not preempt the low-priority jobs.

I run this code to submit the jobs to the queue, wait for several seconds, and run jobs with high-priority.

job_template_file=$(mktemp)
cat <<EOF > ${job_template_file}
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob-job-<id>
spec:
  minAvailable: 0
  schedulerName: volcano
  priorityClassName: <priority>
  policies:
    - event: PodEvicted
      action: RestartTask
  maxRetry: 100
  queue: <queue>
  tasks:
    - replicas: 1
      name: "x"
      template:
        metadata:
          name: core
        spec:
          priorityClassName: <priority>
          terminationGracePeriodSeconds: 10
          containers:
            - image: ubuntu
              imagePullPolicy: IfNotPresent
              name: ubuntu
              command: ['sh', '-c', "sleep 600000"]
              resources:
                requests:
                  cpu: "1000m"
                  memory: "256Mi"
          restartPolicy: OnFailure
EOF

for id in $(seq 6)
do
    sed 's/<queue>/prod-queue/g' < ${job_template_file} | \
    sed "s/<id>/low-$id/g" | \
    sed "s/<priority>/low-priority/g" | \
    kubectl apply -f -
done

sleep 10

for id in $(seq 6)
do
    sed 's/<queue>/prod-queue/g' < ${job_template_file} | \
    sed "s/<id>/high-$id/g" | \
    sed "s/<priority>/high-priority/g" | \
    kubectl apply -f -
done

After waiting for some time, I still see the low-priority jobs running and the high-priority jobs pending.

$ kubectl get vcjob
NAME               AGE
vcjob-job-high-1   46s
vcjob-job-high-2   46s
vcjob-job-high-3   45s
vcjob-job-high-4   45s
vcjob-job-high-5   45s
vcjob-job-high-6   45s
vcjob-job-low-1    52s
vcjob-job-low-2    52s
vcjob-job-low-3    52s
vcjob-job-low-4    51s
vcjob-job-low-5    51s
vcjob-job-low-6    51s
$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
vcjob-job-low-1-x-0   1/1     Running   0          53s
vcjob-job-low-2-x-0   1/1     Running   0          52s
vcjob-job-low-3-x-0   1/1     Running   0          52s
vcjob-job-low-4-x-0   1/1     Running   0          52s

This bug means a high-priority job would not be able to preempt a job with low-priority if the queue is fully utilized.

If resources become available (e.g., the queue capacity is increased or the proportion plugin is disabled), the high-priority jobs start before the low-priority ones, so priority-based job ordering is working. However, the expectation with preemption and priority is that a high-priority job should start running quickly by reclaiming resources from lower-priority jobs.

Anything else we need to know?:

A similar issue happens when resources are limited with a Kubernetes ResourceQuota. When a namespace fully utilizes its quota, no new pods are created. Because no high-priority pods exist, preemption never happens (preemption acts between a pending pod and a running pod). For example, #1014 and #1345 try to resolve related issues.
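The failure mode described above can be sketched with a toy model (hypothetical Python, not Volcano's or Kubernetes' actual code): preemption only acts on a pod that already exists in a pending state, so an admission gate that refuses to create the pod in the first place leaves preemption with nothing to act on.

```python
# Toy model (hypothetical, for illustration only) of quota-gated admission
# defeating preemption: the high-priority pod is rejected at creation time,
# so no pending pod ever exists to preempt the low-priority running pods.
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int


def admit(pod, running, quota_cpu):
    """Quota-style admission: refuse to create the pod if the quota is full."""
    used = sum(p.cpu for p in running)
    return used + pod.cpu <= quota_cpu


def schedule(new_pod, running, quota_cpu):
    """Only admitted (i.e., created) pods can ever become pending and
    trigger preemption against lower-priority running pods."""
    if admit(new_pod, running, quota_cpu):
        running.append(new_pod)
        return "scheduled"
    # The pod was never created, so there is no pending pod for the
    # preempt step to match against the low-priority running pods.
    return "rejected: quota full, pod never created, so no preemption"


running = [Pod("low-1", priority=1, cpu=2), Pod("low-2", priority=1, cpu=2)]
print(schedule(Pod("high-1", priority=1000000, cpu=2), running, quota_cpu=4))
```

In the real cluster, the "admission gate" is the ResourceQuota (or, analogously, the enqueue step when the queue is at capability), and the rejected high-priority pod is exactly the pod that would otherwise drive preemption.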

The Yunikorn scheduler documentation recommends disabling Kubernetes ResourceQuota because it causes issues with resource management.

Environment:

- Volcano Version: master branch (as of Sept 30, 2021), so it is at least v1.4.0.
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: running locally on a Mac
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:
@stale

stale bot commented Dec 29, 2021

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2021
@Robert-Christensen-visa
Author

This issue has not been addressed and still exists.

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@Thor-wl
Contributor

Thor-wl commented Jan 4, 2022

/cc @hwdef Can you help for that?

@hwdef
Member

hwdef commented Jan 4, 2022

@Thor-wl
OK, I will deal with this problem.

@hwdef
Member

hwdef commented Jan 25, 2022

@Robert-Christensen-visa
I read the documentation for preemption and proportion and did not find any design for deleting low-priority pods. Am I missing something here?

@Robert-Christensen-visa
Author

@hwdef
Which plugins are supposed to influence whether something gets preempted? If I want both the proportion plugin and preemption enabled, should this be filed as a feature request rather than a bug report?

I am trying to figure out how to proceed. The documentation says nothing either way, and I cannot tell whether that is because the documentation is sparse or because this behavior was intentionally left out. Being able to preempt while the proportion plugin is enabled would be a useful feature for me, and it does not currently work.

@hwdef
Member

hwdef commented Jan 26, 2022

For example, take three vcjobs: a, b, and c.

a is running; b and c are waiting in the queue.
By default, b is dequeued after a finishes running.
With the preemptive strategy, if c is given high priority, then after a completes, c is dequeued first.
c does not kill a's pod; it only preempts b's place in the queue.
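The ordering behavior in this example can be sketched as a toy priority queue (hypothetical Python, not Volcano code): raising c's priority changes the dequeue order so c runs before b, but nothing evicts a's already-running pod.

```python
# Toy model (hypothetical, for illustration only): priority changes dequeue
# order among *waiting* jobs; running jobs are untouched.
import heapq


def dequeue_order(waiting):
    """Return waiting job names highest-priority first.
    heapq is a min-heap, so priorities are negated."""
    heap = [(-prio, name) for name, prio in waiting]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


running = ["a"]                    # a keeps running either way
waiting = [("b", 1), ("c", 1000)]  # c is given high priority
print(dequeue_order(waiting))      # ['c', 'b'] -- c jumps ahead of b
print(running)                     # ['a'] -- a's pod is not killed
```

This matches the distinction being drawn: priority-based ordering works, but it is not the eviction-style preemption the issue asks about.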

@Robert-Christensen-visa
Author

I guess my confusion comes from the overloaded term "preempt". If you are saying preemption means job c will jump ahead in the queue so it runs before b, it is true that works. But I don't think I need to enable the preemption action in Volcano to get that functionality. The point of this issue is that with and without the preemption action the result is the same, and I was under the assumption that proportion should take some action if preemption action is enabled.

If you are saying this is intentional and proportion does not terminate running lower-priority jobs under resource constraints (like drf), that is okay. I was thinking this was due to an oversight, not an intentional omission.

Thanks this has been helpful!

@hwdef
Member

hwdef commented Jan 27, 2022

@Robert-Christensen-visa
I think you are right, I'll do some experiments in detail and check the code, and I'll reply when I get an accurate result.

@william-wang
Member

@hwdef any progress for this issue?

@hwdef
Member

hwdef commented Mar 11, 2022

@william-wang I haven't made much progress here. Someone posted a solution under another issue; I haven't tested it yet, and I don't know whether it is universal:

#2034 (comment)

@stale

stale bot commented Jun 19, 2022

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2022
@stale

stale bot commented Sep 8, 2022

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Sep 8, 2022