
Preemption not working with proportion plugin when queue is full #1772

Closed
Robert-Christensen-visa opened this issue Sep 30, 2021 · 13 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@Robert-Christensen-visa

What happened:

Low-priority jobs will not be preempted by pending high-priority jobs when the proportion plugin is used and the queue is full.

What you expected to happen:

If a high-priority job is submitted to a queue whose resources are fully used, I would expect Volcano to terminate a running low-priority job to make room for the high-priority job. When the limiting factor is the cluster's capacity, a pending high-priority job does preempt a low-priority one. When the limiting factor is the queue's capability, I expect the same behavior but do not see it.

How to reproduce it (as minimally and precisely as possible):

volcano-scheduler.conf

actions: "enqueue, allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority
- plugins:
  - name: predicates
  - name: proportion

I create a single queue and two priority classes:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: prod-queue
spec:
  weight: 1
  reclaimable: True
  capability:
    cpu: 4000m
    memory: 4G
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1

The queue is limited to 4 CPUs and 4G of memory. I am running locally on a machine with 12 CPUs, so the queue limit is below the machine's capacity. To reproduce the issue, it is important that the queue capability is less than the cluster's total capacity.

I submit enough jobs with low-priority to fill the capacity of the queue. After those jobs are running I submit several jobs with high-priority. The high-priority jobs will not preempt the low-priority jobs.

I run this code to submit the jobs to the queue, wait for several seconds, and run jobs with high-priority.

job_template_file=$(mktemp)
cat <<EOF > ${job_template_file}
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob-job-<id>
spec:
  minAvailable: 0
  schedulerName: volcano
  priorityClassName: <priority>
  policies:
    - event: PodEvicted
      action: RestartTask
  maxRetry: 100
  queue: <queue>
  tasks:
    - replicas: 1
      name: "x"
      template:
        metadata:
          name: core
        spec:
          priorityClassName: <priority>
          terminationGracePeriodSeconds: 10
          containers:
            - image: ubuntu
              imagePullPolicy: IfNotPresent
              name: ubuntu
              command: ['sh', '-c', "sleep 600000"]
              resources:
                requests:
                  cpu: "1000m"
                  memory: "256Mi"
          restartPolicy: OnFailure
EOF

for id in $(seq 6)
do
    sed 's/<queue>/prod-queue/g' < ${job_template_file} | \
    sed "s/<id>/low-$id/g" | \
    sed "s/<priority>/low-priority/g" | \
    kubectl apply -f -
done

sleep 10

for id in $(seq 6)
do
    sed 's/<queue>/prod-queue/g' < ${job_template_file} | \
    sed "s/<id>/high-$id/g" | \
    sed "s/<priority>/high-priority/g" | \
    kubectl apply -f -
done

After waiting for some time, I still see the low-priority jobs running and the high-priority jobs pending.

$ kubectl get vcjob
NAME               AGE
vcjob-job-high-1   46s
vcjob-job-high-2   46s
vcjob-job-high-3   45s
vcjob-job-high-4   45s
vcjob-job-high-5   45s
vcjob-job-high-6   45s
vcjob-job-low-1    52s
vcjob-job-low-2    52s
vcjob-job-low-3    52s
vcjob-job-low-4    51s
vcjob-job-low-5    51s
vcjob-job-low-6    51s
$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
vcjob-job-low-1-x-0   1/1     Running   0          53s
vcjob-job-low-2-x-0   1/1     Running   0          52s
vcjob-job-low-3-x-0   1/1     Running   0          52s
vcjob-job-low-4-x-0   1/1     Running   0          52s

This bug means a high-priority job would not be able to preempt a job with low-priority if the queue is fully utilized.

If resources become available (e.g., the queue capacity is increased or the proportion plugin is disabled), the high-priority jobs start before the low-priority ones, so priority-based job ordering is working. However, the expectation with preemption and priority is that a high-priority job should start running quickly by reclaiming resources from lower-priority jobs.

Anything else we need to know?:

A similar issue happens when resources are limited with a Kubernetes ResourceQuota. When a namespace fully utilizes its quota, no new pods are created. Because no high-priority pods exist, preemption never happens (preemption acts between a pending pod and a running pod). For example, #1014 and #1345 try to resolve related issues.
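The failure mode described above can be sketched with a toy model (hypothetical Python, not Volcano's or Kubernetes' actual code): preemption only acts on a pod that already exists in a pending state, so an admission gate that refuses to create the pod in the first place leaves preemption with nothing to act on.

```python
# Toy model (hypothetical, for illustration only) of quota-gated admission
# defeating preemption: the high-priority pod is rejected at creation time,
# so no pending pod ever exists to preempt the low-priority running pods.
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int


def admit(pod, running, quota_cpu):
    """Quota-style admission: refuse to create the pod if the quota is full."""
    used = sum(p.cpu for p in running)
    return used + pod.cpu <= quota_cpu


def schedule(new_pod, running, quota_cpu):
    """Only admitted (i.e., created) pods can ever become pending and
    trigger preemption against lower-priority running pods."""
    if admit(new_pod, running, quota_cpu):
        running.append(new_pod)
        return "scheduled"
    # The pod was never created, so there is no pending pod for the
    # preempt step to match against the low-priority running pods.
    return "rejected: quota full, pod never created, so no preemption"


running = [Pod("low-1", priority=1, cpu=2), Pod("low-2", priority=1, cpu=2)]
print(schedule(Pod("high-1", priority=1000000, cpu=2), running, quota_cpu=4))
```

In the real cluster, the "admission gate" is the ResourceQuota (or, analogously, the enqueue step when the queue is at capability), and the rejected high-priority pod is exactly the pod that would otherwise drive preemption.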

The Yunikorn scheduler documentation recommends disabling Kubernetes ResourceQuota because it causes issues with resource management.

Environment:

- Volcano Version: master branch (as of Sept 30, 2021), so it is at least v1.4.0.
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: running locally on a Mac
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:
@stale

stale bot commented Dec 29, 2021

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2021
@Robert-Christensen-visa
Author

This issue has not been addressed and still exists.

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@Thor-wl
Contributor

Thor-wl commented Jan 4, 2022

/cc @hwdef Can you help for that?

@hwdef
Member

hwdef commented Jan 4, 2022

@Thor-wl
OK, I will deal with this problem.

@hwdef
Member

hwdef commented Jan 25, 2022

@Robert-Christensen-visa
I read the documentation for preemption and proportion and did not find any design for deleting low-priority pods. Am I missing something here?

@Robert-Christensen-visa
Author

@hwdef
Which plugins are supposed to influence whether something gets preempted? If I want both the proportion plugin and preemption enabled, should this be filed as a feature request rather than a bug report?

I am trying to figure out how to proceed. The documentation says nothing either way, and I cannot tell whether that is because the documentation is sparse or because this behavior was intentionally left out. Being able to preempt while the proportion plugin is enabled would be a useful feature for me, and it does not currently work.

@hwdef
Member

hwdef commented Jan 26, 2022

For example, take three vcjobs: a, b, and c.

a is running; b and c are waiting in the queue.
By default, b is dequeued after a finishes running.
With the preemptive strategy, if c is given high priority, then after a completes, c is dequeued first.
c does not kill a's pod; it only preempts b's place in the queue.
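The ordering behavior in this example can be sketched as a toy priority queue (hypothetical Python, not Volcano code): raising c's priority changes the dequeue order so c runs before b, but nothing evicts a's already-running pod.

```python
# Toy model (hypothetical, for illustration only): priority changes dequeue
# order among *waiting* jobs; running jobs are untouched.
import heapq


def dequeue_order(waiting):
    """Return waiting job names highest-priority first.
    heapq is a min-heap, so priorities are negated."""
    heap = [(-prio, name) for name, prio in waiting]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


running = ["a"]                    # a keeps running either way
waiting = [("b", 1), ("c", 1000)]  # c is given high priority
print(dequeue_order(waiting))      # ['c', 'b'] -- c jumps ahead of b
print(running)                     # ['a'] -- a's pod is not killed
```

This matches the distinction being drawn: priority-based ordering works, but it is not the eviction-style preemption the issue asks about.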

@Robert-Christensen-visa
Author

I guess my confusion comes from the overloaded term "preempt". If you are saying preemption means job c will jump ahead in the queue so it runs before b, it is true that works. But I don't think I need to enable the preemption action in Volcano to get that functionality. The point of this issue is that with and without the preemption action the result is the same, and I was under the assumption that proportion should take some action if preemption action is enabled.

If you are saying this is intentional and proportion does not terminate running lower-priority jobs under resource constraints (like drf), that is okay. I was thinking this was due to an oversight, not an intentional omission.

Thanks this has been helpful!

@hwdef
Member

hwdef commented Jan 27, 2022

@Robert-Christensen-visa
I think you are right, I'll do some experiments in detail and check the code, and I'll reply when I get an accurate result.

@william-wang
Member

@hwdef any progress for this issue?

@hwdef
Member

hwdef commented Mar 11, 2022

@william-wang I haven't made much progress here. Someone posted a solution under another issue; I haven't tested it yet, and I don't know whether it is universal:

#2034 (comment)

@stale

stale bot commented Jun 19, 2022

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2022
@stale

stale bot commented Sep 8, 2022

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Sep 8, 2022