Preempt action does not work well when gang is enabled. #3291

Closed
yuyue9284 opened this issue Jan 5, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@yuyue9284

What happened:
The cluster has two nodes: one is free, and the other is occupied by a low-priority job. When a high-priority job that requires 2 nodes is submitted, it stays pending instead of preempting the low-priority job.

What you expected to happen:
The low-priority job should be preempted, and the high-priority job should run.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Both the preempt and allocate actions contain similar logic that discards a scheduling decision if the job is not pipelined. In the case described above, the allocate action discards the high-priority job because only one node is available. The preempt action does not consider the empty node, since there is no victim on it; after preempting the low-priority job, the high-priority job is still one node short, so that decision is discarded as well. Ideally, the preempt action should also consider the empty node.
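To make the reasoning above concrete, here is a minimal Go sketch of the scenario. It is a simplified model, not Volcano's code: the `node` type and the `placeable` and `gangSatisfied` helpers are hypothetical names introduced only for illustration, and each node is assumed to hold exactly one task.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name        string
	lowPriority bool // true if a preemptable low-priority task occupies the node
}

// placeable counts how many gang members could be placed when an action
// looks at idle nodes, at nodes with victims, or at both.
func placeable(nodes []node, useIdle, useVictims bool) int {
	count := 0
	for _, n := range nodes {
		switch {
		case n.lowPriority && useVictims:
			count++
		case !n.lowPriority && useIdle:
			count++
		}
	}
	return count
}

// gangSatisfied models the "commit only if the job is pipelined" check
// as a plain comparison against the gang's minimum member count.
func gangSatisfied(placed, minAvailable int) bool {
	return placed >= minAvailable
}

func main() {
	nodes := []node{{"node-a", false}, {"node-b", true}}
	const minAvailable = 2

	// allocate: only the idle node counts, so the decision is discarded.
	fmt.Println("allocate:", gangSatisfied(placeable(nodes, true, false), minAvailable))
	// preempt, victims only: only the occupied node counts, discarded again.
	fmt.Println("preempt (victims only):", gangSatisfied(placeable(nodes, false, true), minAvailable))
	// preempt that also considers the idle node: the gang fits.
	fmt.Println("preempt (victims + idle):", gangSatisfied(placeable(nodes, true, true), minAvailable))
}
```

Under these assumptions, neither action satisfies the gang on its own; only a pass that counts both the idle node and the victim's node reaches the required two placements.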

https://github.com/volcano-sh/volcano/blob/797e10c321550981dda5e765e652566a70a44254/pkg/scheduler/actions/preempt/preempt.go#L138C1-L144C5

https://github.com/volcano-sh/volcano/blob/master/pkg/scheduler/actions/allocate/allocate.go#L277-L284

                        // Commit changes only if job is pipelined, otherwise try next job.
                        if ssn.JobPipelined(preemptorJob) {
                                stmt.Commit()
                        } else {
                                stmt.Discard()
                                continue
                        }
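The commit-or-discard pattern in the snippet above can be sketched in isolation as follows. This is an illustrative model only, not Volcano's actual `Statement` implementation: the `session`, `statement`, and `tryJob` names and the slot-counting rule are assumptions made for this example.

```go
package main

import "fmt"

// session is a hypothetical holder for placements that have been committed.
type session struct {
	placed []string
}

// statement records tentative placements until they are committed or discarded.
type statement struct {
	ssn     *session
	pending []string
}

// Pipeline records a tentative placement for a task.
func (s *statement) Pipeline(task string) {
	s.pending = append(s.pending, task)
}

// Commit applies all tentative placements to the session.
func (s *statement) Commit() {
	s.ssn.placed = append(s.ssn.placed, s.pending...)
	s.pending = nil
}

// Discard drops all tentative placements without touching the session.
func (s *statement) Discard() {
	s.pending = nil
}

// tryJob tentatively places tasks into freeSlots and commits only when at
// least minAvailable tasks fit; otherwise everything is rolled back.
func tryJob(ssn *session, tasks []string, freeSlots, minAvailable int) bool {
	stmt := &statement{ssn: ssn}
	for i, t := range tasks {
		if i >= freeSlots {
			break
		}
		stmt.Pipeline(t)
	}
	if len(stmt.pending) >= minAvailable {
		stmt.Commit()
		return true
	}
	stmt.Discard()
	return false
}

func main() {
	ssn := &session{}
	tasks := []string{"hp-0", "hp-1"}

	// One free slot: the gang of two does not fit, so nothing is committed.
	fmt.Println(tryJob(ssn, tasks, 1, 2), len(ssn.placed))
	// Two free slots: both tasks fit and the placements are committed.
	fmt.Println(tryJob(ssn, tasks, 2, 2), len(ssn.placed))
}
```

The point of the all-or-nothing design is that a partially placed gang never holds resources; the downside, as this issue shows, is that a decision is thrown away whenever the action's view of available nodes is too narrow.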

Scheduler config:

    actions: "enqueue, allocate, preempt, backfill"
    tiers:
    - plugins:
      - name: priority
        enableJobStarving: false
      - name: sla 
        arguments:
          sla-waiting-time: 1m
        enableJobOrder: false
        enableJobPipelined: false 
      - name: conformance
    - plugins:
      - name: gang
      - name: drf
        enablePreemptable: false
    - plugins:
      - name: overcommit
        arguments:
          overcommit-factor: 100
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Environment:

  • Volcano Version: 1.7
  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v5.0.1
    Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.6", GitCommit:"7ffcdf755d47c73903854cc5955afcdcd8c95225", GitTreeState:"clean", BuildDate:"2023-10-09T14:43:34Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
yuyue9284 added the kind/bug label on Jan 5, 2024
@Monokaix
Member

Monokaix commented Jan 5, 2024

Hi, please try again with the latest version :)
Also, please paste your job YAML and detailed node info.

@yuyue9284
Author

yuyue9284 commented Jan 5, 2024

> Hi, please try again with the latest version :) Also, please paste your job YAML and detailed node info.

Thanks, the latest version works.
Seems fixed by #2775
