volcano-scheduler panics if i create a queue with guarantee greater than allocatable resource #3127

Open
imliuda opened this issue Sep 15, 2023 · 2 comments · May be fixed by #3106
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments


imliuda commented Sep 15, 2023

What happened:

If I create a queue whose guarantee is greater than the cluster's allocatable resources, the Volcano scheduler panics and exits.

What you expected to happen:

Because other schedulers may also consume the cluster's resources, I think Volcano cannot provide a true guarantee. Instead of panicking, it should add a queue status or condition indicating that Volcano cannot fulfill the guarantee.
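The expected behaviour could look roughly like the sketch below: compare the guarantee with the allocatable total and report the shortfall instead of asserting. This is only an illustration. `Resource`, `GuaranteeShortfall`, and the `GuaranteeUnsatisfiable` condition name are hypothetical and are not Volcano's actual types or API; the numbers are taken from the panic message in the log further down.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for the scheduler's
// resource type; only CPU (millicores) and memory (bytes) are modelled.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// GuaranteeShortfall lists every dimension in which the queue guarantee
// exceeds what the cluster can allocate. An empty result means the
// guarantee fits.
func GuaranteeShortfall(allocatable, guarantee Resource) []string {
	var short []string
	if guarantee.MilliCPU > allocatable.MilliCPU {
		short = append(short, fmt.Sprintf("cpu: guarantee %.0f > allocatable %.0f",
			guarantee.MilliCPU, allocatable.MilliCPU))
	}
	if guarantee.Memory > allocatable.Memory {
		short = append(short, fmt.Sprintf("memory: guarantee %.0f > allocatable %.0f",
			guarantee.Memory, allocatable.Memory))
	}
	return short
}

func main() {
	// Values from the panic message: 14000 millicores available, 20000 guaranteed.
	allocatable := Resource{MilliCPU: 14000, Memory: 27105746944}
	guarantee := Resource{MilliCPU: 20000}

	if short := GuaranteeShortfall(allocatable, guarantee); len(short) > 0 {
		// Hypothetical condition name; a real fix would record it on the queue status.
		fmt.Println("GuaranteeUnsatisfiable:", short)
	}
}
```

With a check like this, the scheduler could keep running and surface the problem to whoever created the queue.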

How to reproduce it (as minimally and precisely as possible):

Create a queue with a very large guarantee value.

Anything else we need to know?:

Environment:

private cluster

Volcano Version:

API Version: v1alpha1
Version: latest
Git SHA: 82d4b85
Built At: 2023-09-15 00:33:17
Go Version: go1.20.1
Go OS/Arch: linux/amd64

  • Kubernetes version (use kubectl version):
    1.21.14

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):

  • Kernel (e.g. uname -a):

  • Install tools:

  • Others:
    I0915 08:14:45.080385 1 scheduler.go:94] scheduler completes Initialization and start to run
    I0915 08:14:45.080789 1 device_info.go:77] into devices
    I0915 08:14:45.080975 1 device_info.go:77] into devices
    I0915 08:14:45.081086 1 device_info.go:77] into devices
    I0915 08:14:45.081242 1 device_info.go:77] into devices
    I0915 08:14:45.081487 1 cache.go:1172] There are <1> Jobs, <2> Queues and <4> Nodes in total for scheduling.
    I0915 08:14:45.081559 1 session.go:186] Open Session 67f4b8c3-01ab-4132-9919-a5a9709d46db with <1> Job and <2> Queues
    E0915 08:14:45.083388 1 runtime.go:77] Observed a panic: resource is not sufficient to do operation: <cpu 14000.00, memory 27105746944.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, ephemeral-storage 65660151292000.00> sub <cpu 20000.00, memory 0.00>
    goroutine 251 [running]:
    k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dbd800?, 0xc0007f3600})
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
    k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0006072a0?})
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
    panic({0x1dbd800, 0xc0007f3600})
    /usr/local/go/src/runtime/panic.go:884 +0x213
    volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x70?, {0xc0001da240?, 0xc000607410?})
    /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x174
    volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x222ba70?, 0x1000040dc08?}, {0xc000607410?, 0x7fa0fe09da68?, 0xd0?})
    /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x56
    volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc000854680, 0xc000854540)
    /go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:246 +0x9c
    volcano.sh/volcano/pkg/scheduler/plugins/proportion.(*proportionPlugin).OnSessionOpen(0xc000854560, 0xc00084a000)
    /go/src/volcano.sh/volcano/pkg/scheduler/plugins/proportion/proportion.go:120 +0x905
    volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x24d2fb8?, 0xc000154000?}, {0xc000194ed0, 0x2, 0x2}, {0x0, 0x0, 0x0})
    /go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x32b
    volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc0002444d0)
    /go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:118 +0x265
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
    k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x24aa500, 0xc0006f6f90}, 0x1, 0xc000114120)
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
    k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00053a7b0?, 0x3b9aca00, 0x0, 0x0?, 0x442b65?)
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
    k8s.io/apimachinery/pkg/util/wait.Until(0xa1a02a?, 0xc000134b80?, 0xc00053a7b8?)
    /go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
    created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run
    /go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:95 +0x19b
    panic: resource is not sufficient to do operation: <cpu 14000.00, memory 27105746944.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, ephemeral-storage 65660151292000.00> sub <cpu 20000.00, memory 0.00> [recovered]
    panic: resource is not sufficient to do operation: <cpu 14000.00, memory 27105746944.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, ephemeral-storage 65660151292000.00> sub <cpu 20000.00, memory 0.00>

goroutine 251 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0006072a0?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x1dbd800, 0xc0007f3600})
/usr/local/go/src/runtime/panic.go:884 +0x213
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x70?, {0xc0001da240?, 0xc000607410?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x174
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x222ba70?, 0x1000040dc08?}, {0xc000607410?, 0x7fa0fe09da68?, 0xd0?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x56
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc000854680, 0xc000854540)
/go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:246 +0x9c
volcano.sh/volcano/pkg/scheduler/plugins/proportion.(*proportionPlugin).OnSessionOpen(0xc000854560, 0xc00084a000)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/proportion/proportion.go:120 +0x905
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x24d2fb8?, 0xc000154000?}, {0xc000194ed0, 0x2, 0x2}, {0x0, 0x0, 0x0})
/go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x32b
volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc0002444d0)
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:118 +0x265
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x24aa500, 0xc0006f6f90}, 0x1, 0xc000114120)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00053a7b0?, 0x3b9aca00, 0x0, 0x0?, 0x442b65?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xa1a02a?, 0xc000134b80?, 0xc00053a7b8?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:95 +0x19b
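
Reading the trace, the panic comes from an assertion inside `(*Resource).Sub`, reached from the proportion plugin's `OnSessionOpen`, when subtracting `<cpu 20000.00, memory 0.00>` from `<cpu 14000.00, ...>`. The sketch below reproduces that arithmetic with a simplified, hypothetical type (`res` and `subChecked` are not Volcano code) and returns an error where the real code asserts, so a caller could skip the queue rather than crash.

```go
package main

import (
	"errors"
	"fmt"
)

// res is a hypothetical, reduced resource vector (CPU millicores, memory bytes).
type res struct{ milliCPU, memory float64 }

var errInsufficient = errors.New("resource is not sufficient to do operation")

// subChecked returns r - rr, or an error when any dimension of rr
// exceeds r. It never panics.
func subChecked(r, rr res) (res, error) {
	if rr.milliCPU > r.milliCPU || rr.memory > r.memory {
		return r, fmt.Errorf("%w: <cpu %.2f, memory %.2f> sub <cpu %.2f, memory %.2f>",
			errInsufficient, r.milliCPU, r.memory, rr.milliCPU, rr.memory)
	}
	return res{r.milliCPU - rr.milliCPU, r.memory - rr.memory}, nil
}

func main() {
	total := res{milliCPU: 14000, memory: 27105746944} // from the log above
	guarantee := res{milliCPU: 20000}                  // the oversized queue guarantee

	if _, err := subChecked(total, guarantee); err != nil {
		// The caller decides what to do; here we only report the problem.
		fmt.Println("cannot reserve guarantee:", err)
	}
}
```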

imliuda added the kind/bug label Sep 15, 2023
@lowang-bh
Member

/assign

