Panic in proportion plugin #3155

Open
svileex opened this issue Oct 12, 2023 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


svileex commented Oct 12, 2023

What happened:
The proportion plugin panicked when the cluster had some queues with guaranteed resources and some without. The problem is in how resources are divided. The guaranteed resources must be subtracted first, and only the remainder should then be divided among all queues.
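The fix described above ("subtract guarantees first, then divide the rest") can be sketched with a deliberately simplified model. The model uses a single resource in milli-CPU. Everything in it is hypothetical: the `queue` type, `divideGuaranteeFirst`, the 30-CPU cluster and the weights. It shows one possible reading of the idea, where each queue gets its guarantee plus a weighted share of what is left. It is not the code in the PR or in `proportion.go`.

```go
package main

import "fmt"

// queue is a hypothetical single-resource (milli-CPU) model of a queue.
// It is NOT Volcano's queueAttr type.
type queue struct {
	name      string
	weight    int64
	guarantee int64
}

// divideGuaranteeFirst reserves every queue's guarantee up front, then splits
// only what is left by weight. The deserved amounts therefore never add up to
// more than total (integer division may leave a few units unassigned).
// It returns an error if the guarantees alone exceed total.
func divideGuaranteeFirst(total int64, queues []queue) (map[string]int64, error) {
	remaining := total
	var totalWeight int64
	for _, q := range queues {
		remaining -= q.guarantee
		totalWeight += q.weight
	}
	if remaining < 0 {
		return nil, fmt.Errorf("guarantees exceed cluster total by %d", -remaining)
	}
	deserved := make(map[string]int64, len(queues))
	for _, q := range queues {
		share := int64(0)
		if totalWeight > 0 {
			share = remaining * q.weight / totalWeight
		}
		deserved[q.name] = q.guarantee + share
	}
	return deserved, nil
}

func main() {
	// Hypothetical 30-CPU cluster: one queue guarantees more than a third.
	queues := []queue{
		{name: "guaranteed", weight: 1, guarantee: 12000},
		{name: "q2", weight: 1},
		{name: "q3", weight: 1},
	}
	deserved, err := divideGuaranteeFirst(30000, queues)
	if err != nil {
		fmt.Println(err)
		return
	}
	// prints "guaranteed: 18000", "q2: 6000", "q3: 6000" (one per line)
	for _, q := range queues {
		fmt.Printf("%s: %d\n", q.name, deserved[q.name])
	}
}
```

Because the guarantees are removed from the pool before the weighted split, the sum of the deserved amounts is bounded by the cluster total. Any later subtraction from the remaining pool therefore stays non-negative.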

I have already added a test for this behavior (`TestGuarantee`) and fixed the bug. Running the plugin tests with the new test produces the following panic:

```
~/go/volcano/pkg/scheduler/plugins (proporion-guarantee-fix*) » go test ./...
?       volcano.sh/volcano/pkg/scheduler/plugins        [no test files]
ok      volcano.sh/volcano/pkg/scheduler/plugins/binpack        (cached)
ok      volcano.sh/volcano/pkg/scheduler/plugins/cdp    (cached)
?       volcano.sh/volcano/pkg/scheduler/plugins/conformance    [no test files]
ok      volcano.sh/volcano/pkg/scheduler/plugins/drf    (cached)
?       volcano.sh/volcano/pkg/scheduler/plugins/extender       [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/gang   [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/nodeorder      [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/numaaware      [no test files]
ok      volcano.sh/volcano/pkg/scheduler/plugins/numaaware/policy       (cached)
ok      volcano.sh/volcano/pkg/scheduler/plugins/numaaware/provider/cpumanager  (cached)
?       volcano.sh/volcano/pkg/scheduler/plugins/overcommit     [no test files]
ok      volcano.sh/volcano/pkg/scheduler/plugins/predicates     (cached)
?       volcano.sh/volcano/pkg/scheduler/plugins/priority       [no test files]
E1012 21:47:39.004831  214480 utils.go:43] init kubeclient in 4pdvgpu failed: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
--- FAIL: TestGuarantee (0.00s)
panic: resource is not sufficient to do operation: <cpu 8000.00, memory 8000.00, nvidia.com/gpu 0.00> sub <cpu 10000.00, memory 10000.00> [recovered]
        panic: resource is not sufficient to do operation: <cpu 8000.00, memory 8000.00, nvidia.com/gpu 0.00> sub <cpu 10000.00, memory 10000.00>

goroutine 54 [running]:
testing.tRunner.func1.2({0x278d4a0, 0xc000132b10})
        /usr/local/go/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1399 +0x39f
panic({0x278d4a0, 0xc000132b10})
        /usr/local/go/src/runtime/panic.go:884 +0x212
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x2f?, {0xc000426000?, 0xc000655230?})
        /home/svilex/go/volcano/pkg/scheduler/util/assert/assert.go:33 +0x174
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x2bdf02f?, 0xf7941d?}, {0xc000655230?, 0x2a41620?, 0x0?})
        /home/svilex/go/volcano/pkg/scheduler/util/assert/assert.go:43 +0x56
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc0000a67c0, 0xc0000a6800)
        /home/svilex/go/volcano/pkg/scheduler/api/resource_info.go:246 +0x9c
volcano.sh/volcano/pkg/scheduler/plugins/proportion.(*proportionPlugin).OnSessionOpen(0xc0000a6360, 0xc0005143c0)
        /home/svilex/go/volcano/pkg/scheduler/plugins/proportion/proportion.go:241 +0x15c5
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x2e7dcf8?, 0xc0001ff900?}, {0xc000012348, 0x1, 0x1}, {0x0, 0x0, 0x0})
        /home/svilex/go/volcano/pkg/scheduler/framework/framework.go:45 +0x327
volcano.sh/volcano/pkg/scheduler/plugins/proportion.TestGuarantee(0xc000602b60)
        /home/svilex/go/volcano/pkg/scheduler/plugins/proportion/proportion_test.go:417 +0xfb8
testing.tRunner(0xc000602b60, 0x2c84658)
        /usr/local/go/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1493 +0x35f
FAIL    volcano.sh/volcano/pkg/scheduler/plugins/proportion     9.018s
?       volcano.sh/volcano/pkg/scheduler/plugins/rescheduling   [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/resourcequota  [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/sla    [no test files]
ok      volcano.sh/volcano/pkg/scheduler/plugins/task-topology  (cached)
ok      volcano.sh/volcano/pkg/scheduler/plugins/tdm    (cached)
?       volcano.sh/volcano/pkg/scheduler/plugins/usage  [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/util   [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/util/k8s       [no test files]
?       volcano.sh/volcano/pkg/scheduler/plugins/util/nodelock  [no test files]
FAIL
```
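The stack trace shows `(*Resource).Sub` calling `assert.Assertf`. That assertion appears to fire because the receiver `<cpu 8000, memory 8000, nvidia.com/gpu 0>` is smaller than the argument `<cpu 10000, memory 10000>` in at least one dimension. The sketch below shows that kind of guarded subtraction. It uses a hypothetical `resource` map type and returns an error instead of panicking. It is not Volcano's `api.Resource`, only an illustration of the precondition that the numbers in the log violate.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a multi-dimensional resource vector
// (e.g. cpu in milli-cores, memory). It is NOT Volcano's api.Resource.
type resource map[string]float64

// lessEqual reports whether every dimension of r is <= the same dimension of
// other. Dimensions missing from a map count as zero.
func (r resource) lessEqual(other resource) bool {
	for name, v := range r {
		if v > other[name] {
			return false
		}
	}
	return true
}

// sub subtracts rr from r in place. Where the log shows an assertion panic,
// this version returns an error and leaves r unchanged when r is too small
// in some dimension.
func (r resource) sub(rr resource) error {
	if !rr.lessEqual(r) {
		return fmt.Errorf("resource is not sufficient to do operation: %v sub %v", r, rr)
	}
	for name, v := range rr {
		r[name] -= v
	}
	return nil
}

func main() {
	// The exact operands from the panic message in the log above.
	remaining := resource{"cpu": 8000, "memory": 8000, "nvidia.com/gpu": 0}
	increase := resource{"cpu": 10000, "memory": 10000}
	if err := remaining.sub(increase); err != nil {
		fmt.Println(err)
	}
}
```

Returning an error only shows where the precondition breaks. It does not fix the bug by itself. The real problem is upstream, where more was handed out than the remaining pool contained.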

What you expected to happen:
The scheduler should not panic.
How to reproduce it (as minimally and precisely as possible):
Create three queues: one whose guaranteed resources exceed one third of the cluster's total resources (allResourcesInCluster/3), and two other queues without guarantees.
Then run a job in each queue that requests all of that queue's resources.

Anything else we need to know?:
I have already fixed this bug and am about to open a PR. The test in that PR shows how to reproduce the bug.

Environment:

  • Volcano Version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@svileex svileex added the kind/bug Categorizes issue or PR as related to a bug. label Oct 12, 2023
svileex (Author) commented Oct 12, 2023

PR: #3156

lowang-bh (Member) commented:

Is this the same as #3127?

svileex (Author) commented Oct 13, 2023

No. In my case, the guaranteed resources are less than the cluster's allocatable resources.

svileex (Author) commented Nov 9, 2023

Hey, there hasn't been any news since October. Did I forget to do something?
