
Scheduler panics with errors like "resource is not sufficient to do operation:" #3301

Open
bysph opened this issue Jan 16, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@bysph

bysph commented Jan 16, 2024

What happened:
The scheduler panicked and restarted several times over roughly half an hour, then recovered without any manual intervention; during that period some normal tasks kept running and completing automatically. The logs are as follows:

igdata2-k8s-bgp25g-music75.gy.ntes fit failed: plugin NodeAffinity predicates failed node(s) didn't match Pod's node affinity/selector
I0115 10:12:11.618831       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata-k8s-mix-music94.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata-k8s-mix-music94.gy.ntes fit failed: plugin NodeUnschedulable predicates failed node(s) were unschedulable
I0115 10:12:11.618829       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata-k8s-mix-music84.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata-k8s-mix-music84.gy.ntes fit failed: plugin NodeUnschedulable predicates failed node(s) were unschedulable
I0115 10:12:11.618838       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata-k8s-mix-music90.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata-k8s-mix-music90.gy.ntes fit failed: plugin NodeUnschedulable predicates failed node(s) were unschedulable
I0115 10:12:11.618846       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata2-k8s-bgp25g-music30.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata2-k8s-bgp25g-music30.gy.ntes fit failed: plugin NodeAffinity predicates failed node(s) didn't match Pod's node affinity/selector
I0115 10:12:11.618853       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata2-k8s-bgp25g-music57.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata2-k8s-bgp25g-music57.gy.ntes fit failed: Insufficient cpu
I0115 10:12:11.618875       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata-k8s-mix-music58.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata-k8s-mix-music58.gy.ntes fit failed: plugin NodeUnschedulable predicates failed node(s) were unschedulable
I0115 10:12:11.618903       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata-k8s-mix-music29.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata-k8s-mix-music29.gy.ntes fit failed: plugin NodeUnschedulable predicates failed node(s) were unschedulable
I0115 10:12:11.618902       1 predicate_helper.go:75] Predicates failed for task <spark/wizard-308703-exec-303> on node <bigdata2-k8s-bgp25g-music66.gy.ntes>: task spark/wizard-308703-exec-303 on node bigdata2-k8s-bgp25g-music66.gy.ntes fit failed: plugin NodeAffinity predicates failed node(s) didn't match Pod's node affinity/selector
E0115 10:12:11.618926       1 runtime.go:77] Observed a panic: resource is not sufficient to do operation: <cpu -2600.00, memory 96698806272.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00> sub <cpu 0.00, memory 0.00>
goroutine 21922 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dbd8c0?, 0xc0885a0570})
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x200000003?})
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1dbd8c0, 0xc0885a0570})
        /usr/local/go/src/runtime/panic.go:884 +0x213
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0xaf?, {0xc0c15be000?, 0xc0a308dca8?})
        /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x174
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x222bdaf?, 0x40dc08?}, {0xc0a308dca8?, 0x7ff56a9ed4c0?, 0xd0?})
        /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x56
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc0c15a7980, 0xc0c00af520)
        /go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:246 +0x9c
volcano.sh/volcano/pkg/scheduler/api.(*NodeInfo).FutureIdle(0xc0c008c480)
        /go/src/volcano.sh/volcano/pkg/scheduler/api/node_info.go:96 +0x246
volcano.sh/volcano/pkg/scheduler/actions/allocate.(*Action).Execute.func1(0xc01e879600, 0xc0c008c480)
        /go/src/volcano.sh/volcano/pkg/scheduler/actions/allocate/allocate.go:101 +0x36
volcano.sh/volcano/pkg/scheduler/util.(*predicateHelper).PredicateNodes.func1(0x6?)
        /go/src/volcano.sh/volcano/pkg/scheduler/util/predicate_helper.go:74 +0x3ab
k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
        /go/pkg/mod/k8s.io/client-go@v0.27.2/util/workqueue/parallelizer.go:90 +0x106
created by k8s.io/client-go/util/workqueue.ParallelizeUntil
        /go/pkg/mod/k8s.io/client-go@v0.27.2/util/workqueue/parallelizer.go:76 +0x1ce
E0115 10:12:11.618952       1 runtime.go:77] Observed a panic: resource is not sufficient to do operation: <cpu -8510.00, memory 81269567488.00, ephemeral-storage 1924896325632000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00> sub <cpu 0.00, memory 0.00>
goroutine 21928 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dbd8c0?, 0xc0880da720})
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x450fd6?})
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1dbd8c0, 0xc0880da720})
        /usr/local/go/src/runtime/panic.go:884 +0x213
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0xaf?, {0xc0c11b4000?, 0xc0c1b35ca8?})
        /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x174
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x222bdaf?, 0x40dc08?}, {0xc0c1b35ca8?, 0x7ff56a9e7250?, 0xd0?})
        /go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x56
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc0c118df40, 0xc0c00bc8a0)
        /go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:246 +0x9c
volcano.sh/volcano/pkg/scheduler/api.(*NodeInfo).FutureIdle(0xc0c008c540)
        /go/src/volcano.sh/volcano/pkg/scheduler/api/node_info.go:96 +0x246
volcano.sh/volcano/pkg/scheduler/actions/allocate.(*Action).Execute.func1(0xc01e879600, 0xc0c008c540)
        /go/src/volcano.sh/volcano/pkg/scheduler/actions/allocate/allocate.go:101 +0x36
volcano.sh/volcano/pkg/scheduler/util.(*predicateHelper).PredicateNodes.func1(0x9?)
        /go/src/volcano.sh/volcano/pkg/scheduler/util/predicate_helper.go:74 +0x3ab
k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
        /go/pkg/mod/k8s.io/client-go@v0.27.2/util/workqueue/parallelizer.go:90 +0x106
created by k8s.io/client-go/util/workqueue.ParallelizeUntil
        /go/pkg/mod/k8s.io/client-go@v0.27.2/util/workqueue/parallelizer.go:76 +0x1ce

earlier logs that may help:

I0115 10:12:07.316612       1 allocate.go:245] Binding Task <spark/wizard-308744-exec-7> to node <bigdata2-k8s-bgp25g-music58.gy.ntes>
I0115 10:12:07.316629       1 statement.go:266] After allocated Task <spark/wizard-308744-exec-7> to Node <bigdata2-k8s-bgp25g-music58.gy.ntes>: idle <cpu 36400.00, memory 227695308800.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 90600.00, memory 304705064960.00, ephemeral-storage 5368709120000.00>, releasing <cpu 0.00, memory 0.00>
I0115 10:12:07.331873       1 allocate.go:245] Binding Task <spark/wizard-308765-exec-5> to node <bigdata2-k8s-bgp25g-music58.gy.ntes>
I0115 10:12:07.331889       1 statement.go:266] After allocated Task <spark/wizard-308765-exec-5> to Node <bigdata2-k8s-bgp25g-music58.gy.ntes>: idle <cpu 33400.00, memory 218031632384.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 93600.00, memory 314368741376.00, ephemeral-storage 5368709120000.00>, releasing <cpu 0.00, memory 0.00>
I0115 10:12:07.335107       1 allocate.go:245] Binding Task <spark/wizard-308773-exec-2> to node <bigdata2-k8s-bgp25g-music58.gy.ntes>
I0115 10:12:07.335127       1 statement.go:266] After allocated Task <spark/wizard-308773-exec-2> to Node <bigdata2-k8s-bgp25g-music58.gy.ntes>: idle <cpu 30400.00, memory 208367955968.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 96600.00, memory 324032417792.00, ephemeral-storage 5368709120000.00>, releasing <cpu 0.00, memory 0.00>
I0115 10:12:07.335293       1 cache.go:307] Revert assumed volumes for task spark/wizard-308773-exec-2 on node bigdata2-k8s-bgp25g-music58.gy.ntes
E0115 10:12:07.335299       1 statement.go:393] Failed to allocate task: for node bigdata2-k8s-bgp25g-music58.gy.ntes resources [cpu] are not enough to put task <spark/wizard-308773-exec-2>, idle: cpu 2400.00, memory 113610240000.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, req: cpu 3000.00, memory 9663676416.00
E0115 10:12:07.352554       1 node_info.go:393] "Idle resources turn into negative after allocated" nodeName="bigdata2-k8s-bgp25g-music58.gy.ntes" task="spark/wizard-308703-exec-287" resources=[cpu] idle="cpu -1600.00, memory 100322684928.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00" req="cpu 3000.00, memory 9663676416.00"
E0115 10:12:07.357392       1 node_info.go:393] "Idle resources turn into negative after allocated" nodeName="bigdata2-k8s-bgp25g-music58.gy.ntes" task="spark/wizard-308703-exec-299" resources=[cpu] idle="cpu -4600.00, memory 90659008512.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00" req="cpu 3000.00, memory 9663676416.00"
I0115 10:12:07.361560       1 allocate.go:245] Binding Task <spark/wizard-308824-exec-10> to node <bigdata2-k8s-bgp25g-music58.gy.ntes>
I0115 10:12:07.361578       1 statement.go:266] After allocated Task <spark/wizard-308824-exec-10> to Node <bigdata2-k8s-bgp25g-music58.gy.ntes>: idle <cpu 27400.00, memory 198704279552.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 99600.00, memory 333696094208.00, ephemeral-storage 5368709120000.00>, releasing <cpu 0.00, memory 0.00>
I0115 10:12:07.361698       1 cache.go:307] Revert assumed volumes for task spark/wizard-308824-exec-10 on node bigdata2-k8s-bgp25g-music58.gy.ntes
E0115 10:12:07.361703       1 statement.go:393] Failed to allocate task: for node bigdata2-k8s-bgp25g-music58.gy.ntes resources [cpu] are not enough to put task <spark/wizard-308824-exec-10>, idle: cpu -4600.00, memory 90659008512.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00, req: cpu 3000.00, memory 9663676416.00
E0115 10:12:07.363133       1 node_info.go:393] "Idle resources turn into negative after allocated" nodeName="bigdata2-k8s-bgp25g-music58.gy.ntes" task="spark/kyuubi-pandora-by-m-bi-soc-eventpage-view-stat-1d-by-foll-c31991fd-82a8-39e2-b0c1-91e516c0f535-driver" resources=[cpu] idle="cpu -5600.00, memory 87035129856.00, hugepages-2Mi 0.00, ephemeral-storage 1846138130432000.00, hugepages-1Gi 0.00" req="cpu 1000.00, memory 3623878656.00"

What you expected to happen:
The scheduler should remain stable without panicking.

How to reproduce it (as minimally and precisely as possible):
I have not found the reproduction conditions yet.

Anything else we need to know?:
I found that this PR modified the code to allow "used" to become negative: https://github.com/volcano-sh/volcano/pull/3006/files. I'm not sure whether it's related, but maybe we can discuss it here to clarify the issue.
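To make the failure mode concrete, here is a minimal, self-contained sketch of why a negative idle value makes a checked `Sub` panic even when subtracting zero, matching the `sub <cpu 0.00, memory 0.00>` in the traces above. The `Resource` type and its methods here are simplified stand-ins, not the real Volcano API:

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// LessEqual reports whether every dimension of r is <= the same dimension of rr.
func (r *Resource) LessEqual(rr *Resource) bool {
	return r.MilliCPU <= rr.MilliCPU && r.Memory <= rr.Memory
}

// Sub asserts that the receiver can cover rr before subtracting, mimicking
// the assertion that produces "resource is not sufficient to do operation".
func (r *Resource) Sub(rr *Resource) *Resource {
	if !rr.LessEqual(r) {
		panic(fmt.Sprintf("resource is not sufficient to do operation: <cpu %.2f, memory %.2f> sub <cpu %.2f, memory %.2f>",
			r.MilliCPU, r.Memory, rr.MilliCPU, rr.Memory))
	}
	r.MilliCPU -= rr.MilliCPU
	r.Memory -= rr.Memory
	return r
}

func main() {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println("panicked:", err)
		}
	}()
	// Idle already driven negative by over-allocation, as in the logs above.
	idle := &Resource{MilliCPU: -2600, Memory: 96698806272}
	// Even subtracting zero panics once any dimension is negative,
	// because 0 <= -2600 is false.
	idle.Sub(&Resource{MilliCPU: 0, Memory: 0})
}
```

Once `Idle` has gone negative (which PR #3006 permits in the allocation path), any later checked subtraction on it, such as the one inside `FutureIdle`, trips the assertion.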

Environment:

  • Volcano Version: 1.8.1
  • Kubernetes version (use kubectl version): 1.19.3
@bysph bysph added the kind/bug Categorizes issue or PR as related to a bug. label Jan 16, 2024
@bysph
Author

bysph commented Jan 16, 2024

@Monokaix Hello, I've run into this problem, but after reviewing the code I haven't been able to identify the root cause. Can you offer any assistance? thx~

@bysph bysph changed the title Scheduler panics for with errors like "resource is not sufficient to do operation:" Scheduler panics with errors like "resource is not sufficient to do operation:" Jan 16, 2024
@Monokaix
Member

Monokaix commented Jan 17, 2024

Yeah, the PR #3006 you mentioned modified the code to allow node idle resources to become negative. Can you search for "Node out of sync" in the Volcano scheduler log? This case might happen when a task has a specified nodeName or a node's allocatable CPU decreases, so you can check for that.

@Monokaix
Member

You should also find which node's idle CPU resource became negative, and check what changes happened on that node and which tasks were assigned to or removed from it.

@lowang-bh
Member

func (ni *NodeInfo) allocateIdleResource(ti *TaskInfo) {
	ok, resources := ti.Resreq.LessEqualWithResourcesName(ni.Idle, Zero)
	if ok {
		ni.Idle.sub(ti.Resreq)
		return
	}

	ni.Idle.sub(ti.Resreq)
	klog.ErrorS(nil, "Idle resources turn into negative after allocated",
		"nodeName", ni.Name, "task", klog.KObj(ti.Pod), "resources", resources, "idle", ni.Idle.String(), "req", ti.Resreq.String())
}

How about adding a recover from the panic and setting the node's idle resources to zero?
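For discussion, a rough sketch of that idea, again with simplified stand-in types rather than the real Volcano `Resource`/`NodeInfo` API: recover from the underflow panic and clamp any negative dimension of idle back to zero:

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct{ MilliCPU, Memory float64 }

// Sub subtracts rr and panics on underflow, mimicking the real assertion.
func (r *Resource) Sub(rr *Resource) {
	r.MilliCPU -= rr.MilliCPU
	r.Memory -= rr.Memory
	if r.MilliCPU < 0 || r.Memory < 0 {
		panic("resource is not sufficient to do operation")
	}
}

// clampToZero resets any negative dimension to zero.
func (r *Resource) clampToZero() {
	if r.MilliCPU < 0 {
		r.MilliCPU = 0
	}
	if r.Memory < 0 {
		r.Memory = 0
	}
}

// subRecover runs Sub, recovers from the underflow panic, and clamps
// the idle resource so the scheduler process itself survives.
func subRecover(idle, req *Resource) {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println("recovered:", err)
			idle.clampToZero()
		}
	}()
	idle.Sub(req)
}

func main() {
	idle := &Resource{MilliCPU: 2400, Memory: 1 << 30}
	// Request exceeds idle CPU, as in the "Failed to allocate task" logs.
	subRecover(idle, &Resource{MilliCPU: 3000, Memory: 0})
	fmt.Printf("idle after clamp: cpu=%.0f mem=%.0f\n", idle.MilliCPU, idle.Memory)
}
```

The trade-off, discussed below, is that clamping hides the underflow: the in-memory idle value no longer matches what has actually been allocated on the node.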

@bysph
Author

bysph commented Jan 17, 2024

Yeah, the PR #3006 you mentioned modified the code to allow node idle resources to become negative. Can you search for "Node out of sync" in the Volcano scheduler log? This case might happen when a task has a specified nodeName or a node's allocatable CPU decreases, so you can check for that.

Yes, I can find the following logs, but they don't show that the node (bigdata2-k8s-bgp25g-music58.gy.ntes) is the one that caused the panic. Besides these two possibilities, could the issue be caused by running Volcano and kube-scheduler simultaneously?
[screenshot of scheduler logs]

@Monokaix
Member

Yeah, the PR #3006 you mentioned modified the code to allow node idle resources to become negative. Can you search for "Node out of sync" in the Volcano scheduler log? This case might happen when a task has a specified nodeName or a node's allocatable CPU decreases, so you can check for that.

Yes, I can find the following logs, but they don't show that the node (bigdata2-k8s-bgp25g-music58.gy.ntes) is the one that caused the panic. Besides these two possibilities, could the issue be caused by running Volcano and kube-scheduler simultaneously?

Yeah, you can check the pods on that node, find after which pod was scheduled the scheduler panicked, and analyze the whole scheduling process of all the pods on that node.

@Monokaix
Member

func (ni *NodeInfo) allocateIdleResource(ti *TaskInfo) {
	ok, resources := ti.Resreq.LessEqualWithResourcesName(ni.Idle, Zero)
	if ok {
		ni.Idle.sub(ti.Resreq)
		return
	}

	ni.Idle.sub(ti.Resreq)
	klog.ErrorS(nil, "Idle resources turn into negative after allocated",
		"nodeName", ni.Name, "task", klog.KObj(ti.Pod), "resources", resources, "idle", ni.Idle.String(), "req", ti.Resreq.String())
}

How about adding a recover from the panic and setting the node's idle resources to zero?

That would leave the node cache inconsistent, so it's not applicable.

@lowang-bh
Member

But the scheduler should not panic. The original code marked the node as out of sync and would not schedule tasks to it.

@Monokaix
Member

Monokaix commented Jan 23, 2024

But the scheduler should not panic. The original code marked the node as out of sync and would not schedule tasks to it.

The outofsync state was removed by #2998 to handle some scenarios; we should not count on it, and the node or device plugin should recover resource reporting themselves.

@rt849122

But the scheduler should not panic. The original code marked the node as out of sync and would not schedule tasks to it.

The outofsync state was removed by #2998 to handle some scenarios; we should not count on it, and the node or device plugin should recover resource reporting themselves.

In our setting, an out-of-sync node (i.e., allocated > idle) is quite common, especially when the cluster is crowded, because GPU/CPU failures are common events in a large cluster. Given that "outofsync" is a common and normal state, the scheduler should never panic on such a normal path.

As the most obvious fix, the scheduler should skip the node instead of panicking.
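A sketch of that approach, with hypothetical signatures and simplified stand-in types (the real `FutureIdle` does not return an error): surface underflow as an error instead of asserting, and let the predicate loop skip the node:

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct{ MilliCPU, Memory float64 }

func (r *Resource) Clone() *Resource { c := *r; return &c }

func (r *Resource) Add(rr *Resource) *Resource {
	r.MilliCPU += rr.MilliCPU
	r.Memory += rr.Memory
	return r
}

var errUnderflow = errors.New("resource is not sufficient to do operation")

// subChecked subtracts rr and reports an error instead of panicking.
func (r *Resource) subChecked(rr *Resource) error {
	if rr.MilliCPU > r.MilliCPU || rr.Memory > r.Memory {
		return errUnderflow
	}
	r.MilliCPU -= rr.MilliCPU
	r.Memory -= rr.Memory
	return nil
}

// futureIdle computes idle + releasing - pipelined, surfacing underflow
// as an error so the caller can skip the node rather than crash.
func futureIdle(idle, releasing, pipelined *Resource) (*Resource, error) {
	f := idle.Clone().Add(releasing)
	if err := f.subChecked(pipelined); err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	nodes := map[string]*Resource{
		"node-ok":  {MilliCPU: 4000, Memory: 8 << 30},
		"node-neg": {MilliCPU: -2600, Memory: 8 << 30}, // bookkeeping went negative
	}
	zero := &Resource{}
	for name, idle := range nodes {
		if _, err := futureIdle(idle, zero, zero); err != nil {
			// Skip the out-of-sync node instead of panicking.
			fmt.Printf("skipping node %s: %v\n", name, err)
			continue
		}
		fmt.Printf("node %s is a candidate\n", name)
	}
}
```

The node with negative idle is simply excluded for this scheduling cycle; once its resource report catches up, it becomes schedulable again.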
