[BUG] koordlet CgroupReconcile panics on mergePodResourceQoSForMemoryQoS #1670
Comments
@BlackPigHe Hi, do you have any LSE or SYSTEM QoS pods on the node? This is probably a bug that was fixed in #1556 and #1663.
So what do I need to do? Build an image from the latest branch? I installed with helm and I am already on the latest version.
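@BlackPigHe The bugfixes have not been released yet, so using the fixed version may require building an image from the latest branch. If you are not currently using the MemoryQoS feature, you can also work around the problem temporarily by setting CgroupReconcile=false in the koordlet feature-gate configuration.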
Great, thanks a lot for the quick reply. I'll give it a try.
@BlackPigHe Hi, how did verifying the fix go?
What happened:
The kubelet container keeps crashing and restarting.
I0919 17:59:08.568730 2319034 cpu_suppress.go:186] nodeSuppressBE[CPU(Core)]:6 = node.Total:8 * SLOPercent:65% - systemUsage:1 - podLSUsed:1
I0919 17:59:08.568746 2319034 predict_server.go:309] wait for the state to be synchronized, skipping the step of model GC
E0919 17:59:08.568778 2319034 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 332 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x20b9260?, 0x3ab9df0})
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0?})
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/runtime/runtime.go:49 +0x75
panic({0x20b9260, 0x3ab9df0})
/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).mergePodResourceQoSForMemoryQoS(0x0?, 0xc0015b6800, 0x0)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:371 +0x39
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).getMergedPodResourceQoS(0x21c9d00?, 0xc0015b6800, 0xc00159ed60?)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:361 +0x90
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).calculateResources(0x1dbd291?, 0xc0014780c0, 0xc000244160?, {0xc000560000, 0x12, 0xc000244160?})
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:161 +0x4dd
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).calculateAndUpdateResources(0xc00090a540, 0xc000244160)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:131 +0xb5
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).reconcile(0xc00090a540)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:109 +0x52
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10000000001?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x28e48a0, 0xc001478090}, 0x1, 0xc000111740)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x1?, 0x44e665?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc000755da0?, 0x0?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:92 +0x25
created by github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).Run
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:92 +0xea
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1e0be19]
goroutine 332 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0?})
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/runtime/runtime.go:56 +0xd8
panic({0x20b9260, 0x3ab9df0})
/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).mergePodResourceQoSForMemoryQoS(0x0?, 0xc0015b6800, 0x0)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:371 +0x39
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).getMergedPodResourceQoS(0x21c9d00?, 0xc0015b6800, 0xc00159ed60?)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:361 +0x90
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).calculateResources(0x1dbd291?, 0xc0014780c0, 0xc000244160?, {0xc000560000, 0x12, 0xc000244160?})
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:161 +0x4dd
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).calculateAndUpdateResources(0xc00090a540, 0xc000244160)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:131 +0xb5
github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).reconcile(0xc00090a540)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:109 +0x52
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10000000001?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x28e48a0, 0xc001478090}, 0x1, 0xc000111740)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x1?, 0x44e665?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc000755da0?, 0x0?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.15/pkg/util/wait/wait.go:92 +0x25
created by github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile.(*cgroupResourcesReconcile).Run
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/cgreconcile/cgroup_reconcile.go:92 +0xea
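The trace above shows `mergePodResourceQoSForMemoryQoS` being called with `0x0` as its last argument and then faulting at `addr=0x8`, which is consistent with reading a field through a nil pointer. The following is a minimal sketch of the general nil-guard pattern for such a merge function. It is not the koordinator source and not the fix from #1556 or #1663; the types `ResourceQOS` and `MemoryQOS`, their fields, and the function `mergeMemoryQoS` are hypothetical stand-ins introduced only for illustration.

```go
package main

import "fmt"

// MemoryQOS is a hypothetical stand-in for a memory QoS config;
// it is NOT koordinator's API type.
type MemoryQOS struct {
	Enable          bool
	MinLimitPercent int64
}

// ResourceQOS is a hypothetical stand-in for a per-QoS-class config.
type ResourceQOS struct {
	Name   string
	Memory *MemoryQOS
}

// mergeMemoryQoS copies the memory settings of override into base.
// It returns early when any pointer it would dereference is nil,
// so a missing config leaves base unchanged instead of panicking.
func mergeMemoryQoS(base, override *ResourceQOS) {
	if base == nil || override == nil || override.Memory == nil {
		return
	}
	merged := *override.Memory // copy so base does not alias override
	base.Memory = &merged
}

func main() {
	pod := &ResourceQOS{Name: "pod"}

	// A nil override is skipped rather than dereferenced.
	mergeMemoryQoS(pod, nil)
	fmt.Println(pod.Memory == nil)

	// A populated override is merged into the pod config.
	mergeMemoryQoS(pod, &ResourceQOS{Memory: &MemoryQOS{Enable: true, MinLimitPercent: 50}})
	fmt.Println(pod.Memory.Enable, pod.Memory.MinLimitPercent)
}
```

Without the guard, the first call would dereference `override` and panic with the same "invalid memory address or nil pointer dereference" runtime error seen in the log.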
What you expected to happen:
The container should not keep restarting.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
koordinator 1.3
Kubernetes version (use kubectl version): 1.18
Default parameters
Docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:42:53 2020
OS/Arch: linux/amd64
Experimental: false
Linux k8s-master0 3.10.0-1160.90.1.el7.x86_64 #1 SMP Thu May 4 15:21:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux