
panic: runtime error: invalid memory address or nil pointer dereference #7184

Closed

karthickradhakrishnan opened this issue Oct 8, 2024 · 9 comments · Fixed by kubernetes-sigs/karpenter#1763

Labels: bug (Something isn't working), triage/accepted (Indicates that the issue has been accepted as a valid issue)


karthickradhakrishnan commented Oct 8, 2024

Description

Observed Behavior:
```
{"level":"INFO","time":"2024-10-02T13:07:51.585Z","logger":"controller","message":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","commit":"5bdf9c3","controller":"disruption","namespace":"","name":"","reconcileID":"d9e09bca-0703-4b38-a2c7-1cedadcf58a4"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x208 pc=0x230a017]

goroutine 504 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:111 +0x1e5
panic({0x277f360?, 0x4c9f9d0?})
runtime/panic.go:770 +0x132
sigs.k8s.io/karpenter/pkg/controllers/disruption.filterOutSameType(0xc016e72f08, {0xc0063d2fd8, 0x2, 0xc016b7f180?})
sigs.k8s.io/karpenter@v1.0.0/pkg/controllers/disruption/multinodeconsolidation.go:213 +0x5b7
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).firstNConsolidationOption(0xc000452150, {0x3476a98, 0xc006e8fdd0}, {0xc0063d2fd8, 0x3, 0x3}, 0x3)
sigs.k8s.io/karpenter@v1.0.0/pkg/controllers/disruption/multinodeconsolidation.go:147 +0x56f
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).ComputeCommand(0xc000452150, {0x3476a98, 0xc006e8fdd0}, 0xc0162b61b0, {0xc0041f0af0, 0x3, 0x9})
sigs.k8s.io/karpenter@v1.0.0/pkg/controllers/disruption/multinodeconsolidation.go:83 +0x430
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).disrupt(0xc000690100, {0x3476a98, 0xc006e8fdd0}, {0x3479920, 0xc000452150})
sigs.k8s.io/karpenter@v1.0.0/pkg/controllers/disruption/controller.go:167 +0x5e7
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Reconcile(0xc000690100, {0x3476a98, 0xc006e8fda0})
sigs.k8s.io/karpenter@v1.0.0/pkg/controllers/disruption/controller.go:132 +0x405
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Register.AsReconciler.func1({0x3476a98?, 0xc006e8fda0?}, {{{0x0?, 0x0?}, {0x2ca1bc8?, 0x5?}}})
github.com/awslabs/operatorpkg@v0.0.0-20240805231134-67d0acfb6306/singleton/controller.go:26 +0x2f
sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc006ea0280?, {0x3476a98?, 0xc006e8fda0?}, {{{0x0?, 0x5?}, {0x0?, 0xc003fe8d10?}}})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/reconcile/reconcile.go:113 +0x3d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x347c588?, {0x3476a98?, 0xc006e8fda0?}, {{{0x0?, 0xb?}, {0x0?, 0x0?}}})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000b10840, {0x3476ad0, 0xc0007565a0}, {0x29367a0, 0xc003f47360})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000b10840, {0x3476ad0, 0xc0007565a0})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 479
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:218 +0x486
```
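For context on what the trace shows: the panic originates in `filterOutSameType` at `multinodeconsolidation.go:213`, and the faulting address (`addr=0x208`) is the signature of reading a struct field through a nil pointer. Below is a minimal, hypothetical Go sketch of that failure mode plus a defensive guard. The type names and the reason the pointer is nil are assumptions for illustration only; this is not Karpenter's actual code (the real fix landed in kubernetes-sigs/karpenter#1763).

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Karpenter's types; the real
// candidate type in pkg/controllers/disruption is more involved.
type InstanceType struct{ Name string }

type candidate struct {
	// In this sketch, instanceType may be nil, e.g. if the node's type
	// could not be resolved from the cloud provider's offerings (assumption).
	instanceType *InstanceType
}

// filterOutSameType sketches the panic pattern: dereferencing a field
// through a pointer that can be nil.
func filterOutSameType(first *candidate, rest []*candidate) []*candidate {
	out := []*candidate{first}
	for _, c := range rest {
		// Defensive guard: skip candidates whose instance type is unknown
		// instead of dereferencing a nil pointer.
		if c.instanceType == nil {
			continue
		}
		if c.instanceType.Name != first.instanceType.Name {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	a := &candidate{instanceType: &InstanceType{Name: "m5.large"}}
	b := &candidate{instanceType: nil} // the problematic candidate
	fmt.Println(len(filterOutSameType(a, []*candidate{b}))) // prints 1, no panic
}
```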

Expected Behavior:

The Karpenter pod should not restart.

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version: 1.0.0
  • Kubernetes Version (kubectl version): 1.30

We recently migrated from Cluster Autoscaler to Karpenter for a few of our accounts and started to see the behaviour above: the Karpenter pod restarts constantly (it gets into CrashLoopBackOff status but recovers by itself). We tried restarting the pod and rebooting the Karpenter controller node, but neither helped. The issue was fixed when we changed the consolidationPolicy from WhenEmptyOrUnderutilized to WhenEmpty; however, that would increase the underlying cost. Also, this does not affect all of the clusters that use WhenEmptyOrUnderutilized, only a few of them (the ones with a larger number of nodes).
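For reference, the workaround amounts to the following change in the NodePool's disruption block (a sketch of just the relevant fields; the full manifest as originally deployed follows below). Per the trace, the panic occurs in multi-node consolidation, which WhenEmpty avoids:

```yaml
spec:
  disruption:
    # Workaround: only consolidate empty nodes. Sidesteps the panicking
    # consolidation path at the cost of less bin-packing.
    consolidationPolicy: WhenEmpty   # was: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
```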

NodePool configuration:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: XXXX
spec:
  disruption:
    budgets:
    - nodes: "1"
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: 150
    memory: 150Gi
  template:
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: main-XXXX-XXXXX
      requirements:
      - key: karpenter.k8s.aws/instance-category
        minValues: 2
        operator: In
        values:
        - r
        - m
        - c
      - key: karpenter.k8s.aws/instance-family
        minValues: 5
        operator: Exists
      - key: node.kubernetes.io/instance-type
        minValues: 10
        operator: Exists
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "4"
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      terminationGracePeriod: 15m
  weight: 10
```

@karthickradhakrishnan karthickradhakrishnan added bug Something isn't working needs-triage Issues that need to be triaged labels Oct 8, 2024

rschalo commented Oct 9, 2024

Hi, thanks for reporting this issue. Just to clarify: not all Karpenter deployments with NodePools containing `WhenEmptyOrUnderutilized` were crashlooping, only the ones with a high number of nodes in the cluster? Are there any logs from just before the panic?


karthickradhakrishnan commented Oct 10, 2024

@rschalo Attaching the logs: disruptionError.txt. Per the events and logs, the issue happens when Karpenter tries to disrupt nodes and bring up new ones. We also tried patching Karpenter to 1.0.4 on Kubernetes 1.30 (the patching was done recently as well).


karthickradhakrishnan commented Oct 15, 2024

The restarts stopped after we added a base limit (nodes: "1", alongside the existing configuration) to the disruption budget. However, we noticed errors starting from this function in the Karpenter source. Our assumption is that having, say, 8 nodes with a 20% budget, where some of the disruption candidates are not ready for some reason, causes this. Any help is highly appreciated.

We also tried patching Karpenter to 1.0.4.
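To make the arithmetic in that hypothesis concrete, here is a small Go sketch of how a percentage budget could map to an allowed disruption count. The floor rounding here is an assumption for illustration, not a statement of Karpenter's exact rule; check the docs/source for the actual rounding behavior:

```go
package main

import (
	"fmt"
	"math"
)

// allowedDisruptions sketches how a percentage budget could resolve to a
// node count. Floor rounding is an assumption for illustration.
func allowedDisruptions(totalNodes int, budgetPercent float64) int {
	return int(math.Floor(budgetPercent / 100 * float64(totalNodes)))
}

func main() {
	// The scenario from the comment above: 8 nodes, a 20% budget.
	fmt.Println(allowedDisruptions(8, 20)) // 1
}
```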


christianfeurer commented Oct 17, 2024

I observe the same issue in our EKS cluster. We have a lot of cronjobs that demand new nodes quite often; the average node count in the nodepool is about 15-20.
Karpenter tries to keep up with consolidating the ever-changing cluster. It manages for a while until it ends up in a crash loop, ~220 restarts in a day.

Configured a nodepool with both a 20% budget and a static budget of "5".
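For clarity, a budgets block like the one described would look roughly like this (a sketch; field names per the karpenter.sh/v1 NodePool schema, values as stated above):

```yaml
spec:
  disruption:
    budgets:
    - nodes: "20%"  # percentage budget
    - nodes: "5"    # static budget
```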

Running controller 1.0.4 with EKS @ 1.30

EDIT 18.10.24
My observation is that most of the memory errors are preceded by multiple log entries signaling a

Reconciler error

Additionally, not every container restart resolves the memory issue. Sometimes the container starts and immediately runs into it again; it can take several restarts until the container finally gets back to work.

```
{"level":"DEBUG","time":"2024-10-17T19:01:29.893Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"0f8788c","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"main-nodepool-nodepool-xhsdr"},"namespace":"","name":"main-nodepool-nodepool-xhsdr","reconcileID":"5908f16c-bd88-4c60-bdfd-60486e45e25b"}
{"level":"DEBUG","time":"2024-10-17T19:01:29.895Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"0f8788c","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"main-nodepool-nodepool-gpt69"},"namespace":"","name":"main-nodepool-nodepool-gpt69","reconcileID":"6703ed68-94f8-4745-bff5-d42010fd9b5b"}
{"level":"INFO","time":"2024-10-17T19:01:30.173Z","logger":"controller","caller":"runtime/panic.go:770","message":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","commit":"0f8788c","controller":"disruption","namespace":"","name":"","reconcileID":"996113e8-a79e-4333-ab53-477163842b6e"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x208 pc=0x230ca37]
goroutine 511 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:111 +0x1e5
panic({0x2782820?, 0x4ca50b0?})
runtime/panic.go:770 +0x132
sigs.k8s.io/karpenter/pkg/controllers/disruption.filterOutSameType(0xc00f42ea08, {0xc00c8d39c0, 0x2, 0xc00ef66780?})
sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:213 +0x5b7
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).firstNConsolidationOption(0xc0004d2cb0, {0x347b2f8, 0xc00dbd8360}, {0xc00c8d39c0, 0x3, 0x7}, 0x3)
sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:147 +0x56f
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).ComputeCommand(0xc0004d2cb0, {0x347b2f8, 0xc00dbd8360}, 0xc00ec5b5c0, {0xc00d197540, 0x7, 0x9})
sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:83 +0x430
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).disrupt(0xc0008e2a80, {0x347b2f8, 0xc00dbd8360}, {0x347e180, 0xc0004d2cb0})
sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:174 +0x5e7
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Reconcile(0xc0008e2a80, {0x347b2f8, 0xc00dbd8330})
sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:136 +0x4ec
sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Register.AsReconciler.func1({0x347b2f8?, 0xc00dbd8330?}, {{{0x0?, 0x0?}, {0x2ca58e8?, 0x5?}}})
github.com/awslabs/operatorpkg@v0.0.0-20240805231134-67d0acfb6306/singleton/controller.go:26 +0x2f
sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc00fb6d680?, {0x347b2f8?, 0xc00dbd8330?}, {{{0x0?, 0x5?}, {0x0?, 0xc00d00bd10?}}})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/reconcile/reconcile.go:113 +0x3d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3480de8?, {0x347b2f8?, 0xc00dbd8330?}, {{{0x0?, 0xb?}, {0x0?, 0x0?}}})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ae0840, {0x347b330, 0xc000778dc0}, {0x293a420, 0xc00fae0e20})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ae0840, {0x347b330, 0xc000778dc0})
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 486
sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:218 +0x486
```

@chberger

Sometimes, before the reconciler error occurs, there's also a TLS handshake error, which then leads into the crash loop.

Log excerpt (all entries from instance i-0ee1800d46d103d6b, logger "controller"):

```
2024-10-18T06:01:32.398Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:218 +0x486
2024-10-18T06:02:07.391Z  http: TLS handshake error from X:60976: read tcp X:8443->X:60976: read: connection reset by peer
2024-10-18T06:02:07.391Z  http: TLS handshake error from X:38650: EOF
2024-10-18T06:03:12.431Z  Reconciler error
2024-10-18T06:03:12.431Z  Reconciler error
2024-10-18T06:03:13.432Z  Reconciler error
2024-10-18T06:19:41.028Z  Reconciler error
2024-10-18T06:19:42.029Z  Reconciler error
2024-10-18T06:19:47.031Z  Reconciler error
2024-10-18T06:20:52.073Z  panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-10-18T06:20:52.073Z  panic: runtime error: invalid memory address or nil pointer dereference
2024-10-18T06:20:52.073Z  [signal SIGSEGV: segmentation violation code=0x1 addr=0x208 pc=0x230ca37]
2024-10-18T06:20:52.073Z  goroutine 496 [running]:
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:111 +0x1e5
2024-10-18T06:20:52.073Z  panic({0x2782820?, 0x4ca50b0?})
2024-10-18T06:20:52.073Z  runtime/panic.go:770 +0x132
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.filterOutSameType(0xc00f9ad908, {0xc018ed8540, 0x2, 0xc01174c580?})
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:213 +0x5b7
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).firstNConsolidationOption(0xc000ac7110, {0x347b2f8, 0xc017261bf0}, {0xc018ed8540, 0x3, 0x8}, 0x3)
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:147 +0x56f
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).ComputeCommand(0xc000ac7110, {0x347b2f8, 0xc017261bf0}, 0xc018ef68a0, {0xc018eb64b0, 0x8, 0xa})
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:83 +0x430
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).disrupt(0xc000a3d100, {0x347b2f8, 0xc017261bf0}, {0x347e180, 0xc000ac7110})
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:174 +0x5e7
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Reconcile(0xc000a3d100, {0x347b2f8, 0xc017261bc0})
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:136 +0x4ec
2024-10-18T06:20:52.073Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Register.AsReconciler.func1({0x347b2f8?, 0xc017261bc0?}, {{{0x0?, 0x0?}, {0x2ca58e8?, 0x5?}}})
2024-10-18T06:20:52.073Z  github.com/awslabs/operatorpkg@v0.0.0-20240805231134-67d0acfb6306/singleton/controller.go:26 +0x2f
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc01706d280?, {0x347b2f8?, 0xc017261bc0?}, {{{0x0?, 0x5?}, {0x0?, 0xc00a8c7d10?}}})
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/reconcile/reconcile.go:113 +0x3d
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3480de8?, {0x347b2f8?, 0xc017261bc0?}, {{{0x0?, 0xb?}, {0x0?, 0x0?}}})
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114 +0xb7
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003331e0, {0x347b330, 0xc00017dcc0}, {0x293a420, 0xc017252e60})
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311 +0x3bc
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003331e0, {0x347b330, 0xc00017dcc0})
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261 +0x1be
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222 +0x79
2024-10-18T06:20:52.073Z  created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 471
2024-10-18T06:20:52.073Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:218 +0x486
2024-10-18T06:22:17.425Z  panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-10-18T06:22:17.425Z  panic: runtime error: invalid memory address or nil pointer dereference
2024-10-18T06:22:17.425Z  [signal SIGSEGV: segmentation violation code=0x1 addr=0x208 pc=0x230ca37]
2024-10-18T06:22:17.425Z  goroutine 515 [running]:
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:111 +0x1e5
2024-10-18T06:22:17.425Z  panic({0x2782820?, 0x4ca50b0?})
2024-10-18T06:22:17.425Z  runtime/panic.go:770 +0x132
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.filterOutSameType(0xc0130ca788, {0xc00fb99680, 0x2, 0xc0106c4c00?})
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:213 +0x5b7
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).firstNConsolidationOption(0xc0005251f0, {0x347b2f8, 0xc00e2b8030}, {0xc00fb99680, 0x3, 0xb}, 0x3)
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:147 +0x56f
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*MultiNodeConsolidation).ComputeCommand(0xc0005251f0, {0x347b2f8, 0xc00e2b8030}, 0xc011cf3080, {0xc00faf3180, 0xb, 0xd})
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/multinodeconsolidation.go:83 +0x430
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).disrupt(0xc000209400, {0x347b2f8, 0xc00e2b8030}, {0x347e180, 0xc0005251f0})
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:174 +0x5e7
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Reconcile(0xc000209400, {0x347b2f8, 0xc00e2b8000})
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter@v1.0.2/pkg/controllers/disruption/controller.go:136 +0x4ec
2024-10-18T06:22:17.425Z  sigs.k8s.io/karpenter/pkg/controllers/disruption.(*Controller).Register.AsReconciler.func1({0x347b2f8?, 0xc00e2b8000?}, {{{0x0?, 0x0?}, {0x2ca58e8?, 0x5?}}})
2024-10-18T06:22:17.425Z  github.com/awslabs/operatorpkg@v0.0.0-20240805231134-67d0acfb6306/singleton/controller.go:26 +0x2f
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc00d786f80?, {0x347b2f8?, 0xc00e2b8000?}, {{{0x0?, 0x5?}, {0x0?, 0xc001e09d10?}}})
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/reconcile/reconcile.go:113 +0x3d
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3480de8?, {0x347b2f8?, 0xc00e2b8000?}, {{{0x0?, 0xb?}, {0x0?, 0x0?}}})
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114 +0xb7
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000b1e840, {0x347b330, 0xc00076bcc0}, {0x293a420, 0xc012ec0c40})
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311 +0x3bc
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000b1e840, {0x347b330, 0xc00076bcc0})
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261 +0x1be
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222 +0x79
2024-10-18T06:22:17.425Z  created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 490
2024-10-18T06:22:17.425Z  sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:218 +0x486
```

@rschalo rschalo added triage/accepted Indicates that the issue has been accepted as a valid issue and removed needs-triage Issues that need to be triaged labels Oct 21, 2024
@christianfeurer

Hey, I saw the fix in kubernetes-sigs/karpenter#1763 and I'm looking forward to trying it out in our clusters. Any idea when this will be available?

@rschalo rschalo self-assigned this Oct 28, 2024

rschalo commented Oct 28, 2024

Hi @christianfeurer, yes, you should be able to try the fix with this snapshot. It isn't recommended for production, but it should be sufficient to test whether it fully addresses your issue.

```sh
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-c3c673351360878e01bbe13fc54eb07e0d7310e4" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
```
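After upgrading, a quick way to confirm the controller is actually running the snapshot image and to watch for further restarts (the label selector here is an assumption about the chart's standard labels; adjust if your deployment labels differ):

```sh
# Show the image the Karpenter controller pods are running.
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter \
  -o jsonpath='{.items[*].spec.containers[*].image}{"\n"}'

# Watch restart counts; the CrashLoopBackOff pattern should disappear.
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter -w
```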

@christianfeurer

Hey @rschalo,
I really appreciate your effort in providing the snapshot version. Sadly, I'm only able to check it in a sandbox environment where I cannot reproduce the issue. In the actual cluster that has the issue, I'm limited to an internal registry that only allows access to the public oci://public.ecr.aws/karpenter/karpenter.
Hence I'll wait for the next release - no worries. I'm also interested in other fixes like fix: clarify state node logging (#1766), so it's worth the wait.


rschalo commented Oct 28, 2024

Sounds good - we don't have a date for 1.1 just yet. Thanks for your patience!

Given the fix, I'm marking this as closed. If the issue persists, please reopen or open a new issue.
