Node not ready: container runtime is down #9984

tjwallace · 2024-12-17T23:21:19Z

Bug Report

Description

After upgrading to Talos 1.9.0 some of my nodes are never ready.

Logs

$ talosctl logs -n mystery containerd
mystery: {"level":"info","msg":"starting containerd","revision":"88aa2f531d6c2922003cc7929e51daf1c14caa0a","time":"2024-12-17T21:59:02.922801527Z","version":"v2.0.1"}
mystery: {"id":"io.containerd.image-verifier.v1.bindir","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.938434957Z","type":"io.containerd.image-verifier.v1"}
mystery: {"id":"io.containerd.warning.v1.deprecations","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.938616550Z","type":"io.containerd.warning.v1"}
mystery: {"id":"io.containerd.content.v1.content","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.938669420Z","type":"io.containerd.content.v1"}
mystery: {"id":"io.containerd.snapshotter.v1.native","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.938760243Z","type":"io.containerd.snapshotter.v1"}
mystery: {"id":"io.containerd.snapshotter.v1.overlayfs","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.938877570Z","type":"io.containerd.snapshotter.v1"}
mystery: {"id":"io.containerd.event.v1.exchange","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.939509134Z","type":"io.containerd.event.v1"}
mystery: {"id":"io.containerd.monitor.task.v1.cgroups","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.940531592Z","type":"io.containerd.monitor.task.v1"}
mystery: {"id":"io.containerd.metadata.v1.bolt","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.941341137Z","type":"io.containerd.metadata.v1"}
mystery: {"level":"info","msg":"metadata content store policy set","policy":"shared","time":"2024-12-17T21:59:02.941474037Z"}
mystery: {"id":"io.containerd.gc.v1.scheduler","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.944835061Z","type":"io.containerd.gc.v1"}
mystery: {"id":"io.containerd.differ.v1.walking","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.944938855Z","type":"io.containerd.differ.v1"}
mystery: {"id":"io.containerd.lease.v1.manager","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945028058Z","type":"io.containerd.lease.v1"}
mystery: {"id":"io.containerd.service.v1.containers-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945493666Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.content-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945545162Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.diff-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945572516Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.images-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945599569Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.introspection-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945629359Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.namespaces-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945706696Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.service.v1.snapshots-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945762612Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.shim.v1.manager","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945789586Z","type":"io.containerd.shim.v1"}
mystery: {"id":"io.containerd.runtime.v2.task","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945817723Z","type":"io.containerd.runtime.v2"}
mystery: {"id":"io.containerd.service.v1.tasks-service","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.945986952Z","type":"io.containerd.service.v1"}
mystery: {"id":"io.containerd.grpc.v1.containers","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946076297Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.content","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946124777Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.diff","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946164550Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.events","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946240470Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.images","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946266250Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.introspection","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946355207Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.leases","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946421830Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.namespaces","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946448167Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.snapshots","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946479657Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.streaming.v1.manager","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946577011Z","type":"io.containerd.streaming.v1"}
mystery: {"id":"io.containerd.grpc.v1.streaming","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946649773Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.tasks","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946692184Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.transfer.v1.local","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946803084Z","type":"io.containerd.transfer.v1"}
mystery: {"id":"io.containerd.grpc.v1.transfer","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946880594Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.version","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946918504Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.ttrpc.v1.otelttrpc","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946943361Z","type":"io.containerd.ttrpc.v1"}
mystery: {"id":"io.containerd.grpc.v1.healthcheck","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.946972981Z","type":"io.containerd.grpc.v1"}
mystery: {"id":"io.containerd.podsandbox.controller.v1.podsandbox","level":"info","msg":"loading plugin","time":"2024-12-17T21:59:02.947000311Z","type":"io.containerd.podsandbox.controller.v1"}
mystery: {"error":"unable to init client for podsandbox: failed to get \"io.containerd.sandbox.store.v1\" plugin: no plugins registered for io.containerd.sandbox.store.v1: plugin: not found","id":"io.containerd.podsandbox.controller.v1.podsandbox","level":"warning","msg":"failed to load plugin","time":"2024-12-17T21:59:02.947230681Z","type":"io.containerd.podsandbox.controller.v1"}
mystery: {"address":"/system/run/containerd/containerd.sock.ttrpc","level":"info","msg":"serving...","time":"2024-12-17T21:59:02.948070189Z"}
mystery: {"address":"/system/run/containerd/containerd.sock","level":"info","msg":"serving...","time":"2024-12-17T21:59:02.948277202Z"}
mystery: {"level":"info","msg":"containerd successfully booted in 0.028911s","time":"2024-12-17T21:59:02.948326423Z"}
mystery: {"address":"unix:///run/containerd/s/814eeaf75962b52d1228c72642b2a1e34f6ad18299621a2bc0a0d887e86f9db6","level":"info","msg":"connecting to shim apid","namespace":"system","protocol":"ttrpc","time":"2024-12-17T21:59:14.675543337Z","version":3}

$ kubectl describe node ...
...
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 12 Jul 2024 13:30:08 -0700   Fri, 12 Jul 2024 13:30:08 -0700   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Tue, 17 Dec 2024 15:16:36 -0800   Tue, 17 Dec 2024 15:08:06 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 17 Dec 2024 15:16:36 -0800   Tue, 17 Dec 2024 15:08:06 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 17 Dec 2024 15:16:36 -0800   Tue, 17 Dec 2024 15:08:06 -0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Tue, 17 Dec 2024 15:16:36 -0800   Tue, 17 Dec 2024 15:08:57 -0800   KubeletNotReady              container runtime is down

Environment

Talos version: [talosctl version --nodes <problematic nodes>]

$ talosctl version
Client:
	Tag:         v1.9.0
	SHA:         3cb25ceb
	Built:
	Go version:  go1.23.4
	OS/Arch:     darwin/arm64
Server:
	NODE:        192.168.10.10
	Tag:         v1.9.0
	SHA:         3cb25ceb
	Built:
	Go version:  go1.23.4
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        192.168.10.12
	Tag:         v1.9.0
	SHA:         3cb25ceb
	Built:
	Go version:  go1.23.4
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        192.168.10.11
	Tag:         v1.9.0
	SHA:         3cb25ceb
	Built:
	Go version:  go1.23.4
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        192.168.10.20
	Tag:         v1.9.0
	SHA:         3cb25ceb
	Built:
	Go version:  go1.23.4
	OS/Arch:     linux/amd64
	Enabled:     RBAC

Kubernetes version: Server Version: v1.32.0

The text was updated successfully, but these errors were encountered:

tjwallace · 2024-12-17T23:22:44Z

I was having the same problem in #9980 but I applied the suggested fix of using the Talos discovery service, and still have the same problems.

buroa · 2024-12-18T02:14:20Z

I am seeing this as well. I did not see it on the v1.9.0 betas.

smira · 2024-12-18T10:18:45Z

Please supply a talosctl support bundle.

You should be looking into talosctl logs cri.

Make sure if you have any contianerd/CRI config customizations, that they were update for containerd 2.0 configuration, but it should have failed on Talos 1.8 as well.

jalim · 2024-12-18T11:50:31Z

I seem to be experiencing the same issues, at the moment only affecting one of 4 machines, all installed on similar bare metal hardware. It seems to initially come online and report ready only to report not ready shortly thereafter. Have attached support zip if it helps.

support.zip

buroa · 2024-12-18T12:02:19Z

support.zip

Here is mine as well. It just happened after a reboot.

smira · 2024-12-18T12:51:55Z

Not sure what's going on there, but in both support files it happens around the time cephfs plugin is initialized.

buroa · 2024-12-18T14:05:57Z

@smira Not sure, but I doubt rook-ceph blew this up. Something changed between the beta and v1.9.0. I have git history and nothing has changed except upgrading to v1.9.0. That's when the problem occurred.

smira · 2024-12-18T14:09:37Z

You can see yourself in the logs.

We don't have any failures whatsoever in any of the tests, including Ceph. If there's a reproducer, happy to verify.

buroa · 2024-12-18T14:12:12Z

I read the same logs as you. It just so happens that cephfs is the last thing that comes up. Most likely a fluke, because again, nothing changed. Maybe there is a problem with upgrading from containerd 2.0.0 to 2.0.1.

smira · 2024-12-18T14:13:43Z

None of our tests showed any issues, I read Kubernetes source code. You might try to increase log verbosity of the kubelet with -v 9 to see what exactly goes wrong and why it moves into this state. Might be a Kubernetes bug as well.

In your log there's no clear reason on why kubelet considers CRI to be unhappy.

buroa · 2024-12-18T14:17:23Z

I changed kubelet to -v=9, here is the support.zip

support.zip

caycehouse · 2024-12-18T14:18:09Z

I’m facing the same issue. On Talos 1.8.4, CRI worked fine, but after upgrading to 1.9.0, 2 of my 3 nodes go ‘Ready’ for 30-60 seconds before switching to ‘Not Ready.’

smira · 2024-12-18T14:22:09Z

Which Kubernetes version is everyone having issues using?

buroa · 2024-12-18T14:23:57Z

@smira I was experiencing the issue on both v1.31.4 and v1.32.0. I was editing the machine config to go back and forth while debugging.

smira · 2024-12-18T14:35:32Z

Also, is everybody using Ceph?

tjwallace · 2024-12-18T20:18:39Z

I have already reverted my cluster to 1.8.4 but here is a support bundle before I reverted
support.zip

buroa · 2024-12-18T20:27:49Z

I can confirm this is due to containerd v2.0.1 and having multus-cni installed. I have a PR out on multus to hopefully fix it: k8snetworkplumbingwg/multus-cni#1371.

jonasled · 2024-12-18T20:42:02Z

I don't use multus on my cluster, but I have some nodes with the cephfs csi driver and some nodes without, all nodes with the driver had the problem, the control planes without cephfs worked flawlessly. On my cluster I switched all nodes back to 1.8.3, as it was the last version I had running successfully. ( I never had tried 1.8.4 )

siderolabs/talos#9984 containerd/go-cni#123

buroa · 2024-12-20T18:24:53Z

I filed this for containerd: containerd/containerd#11186

ekarlso · 2024-12-23T21:15:43Z

I got Cilium 1.16.5 and Talos 1.9.0

I get this happening without Multus when going from
K8s 1.30.3 -> 1.31.4

ekarlso · 2024-12-23T21:49:08Z

10.0.0.120: {"ts":1734987711195.6775,"caller":"rest/warnings.go:70","msg":"unknown field \"status.runtimeHandlers[1].features.userNamespaces\"","v":0}                                                                                                                                                                        
10.0.0.120: {"ts":1734987712058.0708,"caller":"kubelet/kubelet.go:2902","msg":"Container runtime network not ready","networkReady":"NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"}                                                                        
10.0.0.120: {"ts":1734987714647.7485,"caller":"cache/reflector.go:561","msg":"k8s.io/client-go/informers/factory.go:160: failed to list *v1.Service: \"spec.clusterIP\" is not a known field selector: only \"metadata.name\", \"metadata.namespace\"","v":0}                                                                 
10.0.0.120: {"ts":1734987714647.7979,"logger":"UnhandledError","caller":"cache/reflector.go:158","msg":"Unhandled Error","err":"k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Service: failed to list *v1.Service: \"spec.clusterIP\" is not a known field selector: only \"metadata.name\", \"metadata.names
pace\""}                                                                                                                                                                                                                                                                                                                      
10.0.0.120: {"ts":1734987714745.3572,"logger":"UnhandledError","caller":"kuberuntime/kuberuntime_manager.go:1274","msg":"Unhandled Error","err":"init container &Container{Name:config,Image:quay.io/cilium/cilium:v1.16.5@sha256:758ca0793f5995bb938a2fa219dcce63dc0b3fa7fc4ce5cc851125281fb7361d,Command:[cilium-dbg build-c
onfig],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:K8S_NODE_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:CILIUM_K8S_NAMESPACE,Value:,ValueFrom:&EnvVarSource{F
ieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:KUBERNETES_SERVICE_HOST,Value:127.0.0.1,ValueFrom:nil,},EnvVar{Name:KUBERNETES_SERVICE_PORT,Value:7445,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceLis
t{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:tmp,ReadOnly:false,MountPath:/tmp,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:kube-api-access-gs84d,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,
MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,Vol
umeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod cilium-qfdl6_kube-system(216e7cce-1d3f-4f1b-a004-c260136b4043): CreateContainerConfigError: services have not yet been read at least once, cannot construct envvars"}                             
10.0.0.120: {"ts":1734987714745.553,"logger":"UnhandledError","caller":"kuberuntime/kuberuntime_manager.go:1274","msg":"Unhandled Error","err":"container &Container{Name:cilium-envoy,Image:quay.io/cilium/cilium-envoy:v1.30.8-1733837904-eaae5aca0fb988583e5617170a65ac5aa51c0aa8@sha256:709c08ade3d17d52da4ca2af33f431360e
c26268d288d9a6cd1d98acc9a1dced,Command:[/usr/bin/cilium-envoy-starter],Args:[-- -c /var/run/cilium/envoy/bootstrap-config.json --base-id 0 --log-level info --log-format [%Y-%m-%d %T.%e][%t][%l][%n] [%g:%#] %v],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:envoy-metrics,HostPort:9964,ContainerPort:9964,Protocol
:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:K8S_NODE_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:CILIUM_K8S_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVe
rsion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:KUBERNETES_SERVICE_HOST,Value:127.0.0.1,ValueFrom:nil,},EnvVar{Name:KUBERNETES_SERVICE_PORT,Value:7445,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims
:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:envoy-sockets,ReadOnly:false,MountPath:/var/run/cilium/envoy/sockets,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:envoy-artifacts,ReadOnly:true,MountPath:/var/run/cilium/envoy/artifacts,SubPath:,MountPropagation:ni
l,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:envoy-config,ReadOnly:true,MountPath:/var/run/cilium/envoy/,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:bpf-maps,ReadOnly:false,MountPath:/sys/fs/bpf,SubPath:,MountPropagation:*HostToContainer,SubPathExpr:,RecursiveReadOnl
y:nil,},VolumeMount{Name:kube-api-access-mn472,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:{0 9878 },Host:127.0.0.1,Scheme:HT
TP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:5,PeriodSeconds:30,SuccessThreshold:1,FailureThreshold:10,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:{0 9878 },Host:127.0.0.1,Scheme
:HTTP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:5,PeriodSeconds:30,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabili
ties:&Capabilities{Add:[NET_ADMIN SYS_ADMIN],Drop:[ALL],},Privileged:nil,SELinuxOptions:&SELinuxOptions{User:,Role:,Type:spc_t,Level:s0,},RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,AppArmorProfile:nil,},Stdi
n:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:{0 9878 },Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRP
C:nil,},InitialDelaySeconds:5,TimeoutSeconds:1,PeriodSeconds:2,SuccessThreshold:1,FailureThreshold:105,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod cilium-envoy-nx7tr_kube-system(def0aa56-ecf9-4c4e-82cf-fc85710d29fa): CreateContainerConfigError: se
rvices have not yet been read at least once, cannot construct envvars"}                                                                                                                                                                                                                                                       
10.0.0.120: {"ts":1734987714746.8972,"caller":"kubelet/pod_workers.go:1301","msg":"Error syncing pod, skipping","pod":{"name":"cilium-envoy-nx7tr","namespace":"kube-system"},"podUID":"def0aa56-ecf9-4c4e-82cf-fc85710d29fa","err":"failed to \"StartContainer\" for \"cilium-envoy\" with CreateContainerConfigError: \"serv
ices have not yet been read at least once, cannot construct envvars\"","errCauses":[{"error":"failed to \"StartContainer\" for \"cilium-envoy\" with CreateContainerConfigError: \"services have not yet been read at least once, cannot construct envvars\""}]}                                                              
10.0.0.120: {"ts":1734987714746.9038,"caller":"kubelet/pod_workers.go:1301","msg":"Error syncing pod, skipping","pod":{"name":"cilium-qfdl6","namespace":"kube-system"},"podUID":"216e7cce-1d3f-4f1b-a004-c260136b4043","err":"failed to \"StartContainer\" for \"config\" with CreateContainerConfigError: \"services have no
t yet been read at least once, cannot construct envvars\"","errCauses":[{"error":"failed to \"StartContainer\" for \"config\" with CreateContainerConfigError: \"services have not yet been read at least once, cannot construct envvars\""}]}                                                                                
10.0.0.120: {"ts":1734987717060.077,"caller":"kubelet/kubelet.go:2902","msg":"Container runtime network not ready","networkReady":"NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"}      ```

smira · 2024-12-24T10:06:13Z

I think @buroa found the root cause (most probably): containerd/go-cni#123 (comment)

We'll patch containerd for v1.9.1

This bug is hard to reproduce

Fixes siderolabs/talos#9984 Patch with containerd/go-cni#126 See also: * containerd/go-cni#125 * containerd/containerd#11186 * containerd/go-cni#123 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> (cherry picked from commit 0b00e86)

ishioni mentioned this issue Dec 18, 2024

feat(container): update image ghcr.io/siderolabs/installer ( v1.8.4 → v1.9.0 ) ishioni/homelab-ops#2964

Merged

1 task

smira mentioned this issue Dec 19, 2024

CP shows Ready: false after reboot #9991

Closed

buroa added a commit to buroa/k8s-gitops that referenced this issue Dec 19, 2024

fix(talos): downgrade containerd because its broke

ddc3fd4

siderolabs/talos#9984 containerd/go-cni#123

smira assigned frezbo Dec 20, 2024

smira mentioned this issue Dec 24, 2024

fix: patch containerd with CNI deadlock fix siderolabs/pkgs#1128

Merged

talos-bot closed this as completed in siderolabs/pkgs@0b00e86 Dec 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Node not ready: container runtime is down #9984

Node not ready: container runtime is down #9984

tjwallace commented Dec 17, 2024

tjwallace commented Dec 17, 2024

buroa commented Dec 18, 2024 •

edited

Loading

smira commented Dec 18, 2024

jalim commented Dec 18, 2024

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

buroa commented Dec 18, 2024 •

edited

Loading

smira commented Dec 18, 2024

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

buroa commented Dec 18, 2024

caycehouse commented Dec 18, 2024 •

edited

Loading

smira commented Dec 18, 2024 •

edited

Loading

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

tjwallace commented Dec 18, 2024

buroa commented Dec 18, 2024 •

edited

Loading

jonasled commented Dec 18, 2024

buroa commented Dec 20, 2024

ekarlso commented Dec 23, 2024

ekarlso commented Dec 23, 2024

smira commented Dec 24, 2024

Node not ready: container runtime is down #9984

Node not ready: container runtime is down #9984

Comments

tjwallace commented Dec 17, 2024

Bug Report

Description

Logs

Environment

tjwallace commented Dec 17, 2024

buroa commented Dec 18, 2024 • edited Loading

smira commented Dec 18, 2024

jalim commented Dec 18, 2024

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

buroa commented Dec 18, 2024 • edited Loading

smira commented Dec 18, 2024

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

buroa commented Dec 18, 2024

caycehouse commented Dec 18, 2024 • edited Loading

smira commented Dec 18, 2024 • edited Loading

buroa commented Dec 18, 2024

smira commented Dec 18, 2024

tjwallace commented Dec 18, 2024

buroa commented Dec 18, 2024 • edited Loading

jonasled commented Dec 18, 2024

buroa commented Dec 20, 2024

ekarlso commented Dec 23, 2024

ekarlso commented Dec 23, 2024

smira commented Dec 24, 2024

buroa commented Dec 18, 2024 •

edited

Loading

buroa commented Dec 18, 2024 •

edited

Loading

caycehouse commented Dec 18, 2024 •

edited

Loading

smira commented Dec 18, 2024 •

edited

Loading

buroa commented Dec 18, 2024 •

edited

Loading