CP shows Ready: false after reboot #9991

Closed
oliverl-21 opened this issue Dec 19, 2024 · 17 comments

@oliverl-21

Bug Report

Description

I updated my nodes to Talos 1.9.0 and Kubernetes 1.32.0. After rebooting a control plane (CP) node with talosctl, the command gets stuck on:

◳ watching nodes: [rpi03]
    * rpi03: stage: RUNNING ready: false unmetCond: [name:"nodeReady" reason:"node \"rpi03\" status is not available yet"]

talosctl dashboard rpi03 also shows Ready: false, even though everything is working fine.
Meanwhile, kubectl get nodes shows the node with STATUS Ready.

Logs

k describe node rpi03

Events:
  Type     Reason                   Age                    From             Message
  ----     ------                   ----                   ----             -------
  Normal   Starting                 6m9s                   kube-proxy
  Normal   RegisteredNode           45m                    node-controller  Node rpi03 event: Registered Node rpi03 in Controller
  Normal   RegisteredNode           21m                    node-controller  Node rpi03 event: Registered Node rpi03 in Controller
  Normal   RegisteredNode           21m                    node-controller  Node rpi03 event: Registered Node rpi03 in Controller
  Normal   Shutdown                 9m7s                   kubelet          Shutdown manager detected shutdown event
  Normal   NodeNotReady             9m7s                   kubelet          Node rpi03 status is now: NodeNotReady
  Warning  InvalidDiskCapacity      6m24s                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  6m24s                  kubelet          Updated Node Allocatable limit across pods
  Normal   Starting                 6m24s                  kubelet          Starting kubelet.
  Warning  Rebooted                 6m23s (x3 over 6m24s)  kubelet          Node rpi03 has been rebooted, boot id: 295c8af1-a7eb-4b9e-b308-d5dab0fbe475
  Normal   NodeHasSufficientMemory  6m23s (x4 over 6m24s)  kubelet          Node rpi03 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    6m23s (x4 over 6m24s)  kubelet          Node rpi03 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     6m23s (x4 over 6m24s)  kubelet          Node rpi03 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             6m23s (x3 over 6m24s)  kubelet          Node rpi03 status is now: NodeNotReady
  Normal   NodeReady                6m23s                  kubelet          Node rpi03 status is now: NodeReady

Environment

  • Talos version:
Client:
        Tag:         v1.9.0
        SHA:         3cb25ceb
        Built:
        Go version:  go1.23.4
        OS/Arch:     darwin/arm64
Server:
        NODE:        rpi04
        Tag:         v1.9.0
        SHA:         3cb25ceb
        Built:
        Go version:  go1.23.4
        OS/Arch:     linux/arm64
        Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0
  • Platform: RasPi 4
@smira
Member

smira commented Dec 19, 2024

This is not directly a Talos issue on its own; you need to look into the Node status to understand why it's not ready. This decision is made by the kubelet.

kubectl describe node gives you a detailed breakdown of all conditions.

See also #9984.
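
For a quick view of just the conditions, something like this works (jsonpath sketch; node name taken from this thread):

  kubectl get node rpi03 -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'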

@oliverl-21
Author

The conditions look fine to me.

Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 19 Dec 2024 09:27:02 +0100   Thu, 19 Dec 2024 09:27:02 +0100   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 19 Dec 2024 10:28:57 +0100   Thu, 19 Dec 2024 09:26:49 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 19 Dec 2024 10:28:57 +0100   Thu, 19 Dec 2024 09:26:49 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 19 Dec 2024 10:28:57 +0100   Thu, 19 Dec 2024 09:26:49 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 19 Dec 2024 10:28:57 +0100   Thu, 19 Dec 2024 09:26:50 +0100   KubeletReady                 kubelet is posting ready status

@smira
Member

smira commented Dec 19, 2024

If your node is ready, there's no problem?

If there's a problem, please grab a talosctl support bundle and attach it to this ticket.
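
For reference, a bundle can be generated with something like:

  talosctl -n <NODE> support

which writes a support.zip into the current directory.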

@oliverl-21
Author

I guess it's more of a cosmetic thing when Ready: false is shown in the dashboard.

[screenshot: Talos dashboard showing Ready: false]

@smira
Member

smira commented Dec 19, 2024

Please see what I posted above; if you need help, add a support bundle.

@oliverl-21
Author

It is somehow reporting as true now. Sorry for wasting your time.

@spagno

spagno commented Dec 21, 2024

I have the same problem (just on a different architecture; I'm using x86-64). Attached is the support.zip from one node:
support.zip

@smira
Member

smira commented Dec 23, 2024

I have the same problem (just on a different architecture; I'm using x86-64). Attached is the support.zip from one node: support.zip

I don't see why it doesn't report Node status, even though everything seems to be fine.

@spagno

spagno commented Dec 26, 2024

I have the same problem (just on a different architecture; I'm using x86-64). Attached is the support.zip from one node: support.zip

I don't see why it doesn't report Node status, even though everything seems to be fine.

Well, the VIP disappeared because in Talos 1.8.4 the predictable interface name was enxMACADDRESS, while in 1.9 the predictable name is "ensX".

But even now that the VIP is working, READY is still false with this config (an interface-rename workaround is sketched after the config):

  discovery:
    enabled: true
    registries:
      kubernetes:
        disabled: true
      service:
        disabled: false
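
As for the interface rename: selecting the interface by MAC address instead of by name should survive renames across upgrades. A sketch of the machine config (MAC and VIP values are placeholders):

  machine:
    network:
      interfaces:
        - deviceSelector:
            hardwareAddr: aa:bb:cc:dd:ee:ff   # placeholder MAC; deviceSelector replaces the interface name
          dhcp: true
          vip:
            ip: 192.168.68.10                 # placeholder VIP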

@smira
Member

smira commented Dec 27, 2024

But even now that the VIP is working, READY is still false with this config:

Run talosctl -n <NODE> get machinestatus -o yaml to see the details, then work from there to find the root cause.

@spagno

spagno commented Dec 27, 2024

node: 192.168.68.11
metadata:
    namespace: runtime
    type: MachineStatuses.runtime.talos.dev
    id: machine
    version: 18
    owner: runtime.MachineStatusController
    phase: running
    created: 2024-12-27T00:08:21Z
    updated: 2024-12-27T00:08:29Z
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: nodeReady
              reason: node "k8s-control-1" status is not available yet

I'm really confused. Which condition determines readiness?

@smira
Member

smira commented Dec 27, 2024

I'm not sure why, but Talos can't pull the status of the node. That's what I posted above.
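
One thing worth comparing is the node name Talos expects against what the API server actually registered; a sketch, assuming the nodename resource behaves as in current releases:

  # what Talos expects the Node object to be named
  talosctl -n 192.168.68.11 get nodename
  # what the API server actually registered
  kubectl get nodes -o name

If those don't match, the nodeReady check would be watching a Node object that doesn't exist.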

@spagno

spagno commented Dec 27, 2024

Seems so. I also reinstalled the cluster from scratch (directly on version 1.9.1) and have no idea how to troubleshoot it. It's not a big issue because everything works, but I'm curious to understand it.

@azkbn

azkbn commented Dec 29, 2024

Same story on my side. For some reason the status of my control plane is not available either.

metadata:
    namespace: runtime
    type: MachineStatuses.runtime.talos.dev
    id: machine
    version: 17
    owner: runtime.MachineStatusController
    phase: running
    created: 2024-12-29T13:29:09Z
    updated: 2024-12-29T13:30:42Z
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: nodeReady
              reason: node "cp01" status is not available yet

But the cluster looks healthy:

NAME   STATUS   ROLES           AGE   VERSION
cp01   Ready    control-plane   18m   v1.32.0
w01    Ready    <none>          17m   v1.32.0

I'm using Talos v1.9.0 + the siderolabs/talos Terraform provider 0.7.0.

@azkbn

azkbn commented Dec 29, 2024

Got some updates. Very weird case. I'm running Talos in Proxmox. The issue started happening when I decided to use different names for the VM and the hostname of the nodes; before, they always matched. Once I switched to the shorter names cp01 and w01 for the k8s nodes (the Talos hostname), I hit this issue. After rolling back those changes, I finally got my nodes in Ready status on the Talos dashboard. I guess it's very specific to my infra setup, but maybe it will be helpful to someone else.

@spagno

spagno commented Dec 30, 2024

Could it be anything related to DNS? Is there any readiness check which uses a PTR or A record to resolve the control plane's IP/hostname?

@RealKelsar

I got the same issue, but I already have pretty short names, e.g. tp-n1.
The cluster works fine so far, but talosctl commands for reboot and upgrade never finish without error, because they wait for ready (a possible stopgap is sketched below).
support.zip
DNS resolves multiple variants of the names, like tp-n1 or tp-n1.turing.
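
If the blocking wait is the main pain point, skipping it may serve as a stopgap (assuming --wait behaves as in recent talosctl releases):

  # stopgap only: don't block on the nodeReady condition
  talosctl -n tp-n1 reboot --wait=false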
