-
Hello, I'm not very knowledgeable about the low-level components of Kubernetes, how they fit together, or auth, so please bear with me if I'm talking nonsense. On a Talos Linux home server (single-node k8s cluster) I realized that things were down, and on inspection the cause seems to be that the kubelet is crash-looping with the following error:
Indeed, the symlink target is an empty file:
First, I'm not sure how it ended up like that. Is there anything I can do to figure out why? Second, is there a way to recover? Thank you!
Replies: 2 comments 1 reply
-
Hi, at this moment the only way to get out of this case might be to wipe the worker node's EPHEMERAL partition with:
talosctl reset -n WORKER_NODE --system-labels-to-wipe=EPHEMERAL --reboot
Please note this wipes the whole /var partition. This will force the kubelet to re-join. The root cause is most probably a failed write by the kubelet. Talos 1.7 plans to improve on the recovery procedure in this case.
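Since the wipe is destructive, it may be worth first confirming the symptom from the question, namely that the symlink resolves to a zero-byte file. The following is only a minimal sketch: the function name `symlink_target_is_empty` is hypothetical, the path is left as an argument because the thread does not show it, and it assumes the file is reachable from a POSIX shell (for example with the node's disk mounted on another machine, in which case an absolute symlink target would resolve against that machine's filesystem rather than the node's).

```shell
# Hypothetical helper, not part of talosctl or the kubelet.
# Exit status:
#   0  the path is a symlink and its target exists but is empty
#   1  the path is a symlink and its target has content
#   2  the path is not a symlink
#   3  the path is a symlink but its target does not exist
symlink_target_is_empty() {
  [ -L "$1" ] || return 2   # -L tests the link itself
  [ -e "$1" ] || return 3   # -e follows the link, so this catches dangling links
  [ ! -s "$1" ]             # -s follows the link: true only if size > 0
}

# Example call (PATH_TO_SYMLINK is a placeholder, not a real path):
#   symlink_target_is_empty PATH_TO_SYMLINK && echo "target is empty"
```

An exit status of 0 matches the situation described in the question; any other status suggests the crash loop has a different cause and the wipe may not help.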
-
Thanks a lot! I think some files related to Mayastor are stored somewhere in /var. In the machine config I've set up a bind mount for that dir, but I'm not sure whether they'd still be lost.
Alternatively, I was thinking of taking out the node's disk, manually creating an apiserver client certificate, and updating the file. Do you think that'd work as well?
On Sun, 4 Feb 2024, 12:38 Andrey Smirnov wrote:
Hi, at this moment the only way to get out of this case might be to wipe
the worker node's EPHEMERAL partition with:
talosctl reset -n WORKER_NODE --system-labels-to-wipe=EPHEMERAL --reboot
Please note this wipes the whole /var partition.
This will force the kubelet to re-join.
The root cause is most probably a failed write by the kubelet. Talos 1.7
plans to improve on the recovery procedure in this case.