Kubelite keeps restart with "key failed with : missing content for serving cert" #4296

Closed
movsb opened this issue Nov 9, 2023 · 7 comments
Labels
kind/support Question with a workaround

Comments

@movsb

movsb commented Nov 9, 2023

Summary

Pods periodically restart, cycling through Status Unknown → Running → Unknown. The kubelet events say "Pod SandBox changed, will recreate it". From the ps command output I can see that kubelite crashes again and again.

Below are some journalctl -f -u snap.microk8s.daemon-kubelite.service logs:

Nov 09 18:31:12 nuc microk8s.daemon-kubelite[1529577]: E1109 18:31:12.884589 1529577 dynamic_serving_content.go:218] key failed with : missing content for serving cert "serving-cert::/var/snap/microk8s/6102/certs/server.crt::/var/snap/microk8s/6102/certs/server.key"
Nov 09 18:31:12 nuc microk8s.daemon-kubelite[1529577]: E1109 18:31:12.891749 1529577 dynamic_serving_content.go:218] key failed with : missing content for serving cert "aggregator-proxy-cert::/var/snap/microk8s/6102/certs/front-proxy-client.crt::/var/snap/microk8s/6102/certs/front-proxy-client.key"
Nov 09 18:31:13 nuc microk8s.daemon-kubelite[1529577]: I1109 18:31:13.255597 1529577 daemon.go:59] Stopping Kubelet
Nov 09 18:31:13 nuc microk8s.daemon-kubelite[1529577]: Stopping kubelite
Nov 09 18:31:13 nuc microk8s.daemon-kubelite[1529577]: I1109 18:31:13.255629 1529577 server.go:224] "Requested to terminate, exiting"
Nov 09 18:31:13 nuc systemd[1]: Stopping Service for snap application microk8s.daemon-kubelite...
Nov 09 18:31:13 nuc systemd[1]: snap.microk8s.daemon-kubelite.service: Deactivated successfully.
Nov 09 18:31:13 nuc systemd[1]: Stopped Service for snap application microk8s.daemon-kubelite.
Nov 09 18:31:13 nuc systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 7.217s CPU time.

Nov 09 18:31:16 nuc systemd[1]: Started Service for snap application microk8s.daemon-kubelite.
Nov 09 18:31:16 nuc microk8s.daemon-kubelite[1535374]: + export PATH=/snap/microk8s/6102/usr/sbin:/snap/microk8s/6102/usr/bin:/snap/microk8s/6102/sbin:/snap/microk8s/6102/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

If I cat those cert files, they do have contents.

I'm not sure whether this is the reason why kubelite keeps crashing and the Pods get Status Unknown.

What Should Happen Instead?

kubelite shouldn't crash.

Reproduction Steps

No precise way to reproduce.

Introspection Report

inspection-report-20231109_181304.tar.gz

Can you suggest a fix?

Are you interested in contributing with a fix?

yes.

@neoaggelos
Contributor

Hi @movsb

I see the following line constantly appearing in the logs of the apiserver-kicker, which then restarts the Kubernetes services to pick up the new certificates (the error you are seeing occurs while the certificates are being refreshed).

This typically happens when the networking configuration of the host is fluctuating. If you cannot think of a cause, can you check whether the following fixes your issue?

sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue

This would prevent MicroK8s from constantly refreshing the certificates, and therefore from constantly restarting the K8s services.
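For anyone scripting this step, here is a minimal sketch that creates the lock file only when it is missing and reports what it did. The function name `ensure_no_cert_reissue` is hypothetical; only the lock file path comes from the command above, and the directory can be overridden through the first argument.

```shell
# Sketch: create the no-cert-reissue lock file only if it is missing.
# The default directory is the one from the command above; pass another
# directory as the first argument to try the function out safely.
ensure_no_cert_reissue() {
    lock_dir="${1:-/var/snap/microk8s/current/var/lock}"
    lock_file="$lock_dir/no-cert-reissue"

    if [ -e "$lock_file" ]; then
        echo "already present: $lock_file"
    elif touch "$lock_file" 2>/dev/null; then
        echo "created: $lock_file"
    else
        # Most likely a permissions problem: the snap directory is root-owned.
        echo "cannot create $lock_file (try running as root)" >&2
        return 1
    fi
}
```

Running `ensure_no_cert_reissue` as root with no argument should have the same effect as the `sudo touch` command above.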

neoaggelos added the kind/support (Question with a workaround) label on Nov 13, 2023
@movsb
Author

movsb commented Nov 15, 2023

@neoaggelos Many, many thanks! After two days of observation, microk8s has not crashed again.

When you say that the networking configuration of my host is fluctuating, what exactly do you mean? Something like an IP address changing, or an interface going up and down?

And, what's the impact of no-cert-reissue?

@neoaggelos
Contributor

Hi @movsb

Indeed, that could be an IP address that changes on the host. Hard to tell without more information.

MicroK8s watches for these changes to refresh the kube-apiserver certificates so that they include all IP addresses from the host. no-cert-reissue is a lock file that prevents MicroK8s from doing that.
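To check whether the host's addresses really are changing, a rough diagnostic sketch is to snapshot the address list and diff two snapshots. This is not how MicroK8s itself detects changes; the function names are hypothetical, and `hostname -I` is assumed to be available (it is Linux-specific).

```shell
# Print the host's current IP addresses, one per line, sorted.
# Assumes a Linux `hostname` that supports -I.
list_host_ips() {
    hostname -I | tr -s ' ' '\n' | sed '/^$/d' | LC_ALL=C sort
}

# Compare two sorted snapshot files (old, new) and print the addresses
# that appeared ("+ addr") or disappeared ("- addr") between them.
diff_ip_snapshots() {
    LC_ALL=C comm -13 "$1" "$2" | sed 's/^/+ /'
    LC_ALL=C comm -23 "$1" "$2" | sed 's/^/- /'
}
```

Writing `list_host_ips` into a file every minute or so and passing the previous and current files to `diff_ip_snapshots` should show which interface addresses come and go, for example those of short-lived container networks.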

@movsb
Author

movsb commented Nov 15, 2023

Hi @neoaggelos ,

I found that almost all of my interfaces' IP addresses (even those from Docker containers) are listed under [ alt_names ] in the file certs/csr.conf. So if any one of them changes, produce_certs() will re-generate the certs and then kick-restart kube-apiserver, right?

Since this is a single-node cluster, should I just set the --bind-address option for kube-apiserver to stop the kicker from re-generating certs?
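To see which addresses are currently baked into the certificate request, a small sketch can print the IP entries of the [ alt_names ] section. It assumes csr.conf follows the usual OpenSSL config layout with `IP.N = address` lines; the function name `alt_name_ips` is hypothetical, and the full path in the usage note is inferred from the cert paths in the logs above.

```shell
# Print the IP values listed under the [ alt_names ] section of an
# OpenSSL-style config file such as certs/csr.conf.
alt_name_ips() {
    awk '
        # Any section header switches the flag; it is on only for alt_names.
        /^\[/ { in_section = ($0 ~ /^\[ *alt_names *\]/); next }
        in_section && /^IP\.[0-9]+ *=/ {
            sub(/^[^=]*= */, "")      # drop the "IP.N = " prefix
            sub(/[ \t\r]+$/, "")      # drop trailing whitespace
            print
        }
    ' "$1"
}
```

For example, `alt_name_ips /var/snap/microk8s/current/certs/csr.conf` should list one address per line, which can then be compared against the output of `hostname -I`.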

@neoaggelos
Contributor

Hi @movsb, sorry for missing this.

Since this is a single-node cluster, should I just set the --bind-address option for kube-apiserver to stop the kicker from re-generating certs?

This might become problematic if the node changes its IP address (e.g. DHCP gives out a different one after rebooting). The easiest approach would be to create the no-cert-reissue lock file:

sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue

@movsb
Author

movsb commented Nov 17, 2023

@neoaggelos It's OK, no worries.

DHCP gives out a different one after rebooting

That will be OK, since this is a crucial node whose MAC address I've already bound to a static IP address in my router's DHCP settings.

I just wanted to let you know that I'm going to close this issue as solved. BTW, I didn't find this solution at https://microk8s.io/docs/troubleshooting. 😊

movsb closed this as completed on Nov 17, 2023
@chirag-launchnodes

chirag-launchnodes commented Oct 22, 2024

Unexpected MicroK8s Restart Despite Update and Certificate Lock Controls

We are using a static IP for this single-node MicroK8s cluster.

We recently experienced an unexpected MicroK8s restart that occurred without any clear trigger. We have observed two known patterns where MicroK8s restarts:

When Snap updates MicroK8s:
To prevent this, we have manually disabled automatic updates using the following command:

snap refresh --hold microk8s

When Snap attempts to renew MicroK8s certificates:
We have addressed this by preventing certificate renewals using the lock file described above, with the following command:

touch /var/snap/microk8s/current/var/lock/no-cert-reissue

However, despite these measures, we still experienced a restart today. According to the logs, Snap attempted to renew the certificates and the lock file prevented this, yet the restart still occurred. Below are the relevant syslog entries from the time of the incident:

snapd[1595379]: storehelpers.go:923: cannot refresh: snap has no updates available: "certbot", "lxd", "microk8s", "snapd"
solo-staking-dev01 systemd[1]: Reloading.
systemd[1]: Configuration file /run/systemd/system/netplan-ovs-cleanup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
systemd[1]: Configuration file /etc/systemd/system/snap.microk8s.daemon-kubelite.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
systemd[1]: Starting Daily apt download activities...
systemd[1]: Mounting Mount unit for core20, revision 2434...
kernel: [7741074.837352] loop1: detected capacity change from 0 to 130448
systemd[1]: Mounted Mount unit for core20, revision 2434.
Configuration file /etc/systemd/system/snap.microk8s.daemon-kubelite.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
microk8s.daemon-apiserver-kicker[966191]: /snap/microk8s/7231/actions/common/utils.sh: line 582: /snap/microk8s/7231/bin/sed: No such file or directory
microk8s.daemon-apiserver-kicker[966190]: /snap/microk8s/7231/actions/common/utils.sh: line 582: /snap/microk8s/7231/bin/hostname: No such file or directory
microk8s.daemon-apiserver-kicker[966194]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/bin/grep: No such file or directory
microk8s.daemon-apiserver-kicker[966196]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/cut: No such file or directory
microk8s.daemon-apiserver-kicker[966197]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/head: No such file or directory
microk8s.daemon-apiserver-kicker[966195]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/gawk: No such file or directory
microk8s.daemon-apiserver-kicker[966200]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/bin/grep: No such file or directory
microk8s.daemon-apiserver-kicker[966201]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/gawk: No such file or directory
microk8s.daemon-apiserver-kicker[966202]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/cut: No such file or directory
microk8s.daemon-apiserver-kicker[966203]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/head: No such file or directory
microk8s.daemon-apiserver-kicker[966206]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/bin/grep: No such file or directory

microk8s.daemon-apiserver-kicker[966207]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/gawk: No such file or directory
microk8s.daemon-apiserver-kicker[966208]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/cut: No such file or directory
microk8s.daemon-apiserver-kicker[966209]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/head: No such file or directory
microk8s.daemon-apiserver-kicker[966212]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/bin/grep: No such file or directory
microk8s.daemon-apiserver-kicker[966213]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/gawk: No such file or directory
microk8s.daemon-apiserver-kicker[966214]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/cut: No such file or directory
microk8s.daemon-apiserver-kicker[966215]: /snap/microk8s/7231/actions/common/utils.sh: line 586: /snap/microk8s/7231/usr/bin/head: No such file or directory
microk8s.daemon-apiserver-kicker[966216]: /snap/microk8s/7231/actions/common/utils.sh: line 800: /snap/microk8s/7231/bin/cp: No such file or directory
microk8s.daemon-apiserver-kicker[966217]: /snap/microk8s/7231/actions/common/utils.sh: line 815: /snap/microk8s/7231/bin/sed: No such file or directory
microk8s.daemon-apiserver-kicker[966218]: /snap/microk8s/7231/actions/common/utils.sh: line 788: /snap/microk8s/7231/bin/cp: No such file or directory
microk8s.daemon-apiserver-kicker[966220]: /snap/microk8s/7231/actions/common/utils.sh: line 603: /snap/microk8s/7231/usr/bin/openssl: No such file or directory
microk8s.daemon-apiserver-kicker[966221]: /snap/microk8s/7231/actions/common/utils.sh: line 604: /snap/microk8s/7231/usr/bin/openssl: No such file or directory
microk8s.daemon-apiserver-kicker[966224]: /snap/microk8s/7231/actions/common/utils.sh: line 608: /snap/microk8s/7231/usr/bin/openssl: No such file or directory
microk8s.daemon-apiserver-kicker[966225]: /snap/microk8s/7231/actions/common/utils.sh: line 608: /snap/microk8s/7231/bin/sed: No such file or directory
microk8s.daemon-apiserver-kicker[966226]: /snap/microk8s/7231/actions/common/utils.sh: line 609: /snap/microk8s/7231/usr/bin/openssl: No such file or directory
microk8s.daemon-apiserver-kicker[1597100]: cert change detected. Restarting the cluster-agent
systemd[1]: Stopping Service for snap application microk8s.daemon-cluster-agent...
systemd[1]: snap.microk8s.daemon-cluster-agent.service: Killing process 1597325 (cluster-agent) with signal SIGKILL.
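The kicker log above is long and repetitive, so a quick way to read it is to count the "No such file or directory" errors per missing path. The sketch below does that for log text on stdin; the function name `missing_binaries` is hypothetical, and the line layout is assumed to match the entries shown above.

```shell
# Read log lines on stdin and print "count path" for every path reported
# as "No such file or directory", most frequent first.
missing_binaries() {
    awk '
        /: No such file or directory/ {
            line = $0
            sub(/: No such file or directory.*/, "", line)  # drop the error suffix
            sub(/.*: /, "", line)                            # keep the text after the last ": "
            count[line]++
        }
        END { for (p in count) print count[p], p }
    ' | LC_ALL=C sort -k1,1nr -k2
}
```

Something like `journalctl -u snap.microk8s.daemon-apiserver-kicker.service | missing_binaries` could feed it, where the unit name is inferred from the log prefix and the kubelite unit name earlier in the thread. In the excerpt above, every missing path sits under /snap/microk8s/7231.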
