🐛 k3s fails to start on multiple distros #2125

Closed
mauromorales opened this issue Jan 10, 2024 · 5 comments · Fixed by kairos-io/provider-kairos#519
Labels
bug Something isn't working

Comments

@mauromorales
Member

[root@fedora kairos]# cat /etc/os-release 
NAME="Fedora Linux"
VERSION="38 (Container Image)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Container Image)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Container Image"
VARIANT_ID=container
KAIROS_VARIANT="standard"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION="v2.5.0-rc1-v1.29.0-k3s1"
KAIROS_IMAGE_LABEL="38-standard-amd64-generic-v2.5.0-rc1-k3sv1.29.0-k3s1"
KAIROS_TARGETARCH="amd64"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_NAME="kairos-standard-fedora-38"
KAIROS_ID_LIKE="kairos-standard-fedora-38"
KAIROS_PRETTY_NAME="kairos-standard-fedora-38 v2.5.0-rc1-v1.29.0-k3s1"
KAIROS_IMAGE_REPO="quay.io/kairos/fedora:38-standard-amd64-generic-v2.5.0-rc1-k3sv1.29.0-k3s1"
KAIROS_FLAVOR_RELEASE="38"
KAIROS_RELEASE="v2.5.0-rc1"
KAIROS_SOFTWARE_VERSION="v1.29.0+k3s1"
KAIROS_ID="kairos"
KAIROS_VERSION_ID="v2.5.0-rc1-v1.29.0-k3s1"
KAIROS_ARTIFACT="kairos-fedora-38-standard-amd64-generic-v2.5.0-rc1-k3sv1.29.0+k3s1"
KAIROS_FLAVOR="fedora"
KAIROS_MODEL="generic"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
[root@fedora kairos]# systemctl status --failed
× k3s.service
     Loaded: loaded (/etc/systemd/system/k3s.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
             /etc/systemd/system/k3s.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Wed 2024-01-10 11:30:51 UTC; 34s ago
   Duration: 4.753s
    Process: 1376 ExecStart=/usr/bin/k3s server (code=exited, status=1/FAILURE)
   Main PID: 1376 (code=exited, status=1/FAILURE)
        CPU: 4.293s

Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393758    1376 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393766    1376 tlsconfig.go:240] "Starting DynamicServingCertificateController"
Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393787    1376 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::clie>
Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393790    1376 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca->
Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393796    1376 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requ>
Jan 10 11:30:51 fedora k3s[1376]: I0110 11:30:51.393798    1376 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requesthea>
Jan 10 11:30:51 fedora k3s[1376]: E0110 11:30:51.394316    1376 kubelet.go:1542] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubep>
Jan 10 11:30:51 fedora systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Jan 10 11:30:51 fedora systemd[1]: k3s.service: Failed with result 'exit-code'.
Jan 10 11:30:51 fedora systemd[1]: k3s.service: Consumed 4.293s CPU time.

× systemd-userdbd.service - User Database Manager
     Loaded: loaded (/usr/lib/systemd/system/systemd-userdbd.service; indirect; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Wed 2024-01-10 11:30:40 UTC; 44s ago
TriggeredBy: × systemd-userdbd.socket
       Docs: man:systemd-userdbd.service(8)
   Main PID: 962 (code=exited, status=226/NAMESPACE)
        CPU: 459us

Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Main process exited, code=exited, status=226/NAMESPACE
Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Failed with result 'exit-code'.
Jan 10 11:30:40 fedora systemd[1]: Failed to start systemd-userdbd.service - User Database Manager.
Jan 10 11:30:40 fedora systemd[1]: Starting systemd-userdbd.service - User Database Manager...
Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Main process exited, code=exited, status=226/NAMESPACE
Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Failed with result 'exit-code'.
Jan 10 11:30:40 fedora systemd[1]: Failed to start systemd-userdbd.service - User Database Manager.
Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Start request repeated too quickly.
Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.service: Failed with result 'exit-code'.
Jan 10 11:30:40 fedora systemd[1]: Failed to start systemd-userdbd.service - User Database Manager.

× systemd-userdbd.socket - User Database Manager Socket
     Loaded: loaded (/usr/lib/systemd/system/systemd-userdbd.socket; enabled; preset: enabled)
     Active: failed (Result: service-start-limit-hit) since Wed 2024-01-10 11:30:40 UTC; 2min 22s ago
   Duration: 249ms
   Triggers: ● systemd-userdbd.service
       Docs: man:systemd-userdbd.service(8)
     Listen: /run/systemd/userdb/io.systemd.Multiplexer (Stream)

Jan 10 11:30:40 fedora systemd[1]: systemd-userdbd.socket: Failed with result 'service-start-limit-hit'.
mauromorales added the bug label on Jan 10, 2024
mauromorales changed the title from "🐛 k3s fails to start on Fedora" to "🐛 k3s fails to start on multiple distros" on Jan 10, 2024
@mauromorales
Member Author

The issue also happens on Ubuntu.

@mauromorales
Member Author

After a restart, running systemctl start k3s.service seems to work without issues. However, the service is not always enabled by default. Mounting /etc/systemd as persistent seems to cause some odd behavior:

  1. If it was mounted as persistent and we overwrite the service file with an empty file, why isn't the file empty after a reboot? (I can see the unit file created by the installer, not the one created by the provider.)
  2. Why is the empty file created by the provider missing on opensuse?
  3. Enabling the service generates a wants link. Why is that link missing after a reboot on some distros? (Maybe it's not always the case?)
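
To make question 3 easier to check across distros, here is a minimal sketch (Python, purely illustrative, not code from the provider) that looks for the wants link on disk. It assumes the unit is wanted by multi-user.target and that the link lives under /etc/systemd/system; the function names are hypothetical.

```python
from pathlib import Path


def wants_link(unit: str, systemd_dir: Path, target: str = "multi-user.target") -> Path:
    """Return the path where enabling a unit wanted by `target` places its symlink."""
    return systemd_dir / f"{target}.wants" / unit


def is_enabled_on_disk(unit: str, systemd_dir: Path, target: str = "multi-user.target") -> bool:
    """True only if the wants symlink exists and points at an existing file."""
    link = wants_link(unit, systemd_dir, target)
    # is_symlink() is also true for a dangling link; exists() follows the
    # link, so requiring both rejects links whose target is gone.
    return link.is_symlink() and link.exists()
```

Running is_enabled_on_disk("k3s.service", Path("/etc/systemd/system")) before and after a reboot would show whether the link survives the persistent mount.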

@jimmykarily
Contributor

OK, after debugging we found out that:

  • On older versions (before this commit) we created a service in memory but never actually wrote it to disk. That's why on 2.4.3 (opensuse) we see the original k3s service file that was created by the k3s installer script.
  • When svc.Enable() fails, the process doesn't exit with a non-zero exit code. This explains why we didn't see a symlink in the multi-user.target.wants dir (which would be there if the service had been enabled). Enable() failed because the "empty" service file we created was not valid.

What we need to do is stop producing our own service files and just enable the service that was created by the k3s installer scripts (and maybe the override file if needed, we need to check). According to @Itxaka, the original reason for creating an empty service file was that on the qualcomm board, for some reason, the original k3s service files were missing. We don't need to worry about that anymore, so we'll get rid of that code.
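
To illustrate the second point, here is a minimal sketch (Python, illustrative only, not the provider's Go code) of an enable step that surfaces a failure instead of swallowing it. The names enable_unit and EnableError and the injectable run parameter are hypothetical; the only real command used is systemctl enable.

```python
import subprocess
from typing import Callable, Sequence


class EnableError(RuntimeError):
    """Raised when enabling a unit fails (hypothetical name for this sketch)."""


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output so stderr can be included in the error message.
    return subprocess.run(list(cmd), capture_output=True, text=True)


def enable_unit(
    unit: str,
    run: Callable[[Sequence[str]], subprocess.CompletedProcess] = _run,
) -> None:
    """Enable `unit` and raise if systemctl reports a failure."""
    result = run(["systemctl", "enable", unit])
    if result.returncode != 0:
        # Propagate the failure so the caller can exit non-zero.
        raise EnableError(
            f"systemctl enable {unit} failed "
            f"(exit {result.returncode}): {result.stderr.strip()}"
        )
```

A caller that lets EnableError propagate (or maps it to a non-zero exit code) would have made the missing symlink visible at install time rather than after a reboot.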
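
As a rough sketch of "use the installer's unit as-is" (Python, illustrative only, with a hypothetical function name), a pre-check could confirm that the unit file exists and contains something other than blank lines and comments before it is enabled, and never write a replacement. Treating "has non-comment content" as good enough is an assumption of this sketch; it does not validate the unit the way systemd does.

```python
from pathlib import Path


def has_unit_content(path: Path) -> bool:
    """True if `path` is a file with at least one non-blank, non-comment line."""
    if not path.is_file():
        return False
    for line in path.read_text().splitlines():
        stripped = line.strip()
        # systemd treats lines starting with '#' or ';' as comments.
        if stripped and not stripped.startswith(("#", ";")):
            return True
    return False
```

If has_unit_content(Path("/etc/systemd/system/k3s.service")) is false, the safer reaction is to report the missing unit rather than create an empty one.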

@jimmykarily
Contributor

As a note, this is what makes sure the original files from the kairos image make it to the persistent directory when we bind mount it: https://github.com/kairos-io/immucore/blob/e75c66b2d0715fc740180759c733d8d75cd0b69f/internal/utils/mounts.go#L96
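
For readers who don't want to open the link, the general idea of seeding a persistent directory before bind mounting it can be sketched as below (Python, illustrative only). This is not a transcription of the immucore code: the function name is hypothetical, and the policy of leaving files that already exist in the persistent directory untouched is an assumption of this sketch, not something confirmed in this thread.

```python
import shutil
from pathlib import Path
from typing import List


def seed_persistent_dir(image_dir: Path, persistent_dir: Path) -> List[Path]:
    """Copy entries from image_dir that are missing in persistent_dir.

    Assumption: existing files in persistent_dir are never overwritten.
    Returns the list of destination paths that were created.
    """
    copied: List[Path] = []
    for src in sorted(image_dir.rglob("*")):
        dst = persistent_dir / src.relative_to(image_dir)
        if src.is_dir() and not src.is_symlink():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() and not dst.is_symlink():
            dst.parent.mkdir(parents=True, exist_ok=True)
            # follow_symlinks=False copies a symlink as a symlink.
            shutil.copy2(src, dst, follow_symlinks=False)
            copied.append(dst)
    return copied
```

Under that assumed policy, a unit file shipped in the image shows up in the persistent directory on first boot, while a file already written there is kept.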

jimmykarily pushed a commit to kairos-io/provider-kairos that referenced this issue Jan 10, 2024
just use the one that was created by the k3s installer script

Fixes: kairos-io/kairos#2125

Signed-off-by: Dimitris Karakasilis <dimitris@spectrocloud.com>
mauromorales added a commit to kairos-io/provider-kairos that referenced this issue Jan 10, 2024
just use the one that was created by the k3s installer script

Fixes: kairos-io/kairos#2125

Signed-off-by: Dimitris Karakasilis <dimitris@spectrocloud.com>
Co-authored-by: Mauro Morales <mauro.morales@spectrocloud.com>
github-project-automation bot moved this from In Progress 🏃 to Done ✅ in 🧙Issue tracking board on Jan 10, 2024
jimmykarily moved this from Done ✅ to In Progress 🏃 in 🧙Issue tracking board on Jan 10, 2024
jimmykarily reopened this on Jan 10, 2024
github-project-automation bot moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board on Jan 10, 2024
@mauromorales
Member Author

Fedora is working with the fix. I'll continue with the release tests, but I'll close this for now and reopen it if I find the fix doesn't work on another distro.

github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board on Jan 10, 2024