CrashLoopBackOff Static Pods in containerd://2.0.2 version #11947

Open
jeikeibnaa opened this issue Feb 6, 2025 · 3 comments · May be fixed by #11963
jeikeibnaa commented Feb 6, 2025

What happened?

After installing a kubeadm cluster with the latest version of Kubespray, all static pods go into CrashLoopBackOff, including kube-apiserver, kube-scheduler, and kube-controller-manager.

I also found that the latest pull request adding containerd 2.0.x support does not generate a config.toml that is compatible with containerd v2.0.x and its version 3 config schema as documented by containerd (see the generated config below).

Cluster Kube API Connection Issue

kubectl get pods -o wide -A

  The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?

Containerd Running Containers (crictl ps)

[root@uat-cluster-master-1 containerd]# crictl ps
CONTAINER           IMAGE               CREATED              STATE     NAME                      ATTEMPT   POD ID              POD
7aa346fe34353       c42f13656d0b2       22 seconds ago       Running   kube-apiserver            7         e9f8f274c8b34       kube-apiserver-uat-cluster-master-1
d00c26a309e27       a0bf559e280cf       About a minute ago   Running   kube-proxy                3         3814bd84f4a7b       kube-proxy-zss25
c363cf59bf7ab       c7aad43836fa5       About a minute ago   Running   kube-controller-manager   8         714bec976ca96       kube-controller-manager-uat-cluster-master-1
6135705ace823       259c8277fcbbc       About a minute ago   Running   kube-scheduler            4         c788bbc0ddec1       kube-scheduler-uat-cluster-master-1

Pods Status (kubectl get pods -A)

[root@uat-cluster-master-1 containerd]# k get pods -A
NAMESPACE            NAME                                           READY   STATUS             RESTARTS       AGE
kube-system          coredns-776bb9db5d-zr2bn                       0/1     Pending            0              8s
kube-system          dns-autoscaler-6ffb84bd6-pqfmm                 0/1     Pending            0              8s
kube-system          kube-apiserver-uat-cluster-master-1            1/1     Running            7 (35s ago)    5m46s
kube-system          kube-controller-manager-uat-cluster-master-1   1/1     Running            8 (105s ago)   3m33s
kube-system          kube-proxy-qh4pj                               0/1     CrashLoopBackOff   4 (13s ago)    3m40s
kube-system          kube-proxy-r88b7                               1/1     Running            5 (40s ago)    3m40s
kube-system          kube-proxy-zss25                               0/1     Error              3 (106s ago)   3m40s
kube-system          kube-scheduler-uat-cluster-master-1            1/1     Running            4 (111s ago)   3m55s
kube-system          metrics-server-8cfd759db-dx28f                 0/1     Pending            0              8s
kube-system          nginx-proxy-uat-cluster-worker-1               1/1     Running            3 (94s ago)    3m44s
kube-system          nginx-proxy-uat-cluster-worker-2               1/1     Running            3 (94s ago)    3m40s
local-path-storage   local-path-provisioner-7f89f58cc8-nsqwb        0/1     Pending            0              8s

/etc/containerd/config.toml

version = 3
root = "/var/lib/containerd"
state = "/run/containerd"
oom_score = 0

[grpc]
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = ""
  level = "info"
  format = ""
  uid = 0
  gid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.k8s.io/pause:3.9"
    max_container_log_line_size = 16384
    enable_unprivileged_ports = false
    enable_unprivileged_icmp = false
    enable_selinux = false
    disable_apparmor = false
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    image_pull_progress_timeout = "5m"
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      snapshotter = "overlayfs"
      discard_unpacked_layers = true
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          base_runtime_spec = "/etc/containerd/cri-base.json"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            systemdCgroup = true
            binaryName = "/usr/local/bin/runc"
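For comparison, here is a rough sketch of how the same settings map onto containerd 2.0's config version 3, where the single io.containerd.grpc.v1.cri section is split into io.containerd.cri.v1.runtime and io.containerd.cri.v1.images. The section names follow the containerd 2.0 documentation, but the values are simply carried over from the generated config above, so treat this as an illustration of the schema rather than the exact config the fix should produce. Note also that the runc option is documented as SystemdCgroup (capital S), not systemdCgroup.

version = 3
root = "/var/lib/containerd"
state = "/run/containerd"

[plugins."io.containerd.cri.v1.images"]
  snapshotter = "overlayfs"
  image_pull_progress_timeout = "5m"
  [plugins."io.containerd.cri.v1.images".pinned_images]
    sandbox = "registry.k8s.io/pause:3.9"

[plugins."io.containerd.cri.v1.runtime"]
  enable_selinux = false
  [plugins."io.containerd.cri.v1.runtime".containerd]
    default_runtime_name = "runc"
    [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      base_runtime_spec = "/etc/containerd/cri-base.json"
      [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc.options]
        SystemdCgroup = true
        BinaryName = "/usr/local/bin/runc"

If containerd 2.0 ignores the unrecognized runtime settings under io.containerd.grpc.v1.cri and falls back to its CRI defaults, that could explain the behaviour below even though the daemon itself starts, but this is a guess rather than something confirmed in the report.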

Journal Logs (journalctl -xeu containerd)

(lines truncated at the left margin by the paste; key recoverable fragments)

..." must be in running or unknown state, current state "CONTAINER_EXITED""   (repeated)
...edce465093952978269baa3295389d81" id:"3bccefc2d5591cf9c26e3b0bb3d39385edce465093952978269baa3295389d81" pid:353301 exit_status:137 exited_at:{seconds:1738822725 nanos:810805678}"
...address="unix:///run/containerd/s/01da5035e626b0ee5c318bc594164600a3464beadb44c5cb3ebb58bf6fdb445e" namespace=k8s.io protocol=ttrpc version=3
...263c0a87a4d17039d32f491ef9e4936,Namespace:kube-system,Attempt:3,} returns sandbox id "f09a48eef1f296442f414fee02330c9e1c22db00977301e11571e8404572979b"
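
A couple of checks that could confirm what containerd actually loaded (suggested diagnostics, not part of the original report):

# Print the merged config containerd is really running with and look at the cgroup setting.
containerd config dump | grep -i -B 2 -A 2 systemdcgroup

# Ask the CRI endpoint what runtime config is in effect.
crictl info | grep -i -e cgroup -e systemd

exit_status:137 in the journal is 128 + 9, i.e. the containers are being SIGKILLed rather than exiting on their own; one common cause of that pattern is a cgroup-driver mismatch between the kubelet and the container runtime.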

Containerd Version

[root@uat-cluster-master-1 containerd]# k get nodes -o wide
NAME                   STATUS     ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                      KERNEL-VERSION                 CONTAINER-RUNTIME
uat-cluster-master-1   NotReady   control-plane   13m   v1.30.0   192.168.213.200   <none>        Rocky Linux 9.4 (Blue Onyx)   5.14.0-427.16.1.el9_4.x86_64   containerd://2.0.2
uat-cluster-worker-1   NotReady   <none>          10m   v1.30.0   192.168.213.201   <none>        Rocky Linux 9.4 (Blue Onyx)   5.14.0-427.16.1.el9_4.x86_64   containerd://2.0.2
uat-cluster-worker-2   NotReady   <none>          10m   v1.30.0   192.168.213.202   <none>        Rocky Linux 9.4 (Blue Onyx)   5.14.0-427.16.1.el9_4.x86_64   containerd://2.0.2

What did you expect to happen?

Kubernetes static pods should run without any issue

How can we reproduce it (as minimally and precisely as possible)?

Provision a Kubernetes cluster using Kubespray with the container runtime set to containerd 2.0.2. The error described above occurs immediately after deployment (a quick way to compare the config schemas without a full cluster is sketched below).
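
As a shortcut, the expected version 3 layout can be inspected on any node that already has containerd 2.0.2 installed and compared against what Kubespray generated (suggested check, not something from the original report):

# Print containerd 2.0's built-in default config (version = 3 schema) and diff it against the generated file.
containerd config default > /tmp/containerd-default.toml
diff /tmp/containerd-default.toml /etc/containerd/config.toml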

OS

Rocky Linux 9.4 (Blue Onyx) 5.14.0-427.16.1.el9_4.x86_64

Version of Ansible

2.17.0

Version of Python

python 3.12.3

Version of Kubespray (commit)

fe0a1f4

Network plugin used

cni

Full inventory with variables

[all]
uat-cluster-master-1 ansible_host=192.168.213.200 ip=192.168.213.200 ansible_user=ansible
uat-cluster-worker-1 ansible_host=192.168.213.201 ip=192.168.213.201 ansible_user=ansible
uat-cluster-worker-2 ansible_host=192.168.213.202 ip=192.168.213.202 ansible_user=ansible

[kube_control_plane]
uat-cluster-master-1 etcd_member_name=etcd1

[etcd:children]
kube_control_plane

[kube_node]
uat-cluster-worker-1
uat-cluster-worker-2

[all:vars]
ansible_become=yes
ansible_become_method=sudo
ansible_become_user=root

Command used to invoke ansible

Output of ansible run

PLAY RECAP *********************************************************************
uat-cluster-master-1       : ok=567  changed=111  unreachable=0    failed=0    skipped=950  rescued=0    ignored=3   
uat-cluster-worker-1       : ok=381  changed=61   unreachable=0    failed=0    skipped=586  rescued=0    ignored=1   
uat-cluster-worker-2       : ok=381  changed=61   unreachable=0    failed=0    skipped=583  rescued=0    ignored=1   
TASKS RECAP ********************************************************************
Thursday 06 February 2025  06:11:06 +0000 (0:00:00.128)       0:11:23.722 ***** 
=============================================================================== 
kubernetes/kubeadm : Create kubeadm token for joining nodes with 24h expiration (default) -- 49.00s
kubernetes/kubeadm : Join to cluster if needed ------------------------- 46.50s
kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default) -- 32.59s
kubernetes/control-plane : Kubeadm | Initialize first control plane node -- 13.37s
download : Download_container | Download image if required -------------- 9.70s
kubernetes-apps/metrics_server : Metrics Server | Create manifests ------ 8.80s
etcd : Restart etcd ----------------------------------------------------- 7.99s
download : Download_container | Download image if required -------------- 7.85s
kubernetes-apps/external_provisioner/local_path_provisioner : Local Path Provisioner | Create manifests --- 7.42s

Anything else we need to know

No response

@jeikeibnaa jeikeibnaa added the kind/bug Categorizes issue or PR as related to a bug. label Feb 6, 2025
@jeikeibnaa
Author

/assign @Yourself

@k8s-ci-robot
Contributor

@jeikeibnaa: GitHub didn't allow me to assign the following users: yourself.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @Yourself

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jeikeibnaa
Author

/assign

@0ekk linked a pull request Feb 10, 2025 that will close this issue