
Kubeadm init fails with "Error writing Crisocket information for the control-plane node: timed out waiting for the condition" #1587

Closed
Ankit-rana opened this issue May 31, 2019 · 24 comments
Labels
priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@Ankit-rana

Ankit-rana commented May 31, 2019

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): sudo kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-29T12:00:00Z", GoVersion:"go1.11.10", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):~> kubectl version
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.6", GitCommit:"abdda3f9fefa29172298a2e42f5102e777a8ec25", GitTreeState:"clean", BuildDate:"2019-05-08T13:53:53Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.6", GitCommit:"abdda3f9fefa29172298a2e42f5102e777a8ec25", GitTreeState:"clean", BuildDate:"2019-05-08T13:46:28Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
    cat /etc/os-release
    NAME="SLES"
    VERSION="15"
    VERSION_ID="15"
    PRETTY_NAME="SUSE Linux Enterprise Server 15"
    ID="sles"
    ID_LIKE="suse"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:suse:sles:15"
  • Kernel (e.g. uname -a): Linux master-2 4.12.14-150.17-default #1 SMP Thu May 2 15:15:46 UTC 2019 (bf13fb8) x86_64 x86_64 x86_64 GNU/Linux

What happened?

kubeadm init failed with the error "Error writing Crisocket information for the control-plane node: timed out waiting for the condition"

What you expected to happen?

The "sudo kubeadm init --pod-network-cidr 10.248.0.0/16" command should have set up all the master components successfully.

How to reproduce it (as minimally and precisely as possible)?

crayadm@master-2:~> sudo kubeadm init --pod-network-cidr 10.248.0.0/16
I0531 18:16:32.726064 5686 version.go:237] remote version is much newer: v1.14.2; falling back to: stable-1.13
[init] Using Kubernetes version: v1.13.6
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master-2 localhost] and IPs [10.248.0.210 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master-2 localhost] and IPs [10.248.0.210 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master-2 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.248.0.210]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 14.003255 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master-2" as an annotation
[kubelet-check] Initial timeout of 40s passed.
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition.
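
One way to see which API call is actually stuck (a sketch; --v is a standard kubeadm verbosity flag, and a reset is needed before re-running init):

sudo kubeadm reset
sudo kubeadm init --pod-network-cidr 10.248.0.0/16 --v=5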

Anything else we need to know?

~> cat /etc/crictl.yaml
runtime-endpoint: unix:///var/run/dockershim.sock
image-endpoint: unix:///var/run/dockershim.sock
timeout: 10
debug: true
~> crictl pods
FATA[0010] failed to connect: failed to connect: context deadline exceeded
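
A couple of quick checks that can narrow this down (a sketch; the socket path is the one from crictl.yaml above, and the dockershim socket only exists while the kubelet is running):

ls -l /var/run/dockershim.sock                                        # does the socket exist at all?
sudo crictl --runtime-endpoint unix:///var/run/dockershim.sock info   # can the CRI endpoint be reached directly?
sudo journalctl -u kubelet --no-pager | tail -n 50                    # recent kubelet errors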
~> systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-05-25 14:29:49 UTC; 6 days ago
Docs: http://docs.docker.com
Main PID: 19508 (dockerd)
Tasks: 120
Memory: 103.3M
CPU: 48min 59.396s
CGroup: /system.slice/docker.service
├─ 5878 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/50ba657c71cfeccf8ffd3544334ab2fa9f0576>
├─ 5880 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/830a32f38fec61cd909a31543bc65d2b9b6abe>
├─ 5883 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/95bd179c9c1c90c0f3f97c8e8ee22203b75ecc>
├─ 5897 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3ccce84564cc38018df299ca780f0929b63fa0>
├─ 5934 /pause
├─ 5941 /pause
├─ 5946 /pause
├─ 5968 /pause
├─ 6045 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9d66fe3c91d7154e3115620d2c6bc4334548a8>
├─ 6059 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/41fca5033552e43d50a53433228a5f7c3e413c>
├─ 6060 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/a39bffdf8469140338eaabc2d2b490aeaf013b>
├─ 6061 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/2de48a98c6d58d3e7b9322939bd542a9e6a273>
├─ 6094 kube-apiserver --authorization-mode=Node,RBAC --advertise-address=10.248.0.210 --allow-privileged=true --client-ca-file=/etc/kubernetes/pki/ca.crt --enable->
├─ 6111 etcd --advertise-client-urls=https://10.248.0.210:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --in>
├─ 6130 kube-controller-manager --address=127.0.0.1 --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-k>
├─ 6140 kube-scheduler --address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true
├─19508 /usr/bin/dockerd --add-runtime oci=/usr/sbin/docker-runc
└─19515 docker-containerd --config /var/run/docker/containerd/containerd.toml

@neolit123 added the priority/awaiting-more-evidence and sig/node labels on Jun 1, 2019
@neolit123
Member

do you see anything suspicious in the kubelet logs?
this might be an issue that has to be moved to k/k instead of k/kubeadm.

@Ankit-rana
Author

Ankit-rana commented Jun 2, 2019

~> sudo systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2019-06-02 06:30:57 UTC; 8min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 14593 (hyperkube)
Tasks: 16 (limit: 131072)
Memory: 110.2M
CPU: 14.669s
CGroup: /system.slice/kubelet.service
└─14593 /usr/bin/hyperkube kubelet --logtostderr=true --v=2 --hostname-override=127.0.0.1 --allow-privileged=false --config=/etc/kubernetes/kubelet-config.yaml --volume-plugin-dir=/usr/lib/kubernetes/kubelet-plugins

Jun 02 06:38:50 master-2 hyperkube[14593]: I0602 06:38:50.049825 14593 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node 127.0.0.1
Jun 02 06:38:50 master-2 hyperkube[14593]: I0602 06:38:50.049841 14593 kubelet_node_status.go:446] Recording NodeHasSufficientPID event message for node 127.0.0.1
Jun 02 06:38:56 master-2 hyperkube[14593]: I0602 06:38:56.046519 14593 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Jun 02 06:38:56 master-2 hyperkube[14593]: I0602 06:38:56.049899 14593 kubelet_node_status.go:446] Recording NodeHasSufficientMemory event message for node 127.0.0.1
Jun 02 06:38:56 master-2 hyperkube[14593]: I0602 06:38:56.049936 14593 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node 127.0.0.1
Jun 02 06:38:56 master-2 hyperkube[14593]: I0602 06:38:56.049951 14593 kubelet_node_status.go:446] Recording NodeHasSufficientPID event message for node 127.0.0.1
Jun 02 06:38:57 master-2 hyperkube[14593]: I0602 06:38:57.046597 14593 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Jun 02 06:38:57 master-2 hyperkube[14593]: I0602 06:38:57.051870 14593 kubelet_node_status.go:446] Recording NodeHasSufficientMemory event message for node 127.0.0.1
Jun 02 06:38:57 master-2 hyperkube[14593]: I0602 06:38:57.052543 14593 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node 127.0.0.1
Jun 02 06:38:57 master-2 hyperkube[14593]: I0602 06:38:57.053043 14593 kubelet_node_status.go:446] Recording NodeHasSufficientPID event message for node 127.0.0.1

Kubelet logs: link

@yelongyu

Got the same issue:

I0618 20:52:58.262122   31415 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.4 (linux/amd64) kubernetes/c27b913" 'https://k8s-master:60443/api/v1/nodes/10-10-40-93'
I0618 20:52:58.265602   31415 round_trippers.go:438] GET https://k8s-master:60443/api/v1/nodes/10-10-40-93 404 Not Found in 3 milliseconds
I0618 20:52:58.265620   31415 round_trippers.go:444] Response Headers:
I0618 20:52:58.265626   31415 round_trippers.go:447]     Content-Type: application/json
I0618 20:52:58.265632   31415 round_trippers.go:447]     Content-Length: 192
I0618 20:52:58.265636   31415 round_trippers.go:447]     Date: Tue, 18 Jun 2019 12:52:58 GMT
I0618 20:52:58.265684   31415 request.go:942] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"10-10-40-93\" not found","reason":"NotFound","details":{"name":"10-10-40-93","kind":"nodes"},"code":404}
I0618 20:52:58.265837   31415 round_trippers.go:419] curl -k -v -XGET  -H "User-Agent: kubeadm/v1.13.4 (linux/amd64) kubernetes/c27b913" -H "Accept: application/json, */*" 'https://k8s-master:60443/api/v1/nodes/10-10-40-93'
I0618 20:52:58.268370   31415 round_trippers.go:438] GET https://k8s-master:60443/api/v1/nodes/10-10-40-93 404 Not Found in 2 milliseconds
I0618 20:52:58.268389   31415 round_trippers.go:444] Response Headers:
I0618 20:52:58.268396   31415 round_trippers.go:447]     Content-Type: application/json
I0618 20:52:58.268402   31415 round_trippers.go:447]     Content-Length: 192
I0618 20:52:58.268409   31415 round_trippers.go:447]     Date: Tue, 18 Jun 2019 12:52:58 GMT
I0618 20:52:58.268430   31415 request.go:942] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"10-10-40-93\" not found","reason":"NotFound","details":{"name":"10-10-40-93","kind":"nodes"},"code":404}
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
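
The 404 here means a Node object named 10-10-40-93 never showed up in the API, which usually points at the kubelet registering under a different name (or not registering at all). A quick way to compare, assuming kubectl access on the control plane:

kubectl get nodes -o wide                                              # names the kubelets actually registered with
hostname                                                               # name kubeadm looks up unless --node-name / --hostname-override is set
sudo journalctl -u kubelet --no-pager | grep -i register | tail -n 20  # registration attempts and errors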

@neolit123
Member

there are multiple aspects at play here, but i don't know the exact cause.

cat /etc/crictl.yaml
runtime-endpoint: unix:///var/run/dockershim.sock

why not use /run/containerd/containerd.sock?
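
For reference, a crictl.yaml pointed at containerd would look roughly like this (a sketch; the socket path assumes a default containerd installation):

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false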

/usr/bin/hyperkube

hyperkube is not something we have e2e test for, so i wouldn't say the kubeadm team supports it.

please try newer versions of k8s and re-open this ticket if the problem persists or if you have found the cause. our e2e test signal for 1.13 is green using containerd and docker.

@xsaardo

xsaardo commented Nov 12, 2019

Kubernetes version

Client Version: v1.15.4
Server Version: v1.15.4

Kubeadm version

kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:48:18Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Kubectl get pods:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7697bc9b99-j9zd8   1/1     Running   0          43m
kube-system   calico-node-jwrg8                          1/1     Running   0          41m
kube-system   calico-node-r52d8                          1/1     Running   0          43m
kube-system   calico-node-zpzxb                          1/1     Running   0          42m
kube-system   coredns-5c98db65d4-764bm                   1/1     Running   0          48m
kube-system   coredns-5c98db65d4-z8h78                   1/1     Running   0          48m
kube-system   etcd-e1n1-g                                1/1     Running   0          47m
kube-system   kube-apiserver-e1n1-g                      1/1     Running   0          47m
kube-system   kube-controller-manager-e1n1-g             1/1     Running   0          47m
kube-system   kube-proxy-f6px5                           1/1     Running   0          41m
kube-system   kube-proxy-njjcp                           1/1     Running   0          42m
kube-system   kube-proxy-nn6df                           1/1     Running   0          48m
kube-system   kube-scheduler-e1n1-g                      1/1     Running   0          46m

kubectl get nodes -o wide

NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION               CONTAINER-RUNTIME
e1n1-g   Ready    master   48m   v1.15.4   192.168.142.101   <none>        OpenShift Enterprise   3.10.0-957.21.3.el7.x86_64   docker://18.9.8
e2n1-g   Ready    <none>   43m   v1.15.4   192.168.142.102   <none>        OpenShift Enterprise   3.10.0-957.21.3.el7.x86_64   docker://18.9.8
e3n1-g   Ready    <none>   42m   v1.15.4   192.168.142.103   <none>        OpenShift Enterprise   3.10.0-957.21.3.el7.x86_64   docker://18.9.8

We have been trying to add a fourth node to the cluster using the kubeadm join command
kubeadm join 192.168.142.101:6443 --token rp0dqg.t7jdtltndurri2hh --discovery-token-ca-cert-hash sha256:b740086b5dfba97b4e416a95816b19383181b5785b9bc0b2480c43c4dfd5b1d7 -v 256
which fails with the following error:

I1112 16:57:54.572465   54028 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.15.4 (linux/amd64) kubernetes/67d2fcf" 'https://192.168.142.101:6443/api/v1/nodes/e4n1-g'
I1112 16:57:54.573492   54028 round_trippers.go:438] GET https://192.168.142.101:6443/api/v1/nodes/e4n1-g 401 Unauthorized in 1 milliseconds
I1112 16:57:54.573520   54028 round_trippers.go:444] Response Headers:
I1112 16:57:54.573533   54028 round_trippers.go:447]     Content-Type: application/json
I1112 16:57:54.573545   54028 round_trippers.go:447]     Content-Length: 129
I1112 16:57:54.573556   54028 round_trippers.go:447]     Date: Tue, 12 Nov 2019 21:56:31 GMT
I1112 16:57:54.573586   54028 request.go:947] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition

Kubelet service

[root@e4n1-g ~]# systemctl  -l status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2019-11-12 17:44:02 EST; 53s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 59061 (kubelet)
    Tasks: 58
   Memory: 52.5M
   CGroup: /system.slice/kubelet.service
           └─59061 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --cgroup-driver=systemd

Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.015008   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.115448   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.188180   59061 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Unauthorized
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.215887   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.316037   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.387478   59061 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Unauthorized
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.416251   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.516633   59061 kubelet.go:2252] node "e4n1-g" not found
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.587736   59061 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:454: Failed to list *v1.Node: Unauthorized
Nov 12 17:44:56 e4n1-g kubelet[59061]: E1112 17:44:56.616816   59061 kubelet.go:2252] node "e4n1-g" not found

We have seen the "node not found" error on the other nodes that were successfully added to the cluster, so we are not sure if that is part of the problem. We tried the suggestions from other related issues but are still unable to add the node to the cluster. Any help is appreciated.

@neolit123
Member

error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition

could this be a case where the bootstrap token used for join is expired?
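
A quick way to check that, assuming kubeadm is available on the control-plane node (both are standard kubeadm subcommands):

kubeadm token list                          # shows existing bootstrap tokens and their TTL
kubeadm token create --print-join-command   # mints a fresh token and prints a complete join command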

@xsaardo

xsaardo commented Nov 12, 2019

@neolit123 we have tried recreating the token, so it should still be valid

@neolit123
Member

try enabling --v=10 for "join" and observe the API call failures.
this might give a better indication of what is going on.

@xsaardo

xsaardo commented Nov 12, 2019

Here's the output of kubeadm join
kubeadm-join.log

@neolit123
Member

the --v=10 logs just confirm that it's retrying.
right after this happens can you dump a kubelet log using journalctl -xeu kubelet too?

what is different about this node?
what other nodes do you have in the cluster?

@vikramkhatri

@neolit123 - Here is the output.

# journalctl -xeu kubelet
Nov 13 09:24:04 e4n1-g kubelet[68502]: E1113 09:24:04.723961   68502 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:454: F
Nov 13 09:24:04 e4n1-g kubelet[68502]: E1113 09:24:04.752982   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:04 e4n1-g kubelet[68502]: E1113 09:24:04.853385   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:04 e4n1-g kubelet[68502]: E1113 09:24:04.923760   68502 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Fail
Nov 13 09:24:04 e4n1-g kubelet[68502]: E1113 09:24:04.953533   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.053956   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.123712   68502 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Fail
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.154214   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.254451   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.324240   68502 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:445: F
Nov 13 09:24:05 e4n1-g kubelet[68502]: E1113 09:24:05.354904   68502 kubelet.go:2252] node "e4n1-g" not found
...
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.725968   68502 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:445: F
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.754334   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.854619   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.874603   68502 controller.go:125] failed to ensure node lease exists, will retry
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.926154   68502 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.
Nov 13 09:26:06 e4n1-g kubelet[68502]: E1113 09:26:06.954875   68502 kubelet.go:2252] node "e4n1-g" not found
Nov 13 09:26:07 e4n1-g kubelet[68502]: E1113 09:26:07.055283   68502 kubelet.go:2252] node "e4n1-g" not found

I appreciate your help. We have searched through all the Google suggestions we could find but are unable to resolve this. This started when we did kubeadm reset to rebuild the cluster. This node was part of the cluster before, but after the last reset something is preventing it from joining. Your insight will be very helpful.

@vikramkhatri

Ok - Finally, this is solved. The issue was this.

When we did kubeadm reset, it appeared to work. It said that it deleted the directories:

[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]

And we assumed that it did what it said. When I checked /var/lib/kubelet, it was still there. I tried deleting it manually, and it failed for the pods directory left over from the previous Kubernetes installation. The reason: it could not delete the mount directory because of the immutable attribute Portworx had previously set on it. I had to run chattr -i on the mount directory, and then I was able to delete the pods directory from /var/lib/kubelet. The kubeadm join worked perfectly after that.
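
Roughly, the manual cleanup described above looks like this (a sketch; the wildcard paths are illustrative and follow the layout under /var/lib/kubelet/pods):

lsattr -d /var/lib/kubelet/pods/*/volumes/*/*/mount   # the 'i' flag marks an immutable directory
chattr -i /var/lib/kubelet/pods/*/volumes/*/*/mount   # clear the immutable bit
rm -rf /var/lib/kubelet/pods                          # now the leftover pods directory can be removed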

It consumed our whole day. When kubeadm reset says that it deleted some directories, make sure that they are in fact gone. I hope this helps someone else who runs into similar issues. kubeadm should throw an error if a directory deletion was not successful.

@neolit123
Member

It consumed our whole day. When kubeadm reset says that it deleted some directories, make sure that they are in fact gone.

what is your kubeadm version?

ultimately kubeadm reset is a best effort command and if something is protected we don't want to touch it. what we can fix here is giving a better indication if a delete failed.

@vikramkhatri

we can fix here is giving a better indication if a delete failed

Thank you, as it will be very helpful to know that something did not go right.

# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:48:18Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

@neolit123
Member

i should have asked for the output from reset too.
did it not show a warning at least?

we did some refactoring for reset in 1.16 and it's not clear to me whether this is already fixed or not.

@vikramkhatri

I was able to scroll up and get the output of kubeadm reset from when this directory was there. It did not show a warning about not being able to delete a file or a directory.

# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1113 09:56:37.221105  130278 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

After the above:

# ls -l /var/lib/kubelet/
total 32
-rw-r--r-- 1 root root 1744 Nov 13 10:01 config.yaml
-rw------- 1 root root   62 Oct  3 15:49 cpu_manager_state
drwxr-xr-x 2 root root 4096 Nov 13 10:01 device-plugins
drwxr-xr-x 2 root root 4096 Oct  3 15:49 pki
drwx------ 2 root root 4096 Oct  3 15:49 plugin-containers
drwxr-x--- 3 root root 4096 Nov 12 13:35 plugins
drwxr-x--- 2 root root 4096 Oct  4 12:51 plugins_registry
drwxr-x--- 4 root root 4096 Nov 12 13:31 pods
[root@e4n1-g ~]# rm -fr /var/lib/kubelet
rm: cannot remove ‘/var/lib/kubelet/pods/f5d42183-0eb2-433c-9feb-0e531dbd28ef/volumes/kubernetes.io~csi/pvc-6a8ccb7a-7262-41cf-9c28-5e2e59755445/mount’: Operation not permitted
rm: cannot remove ‘/var/lib/kubelet/pods/a104a67d-cca8-442d-a19d-ea29c1df4185/volumes/kubernetes.io~csi/pvc-e7897257-cca6-45da-a46b-8eab54cb13dc/mount’: Operation not permitted
rm: cannot remove ‘/var/lib/kubelet/pods/a104a67d-cca8-442d-a19d-ea29c1df4185/volumes/kubernetes.io~csi/pvc-11a62da0-cbf9-4d5c-8978-acbedd2d96a4/mount’: Operation not permitted
[root@e4n1-g ~]# ls -l /var/lib/kubelet/
total 4
drwxr-x--- 4 root root 4096 Nov 12 13:31 pods

@neolit123
Member

i will log an issue with your details and investigate if this is fixed in 1.16
thanks.

@Tcarters

Tcarters commented Mar 2, 2021

Hi, has this issue been resolved?
I have the same error here when launching with the command: "kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --ignore-preflight-errors=Mem --node-name=Master"

[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

@neolit123
Member

for questions please use #kubeadm and the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

@Tcarters

Tcarters commented Mar 2, 2021

for questions please use #kubeadm and the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

I am not asking a question, I have an issue with kubeadm init ... can you check the error and suggest how to resolve it?

@neolit123
Member

neolit123 commented Mar 2, 2021

the links i gave are the place to ask for support too.

you seem to be ignoring the CPU and Memory checks, so probably the machine doesn't have enough memory and the apiserver cannot start properly.
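
For reference, kubeadm's default preflight checks expect at least 2 CPUs and roughly 1700 MB of RAM; a quick way to see what the machine actually has before deciding to ignore those checks:

nproc     # number of CPUs
free -m   # memory in MB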

@stevanbangle

sudo kubeadm reset
worked for me

@GitZhangChi

sudo kubeadm reset worked for me

gooooooooood

@JiayangZhou

This fixed it for me: #1438 (comment)
Drop 20-etcd-service-manager.conf under /etc/systemd/system/kubelet.service.d.
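
Roughly, assuming the drop-in path from the linked comment, that fix looks like:

sudo rm /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet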
