
Cluster does not initialize properly, kindnet times out and crashes. #1461

Closed
salasrod opened this issue Apr 2, 2020 · 23 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@salasrod

salasrod commented Apr 2, 2020

What happened:

CoreDNS fails to start, and kindnet shows timeouts in its logs:

I0402 17:39:58.069948       1 main.go:64] hostIP = 172.17.0.2
podIP = 172.17.0.2
I0402 17:40:33.166911       1 main.go:104] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
I0402 17:41:03.167486       1 main.go:104] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
I0402 17:41:34.167858       1 main.go:104] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
I0402 17:42:06.168271       1 main.go:104] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
I0402 17:42:39.168601       1 main.go:104] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
panic: Reached maximum retries obtaining node list: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout

What you expected to happen:

CoreDNS containers start, and kindnet does not crash.

How to reproduce it (as minimally and precisely as possible):

kind create cluster
kubectl -n kube-system logs kindnet-xxxxx

NAME                                         READY   STATUS             RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
coredns-6955765f44-tvtk5                     0/1     Pending            0          20m   <none>       <none>               <none>           <none>
coredns-6955765f44-vbpxc                     0/1     Pending            0          20m   <none>       <none>               <none>           <none>
etcd-kind-control-plane                      1/1     Running            0          20m   172.17.0.2   kind-control-plane   <none>           <none>
kindnet-wgsbc                                0/1     CrashLoopBackOff   5          20m   172.17.0.2   kind-control-plane   <none>           <none>
kube-apiserver-kind-control-plane            1/1     Running            0          20m   172.17.0.2   kind-control-plane   <none>           <none>
kube-controller-manager-kind-control-plane   1/1     Running            0          20m   172.17.0.2   kind-control-plane   <none>           <none>
kube-proxy-gwjfp                             1/1     Running            0          20m   172.17.0.2   kind-control-plane   <none>           <none>
kube-scheduler-kind-control-plane            1/1     Running            0          20m   172.17.0.2   kind-control-plane   <none>           <none>

Environment:

  • kind version: (use kind version): 0.7.0
  • Kubernetes version: (use kubectl version): 1.17.0
  • Docker version: (use docker info): 19.03.8
  • OS (e.g. from /etc/os-release): Gentoo
@salasrod salasrod added the kind/bug Categorizes issue or PR as related to a bug. label Apr 2, 2020
@BenTheElder
Member

kindnetd is failing here because it can't reach the apiserver. Can you tell me more about your host environment and exactly how you ran kind?

@BenTheElder
Member

Specifically, docker info would be useful, along with a dump of /etc/resolv.conf, which iptables-save you are using, and the iptables version.

I suspect this is the upstream bug kubernetes/kubernetes#71305. The options are:

  • convert your host to use iptables-legacy instead of iptables-nft (see the bug)
  • try using the latest kind from HEAD, where we do more to attempt to mitigate this
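Before choosing between these options, it helps to know which backend the host's iptables uses. A minimal sketch, assuming iptables 1.8 or later, which appends "(legacy)" or "(nf_tables)" to its version string (as in the version outputs elsewhere in this thread); the function name iptables_backend is hypothetical:

```shell
#!/bin/sh
# Classify an `iptables --version` (or `iptables-save --version`) line by
# backend. Assumption: iptables >= 1.8, which reports the backend in
# parentheses; older releases print no backend and are reported as unknown.
iptables_backend() {
  case "$1" in
    *"(legacy)"*)    echo "legacy" ;;
    *"(nf_tables)"*) echo "nf_tables" ;;
    *)               echo "unknown" ;;
  esac
}
```

For example, iptables_backend "$(iptables --version)" on the host, or iptables_backend "$(docker exec kind-control-plane iptables --version)" for the node container. Matching on the version string avoids depending on a particular symlink layout.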

@salasrod
Author

salasrod commented Apr 2, 2020

docker info:

Client:
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc.)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 35bd7a5f69c13e1563af8a93431411cd9ecf5021
 runc version: 
 init version: fec3683b971d9c3ef73f284f176672c44b448662
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.5.14-gentoo
 Operating System: Gentoo/Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.58GiB
 Name: thundercloud
 ID: DYOX:QA2L:J7ZA:WS2R:RRHG:5OCY:ZHGS:VBNH:HCJU:PZQJ:S4FC:HD65
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

/etc/resolv.conf

# Generated by NetworkManager
search lan
nameserver 192.168.1.20
nameserver 2604:5500:c248:6b00:134f:7b72:6bb3:71d2
nameserver 2604:5500:c248:6b00::1

I do have legacy iptables:

eselect iptables list

Available iptables symlink targets:
  [1]   xtables-legacy-multi *

iptables --version

iptables v1.8.4 (legacy)

@salasrod
Author

salasrod commented Apr 2, 2020

I'll try kind from HEAD and report back with the results.

@aojea
Contributor

aojea commented Apr 2, 2020

Could you paste the output of docker exec -it kind-control-plane iptables-save, please?

@salasrod
Author

salasrod commented Apr 2, 2020

docker exec -it kind-control-plane iptables-save
# Generated by iptables-save v1.8.3 on Thu Apr  2 19:39:25 2020
*mangle
:PREROUTING ACCEPT [16594:3953361]
:INPUT ACCEPT [16594:3953361]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [16597:3953545]
:POSTROUTING ACCEPT [16597:3953545]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Thu Apr  2 19:39:25 2020
# Generated by iptables-save v1.8.3 on Thu Apr  2 19:39:25 2020
*filter
:INPUT ACCEPT [16594:3953361]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [16597:3953545]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Thu Apr  2 19:39:25 2020
# Generated by iptables-save v1.8.3 on Thu Apr  2 19:39:25 2020
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
COMMIT
# Completed on Thu Apr  2 19:39:25 2020

@aojea
Contributor

aojea commented Apr 2, 2020

That seems to be the problem: all of the kube-proxy rules for the Services are missing.

What is kube-proxy logging? You can check with kubectl logs -n kube-system kube-proxy-pk8km.
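A quick way to tell whether kube-proxy has programmed any Service rules is to count the rules that jump into its KUBE-SERVICES chain in the node's iptables-save output; the dump above has none. A minimal sketch, assuming kube-proxy runs in its default iptables mode; the function name is hypothetical:

```shell
#!/bin/sh
# Count rules in iptables-save output (read from stdin) that jump into
# kube-proxy's KUBE-SERVICES chain. grep -c prints 0 and exits 1 when
# nothing matches; `|| true` keeps that case from looking like an error.
count_kube_services_jumps() {
  grep -c -e '-j KUBE-SERVICES' || true
}
```

For example, docker exec kind-control-plane iptables-save | count_kube_services_jumps prints 0 for a node in the state shown above (-it is omitted because the output is piped). On a node where kube-proxy has synced successfully, expect a non-zero count.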

@salasrod
Author

salasrod commented Apr 2, 2020

Yep, it definitely seems that iptables is the problem:

W0402 20:12:30.007173       1 proxier.go:608] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0402 20:12:30.009287       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0402 20:12:30.012737       1 node.go:135] Successfully retrieved node IP: 172.17.0.2
I0402 20:12:30.012751       1 server_others.go:145] Using iptables Proxier.
I0402 20:12:30.012899       1 server.go:571] Version: v1.17.2
I0402 20:12:30.013160       1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0402 20:12:30.013219       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0402 20:12:30.013252       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0402 20:12:30.013379       1 config.go:131] Starting endpoints config controller
I0402 20:12:30.013396       1 config.go:313] Starting service config controller
I0402 20:12:30.013401       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0402 20:12:30.013401       1 shared_informer.go:197] Waiting for caches to sync for service config
I0402 20:12:30.113567       1 shared_informer.go:204] Caches are synced for endpoints config 
I0402 20:12:30.113583       1 shared_informer.go:204] Caches are synced for service config 
E0402 20:12:30.123613       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:12:30.123669       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:12:30.133062       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:12:30.133110       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:13:00.136043       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:13:00.136097       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:13:30.138287       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:13:30.138303       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:14:00.144184       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:14:00.144209       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:14:30.153562       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:14:30.153612       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:15:00.162922       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:15:00.162966       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:15:30.165296       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:15:30.165310       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:16:00.167464       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:16:00.167475       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:16:30.177019       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:16:30.177079       1 proxier.go:779] Sync failed; retrying in 30s
E0402 20:17:00.186090       1 proxier.go:795] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `comment':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0402 20:17:00.186105       1 proxier.go:779] Sync failed; retrying in 30s

This also happens with the latest master.

I also just noticed that the issue Ben linked concerns iptables >= 1.8, so it would definitely apply to me.

@BenTheElder
Member

I think you might have to switch iptables to use the legacy backend. I'm not having this issue on a work machine that I'm pretty sure is nft-backed, though 🤔

I'll double check that config.

@BenTheElder
Member

On my Google work machine (on which kind works):

$ readlink -f $(which iptables)
/usr/sbin/xtables-nft-multi

The distro is a customized Debian testing.

@BenTheElder
Member

$ iptables-save --version
iptables-save v1.8.3 (nf_tables)

@BenTheElder
Member

So I'm pretty convinced that on iptables-nft systems the current mitigation is to switch to iptables-legacy with update-alternatives and then reboot the host, which is not super great.
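On Debian-based hosts, that switch amounts to the update-alternatives commands below. A minimal sketch that only prints them, so they can be reviewed before being run as root; the /usr/sbin/*-legacy paths are the usual Debian/Ubuntu locations and are an assumption for other distros (Gentoo, for example, uses eselect iptables, as shown earlier in this thread). The function name is hypothetical:

```shell
#!/bin/sh
# Print (do not execute) the commands that point iptables and ip6tables at
# their legacy backends. Assumption: Debian/Ubuntu packaging paths.
legacy_switch_commands() {
  for tool in iptables ip6tables; do
    echo "update-alternatives --set ${tool} /usr/sbin/${tool}-legacy"
  done
}
```

After reviewing the output, legacy_switch_commands | sudo sh applies it; then reboot the host as described above.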

We can do better.
/assign
cc @howardjohn

@BenTheElder
Member

I've reproduced this on k8s HEAD after a report from John, and confirmed that switching to legacy iptables mitigates it.

@BenTheElder BenTheElder added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 22, 2020
@BenTheElder BenTheElder added this to the v0.8.0 milestone Apr 22, 2020
@BenTheElder
Member

#1508 should fix this.

@BenTheElder BenTheElder added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 25, 2020
@BenTheElder
Member

#1508 has been merged, and I've confirmed that it fixes the issue in my environment.

Please re-open if you continue to see this.

@ntoofu

ntoofu commented Dec 20, 2020

I encountered the same problem. kind requires iptables to support the comment match, so I enabled CONFIG_NETFILTER_XT_MATCH_COMMENT in my kernel config to resolve it.

Most distributions probably enable it by default, but on distributions like Gentoo Linux you may need to enable it yourself.
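To check whether the running kernel can load the comment match without rebuilding anything, one option is to ask iptables to check (-C) for a throwaway rule that uses it; -C only checks and never modifies the ruleset. A minimal sketch, assuming the legacy backend's exit codes: 1 when the rule simply does not exist, and 2 for a parameter problem such as the "Couldn't load match `comment'" error in the kube-proxy logs above. The function names and the kind-probe comment text are hypothetical:

```shell
#!/bin/sh
# Map the exit status of the probe below to a verdict.
# 0: the rule exists (unexpected, but the match loaded)
# 1: rule not found, but the comment match loaded fine
# 2: parameter problem, e.g. "Couldn't load match `comment'"
classify_comment_probe() {
  case "$1" in
    0|1) echo "comment-match-ok" ;;
    2)   echo "comment-match-missing" ;;
    *)   echo "unknown-status-$1" ;;  # e.g. not running as root
  esac
}

# Needs root; stderr is discarded because only the exit status matters.
probe_comment_match() {
  iptables -t filter -C INPUT -m comment --comment kind-probe -j ACCEPT 2>/dev/null
  classify_comment_probe "$?"
}
```

Run probe_comment_match as root; comment-match-missing corresponds to the kube-proxy failure reported in this issue.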

@BenTheElder
Member

I imagine it's kube-proxy in upstream Kubernetes that requires this. kind inherits all upstream host requirements except the requirement to disable swap: we allow swap (though it causes issues).

@Tiberivs

I encountered the same problem running kind on Gentoo Linux. I discovered that kube-proxy requires:

CONFIG_IP_VS_RR
CONFIG_IP_VS_WRR
CONFIG_NETFILTER_XT_MATCH_STATISTIC
CONFIG_NETFILTER_XT_MATCH_COMMENT

It's not an exhaustive list, of course, but it may be helpful for someone.
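On a self-configured kernel, these options can be checked against the kernel config before rebuilding. A minimal sketch; the function name is hypothetical, and, as noted, the option list is not exhaustive:

```shell
#!/bin/sh
# Kernel options reported in this thread as needed by kube-proxy
# (not an exhaustive list).
REQUIRED_OPTIONS="CONFIG_IP_VS_RR CONFIG_IP_VS_WRR CONFIG_NETFILTER_XT_MATCH_STATISTIC CONFIG_NETFILTER_XT_MATCH_COMMENT"

# Print each required option that is neither built in (=y) nor built as a
# module (=m) in the given kernel config file, one per line.
missing_kernel_options() {
  config_file="$1"
  for opt in $REQUIRED_OPTIONS; do
    # "# CONFIG_FOO is not set" lines do not match, so they count as missing.
    if ! grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
      echo "$opt"
    fi
  done
}
```

For example, missing_kernel_options "/boot/config-$(uname -r)", or, if the kernel exposes its config (CONFIG_IKCONFIG_PROC), zcat /proc/config.gz > /tmp/kconfig && missing_kernel_options /tmp/kconfig. No output means all four options are enabled.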

@tin-pham

tin-pham commented Jun 7, 2023

I know it's been 2 years, but thanks, man! I was trying to set up k8s and kind on Gentoo, and it got to the point where I wanted to switch my OS, but you saved me.

@aitorpazos

Same for me on Ubuntu 22.04. Loading the following kernel modules did the trick:

sudo modprobe xt_comment xt_statistic ip_vs_rr ip_vs_wrr
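modprobe only loads these modules until the next reboot. On systemd-based hosts such as Ubuntu, one way to have them loaded at boot is a drop-in file in /etc/modules-load.d, which lists one module name per line. A minimal sketch; the file name kind.conf and the function name are hypothetical, and the target directory is a parameter so the function can be tried against a scratch directory first:

```shell
#!/bin/sh
# Write a modules-load.d drop-in listing the modules loaded above.
# Defaults to /etc/modules-load.d (requires root); lines starting with '#'
# are comments in this format.
write_modules_load_conf() {
  target_dir="${1:-/etc/modules-load.d}"
  mkdir -p "$target_dir"
  printf '%s\n' \
    '# Modules needed by kube-proxy in kind clusters' \
    xt_comment xt_statistic ip_vs_rr ip_vs_wrr \
    > "$target_dir/kind.conf"
}
```

Calling write_modules_load_conf as root with no argument writes /etc/modules-load.d/kind.conf. Hosts without systemd use a different mechanism.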

@aojea
Contributor

aojea commented Aug 17, 2023

Ubuntu in WSL, or the regular Ubuntu distro?

@aitorpazos

Actually, KDE Neon on a "bare metal" laptop.

@chinchaun

Thank you very much! It solved my problem with a custom 6.10 kernel.

8 participants