
cluster networking is broken? #24

Closed
liyimeng opened this issue Feb 8, 2019 · 39 comments

@liyimeng (Contributor) commented Feb 8, 2019

The helm install job never succeeds; it seems it is not possible to reach the DNS server.

alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl  get all -n kube-system 
NAME                             READY   STATUS             RESTARTS   AGE
pod/coredns-7748f7f6df-tp7fq     1/1     Running            1          104m
pod/helm-install-traefik-g5rmk   0/1     CrashLoopBackOff   21         104m

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   104m

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   1/1     1            1           104m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-7748f7f6df   1         1         1       104m

NAME                             COMPLETIONS   DURATION   AGE
job.batch/helm-install-traefik   0/1           104m       104m

./k3s kubectl   -n kube-system logs -f pod/helm-install-traefik-g5rmk
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
[main] 2019/02/08 20:48:52 Starting Tiller v2.12.3 (tls=false)
[main] 2019/02/08 20:48:52 GRPC listening on 127.0.0.1:44134
[main] 2019/02/08 20:48:52 Probes listening on :44135
[main] 2019/02/08 20:48:52 Storage driver is Secret
[main] 2019/02/08 20:48:52 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.0.4:39333->10.43.0.10:53: i/o timeout

Verifying with a busybox pod:

alpine:/home/alpine/k3s/dist/artifacts# ./k83s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
ash: ./k83s: not found
alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # 
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue 
    link/ether 32:03:33:52:8c:19 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.6/24 brd 10.42.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3003:33ff:fe52:8c19/64 scope link 
       valid_lft forever preferred_lft forever
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.42.0.6
PING 10.42.0.6 (10.42.0.6): 56 data bytes
64 bytes from 10.42.0.6: seq=0 ttl=64 time=0.109 ms
64 bytes from 10.42.0.6: seq=1 ttl=64 time=0.108 ms
64 bytes from 10.42.0.6: seq=2 ttl=64 time=0.106 ms
^C
--- 10.42.0.6 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.106/0.107/0.109 ms
@ibuildthecloud (Contributor)

@liyimeng can you ensure the br_netfilter module is loaded? The agent is supposed to load this module, but it seems to not always work. I'm troubleshooting that now.
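
For reference, a minimal way to check and load the module manually on the host (a hedged sketch; the persistence path assumes a systemd-based distro):

# check whether br_netfilter is loaded, and load it if not
lsmod | grep br_netfilter || sudo modprobe br_netfilter
# load it on boot as well (systemd-based distros)
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
# make sure bridged traffic is passed through iptables
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1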

@ibuildthecloud (Contributor)

@liyimeng FYI, if you are running in a container you need to bind mount /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro so that modules can be loaded.
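
For illustration, a hypothetical docker run along those lines (the image name and extra flags here are assumptions, not the project's documented invocation):

docker run -d --privileged \
  -v /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro \
  rancher/k3s server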

@liyimeng (Contributor, Author) commented Feb 9, 2019

I do have it loaded:

alpine:/home/alpine/k3s/dist/artifacts# lsmod | grep netfilter
br_netfilter           20480  0
bridge                163840  1 br_netfilter

When I follow the troubleshooting steps described here:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#running-commands-in-a-pod

/ # wget -O- hostnames
Connecting to hostnames (10.43.129.144:80)
hostnames-85bc9c579-rtr4r

-                    100% |*******************************|    26  0:00:00 ETA

/ # nslookup hostnames
Server: 10.43.0.10
Address: 10.43.0.10:53

** server can't find hostnames.default.svc.cluster.local: NXDOMAIN

*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer
*** Can't find hostnames.default.svc.cluster.local: No answer
*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer

/ # nslookup hostnames.default
;; connection timed out; no servers could be reached
/ # cat /etc/hosts

# Kubernetes-managed hosts file.

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.0.7 busybox
/ # arp
10-42-0-9.hostnames.default.svc.cluster.local (10.42.0.9) at 96:12:b9:85:cf:5c [ether] on eth0
10-42-0-8.hostnames.default.svc.cluster.local (10.42.0.8) at 1e:87:4b:df:77:2a [ether] on eth0

? (10.42.0.1) at 4e:b8:bd:7b:10:7b [ether] on eth0
10-42-0-5.kube-dns.kube-system.svc.cluster.local (10.42.0.5) at ba:e7:34:61:0a:bc [ether] on eth0
10-42-0-10.hostnames.default.svc.cluster.local (10.42.0.10) at 4e:2b:cf:de:e2:9e [ether] on eth0

How strange: I can actually reach the hostnames service with wget, but nslookup fails. I guess something is wrong with forwarding packets from pod to service, or vice versa.

Do we have kube-proxy or ipvs to map between services and pods?
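
For reference: in kube-proxy's iptables mode, the service-to-pod mapping can be inspected on the host. A hedged sketch (the KUBE-SVC-... chain name is a generated hash and will differ on every cluster):

# find the entry for the DNS ClusterIP among the service rules
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.43.0.10
# then list the KUBE-SVC-... chain that rule jumps to, to see the pod endpoints
sudo iptables -t nat -L KUBE-SVC-XXXXXXXXXXXXXXXX -n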

@liyimeng (Contributor, Author) commented Feb 9, 2019

OK, I see that we use kube-proxy, or at least iptables, for this. Strangely enough, nslookup works on the host, just not inside the pods!

nslookup www.google.com 10.43.0.10
Server: 10.43.0.10
Address 1: 10.43.0.10

Name: www.google.com
Address 1: 216.58.207.196 arn11s04-in-f4.1e100.net
Address 2: 2a00:1450:400e:809::2004 ams15s32-in-x04.1e100.net
alpine:/home/alpine/k3s/dist/artifacts# nslookup hostnames.default.svc.cluster.local 10.43.0.10
Server: 10.43.0.10
Address 1: 10.43.0.10

nslookup: can't resolve 'hostnames.default.svc.cluster.local': Name does not resolve

@liyimeng (Contributor, Author) commented Feb 9, 2019

BTW, IP forwarding is on:

alpine:/home/alpine/k3s/dist/artifacts# cat /proc/sys/net/ipv4/ip_forward
1

@ibuildthecloud (Contributor)

@liyimeng is there any way I can reproduce your setup?

@liyimeng (Contributor, Author)

@ibuildthecloud Here is what I have done:

  • create a KVM VM
  • install Alpine Linux, likely v3.9.0
  • install necessary software, like git and docker
  • check out latest k3s
  • build and run k3s
  • I then see the helm install job never succeeds; the log says Google is not reachable
  • ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
  • try nslookup www.google.com in the pod; it fails for me.

@aaliddell (Contributor)

I've seen the same issue when installing on an existing system. When running on a clean install, there are no issues.

After some testing, the issue appears to be having existing iptables rules with a default DROP policy on the INPUT chain. After setting the INPUT policy to ACCEPT, the issue appears to be resolved. Therefore, either a note needs to be added to the docs, or k3s needs to set up its own iptables rules on the INPUT chain to ensure that traffic does not hit the default policy, which is usually DROP or REJECT for security reasons.
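
To check whether this applies to a given host, a minimal sketch:

# the first line of output shows the default policy of each chain;
# "policy DROP" or "policy REJECT" on INPUT/FORWARD means unmatched pod traffic is discarded
sudo iptables -L INPUT -n | head -1
sudo iptables -L FORWARD -n | head -1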

@jose-sanchezm commented Mar 5, 2019

In a fresh installation on CentOS 7.5 I'm getting the same issue:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                         READY   STATUS             RESTARTS   AGE
default       ds4m-0                       1/1     Running            0          25m
kube-system   coredns-7748f7f6df-6j8kh     0/1     CrashLoopBackOff   9          25m
kube-system   helm-install-traefik-bprt6   0/1     CrashLoopBackOff   9          25m
# kubectl logs coredns-7748f7f6df-6j8kh --namespace kube-system
E0305 16:36:17.678604       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:19.680598       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
.:53
2019-03-05T16:36:21.676Z [INFO] CoreDNS-1.3.0
2019-03-05T16:36:21.676Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-05T16:36:21.676Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
E0305 16:36:23.686625       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:23.690587       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
# kubectl logs helm-install-traefik-bprt6 --namespace kube-system
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
[main] 2019/03/05 16:35:55 Starting Tiller v2.12.3 (tls=false)
[main] 2019/03/05 16:35:55 GRPC listening on 127.0.0.1:44134
[main] 2019/03/05 16:35:55 Probes listening on :44135
[main] 2019/03/05 16:35:55 Storage driver is Secret
[main] 2019/03/05 16:35:55 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.1.2:48204->10.43.0.10:53: read: no route to host

My firewalld configuration:

# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh
  ports: 4789/udp 6443/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

br_netfilter module is loaded:

# lsmod | grep netfilter
br_netfilter           22256  0 
bridge                151336  2 br_netfilter,ebtable_broute

Which extra rules do I have to configure to get it working?

@aaliddell (Contributor)

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the interface (bridge) cni0 with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open each port individually rather than accepting everything.
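
As a hedged sketch of the more granular variant, allowing only DNS from the pod network instead of everything:

# permit only DNS traffic arriving from pods on the cni0 bridge
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p udp --dport 53 -j ACCEPT
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p tcp --dport 53 -j ACCEPT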

@jose-sanchezm

I've added the rule and CoreDNS logs far fewer errors (although there are still some), but helm-install-traefik keeps crashing with the same error. Do I need another rule for it?

@aaliddell (Contributor)

Does firewalld have a log somewhere of what packets it is blocking or a way to enable such a log? If so, look there to see what might still be getting dropped.

@briandealwis

I'm seeing this with VMware Photon 3.0. Adding @aaliddell's snippet to /etc/systemd/scripts/ip4save has done the trick.
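
For anyone else on Photon, a hedged sketch of what that addition might look like (ip4save uses iptables-save syntax; the surrounding contents of your file will differ):

*filter
# ... existing rules ...
-A INPUT -i cni0 -s 10.42.0.0/16 -j ACCEPT
COMMIT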

@sahlex commented Mar 12, 2019

@briandealwis can you please point out which exact snippet you are referring to?

@briandealwis

This comment:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

@sahlex commented Mar 12, 2019

I'm having the same problems.

From inside a (busybox) pod, the DNS server is configured as 10.43.x.x, but no interface with that address exists. I start the server with

/usr/local/bin/k3s server --cluster-cidr 10.10.0.0/16

without disabling the coredns service, but my machine shows no interface in the 10.43.x.x range:

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.1  netmask 255.255.255.0  broadcast 10.10.0.255
        ether b6:05:a1:65:5e:49  txqueuelen 1000  (Ethernet)
        RX packets 1990  bytes 181178 (176.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1312  bytes 131217 (128.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.42.2.112  netmask 255.255.0.0  broadcast 10.42.255.255
        ether 00:15:5d:01:d2:21  txqueuelen 1000  (Ethernet)
        RX packets 744576  bytes 380910420 (363.2 MiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 71389  bytes 49850951 (47.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.0  netmask 255.255.255.255  broadcast 10.10.0.0
        ether 1e:41:71:d1:92:ff  txqueuelen 0  (Ethernet)
        RX packets 26  bytes 1944 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 888 (888.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Lokale Schleife)
        RX packets 632239  bytes 225761419 (215.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 632239  bytes 225761419 (215.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

@aaliddell (Contributor) commented Mar 12, 2019

The 10.43.0.0/16 range is the default service ClusterIP range, which isn't actually bound to any interface but is instead a 'virtual' ip that is routed by iptables to a pod backing the service: https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables

You can change the service IP range with --service-cidr: #171
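
For example, a hedged sketch with non-default ranges (the particular CIDRs are arbitrary):

k3s server --cluster-cidr 10.10.0.0/16 --service-cidr 10.20.0.0/16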

@sahlex commented Mar 13, 2019

Thanks for your responses!

I added my CIDR network to the firewall (the accept rules show up there).

After uninstalling k3s and reinstalling with

./install-k3s.sh --cluster-cidr=10.10.0.0/16

I still get errors related to DNS.

After startup I tail the CoreDNS logs with kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system:

[root@h20181152922 ~]# kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system
.:53
2019-03-13T07:45:33.023Z [INFO] CoreDNS-1.3.0
2019-03-13T07:45:33.023Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-13T07:45:33.023Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
2019-03-13T07:45:54.026Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:47160->1.1.1.1:53: i/o timeout
2019-03-13T07:45:57.025Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:46236->1.1.1.1:53: i/o timeout

From the server logs:

Mar 13 08:50:47 docker2 k3s: time="2019-03-13T08:50:47.568418341+01:00" level=info msg="Running kubelet --healthz-bind-address 127.0.0.1 --read-only-port 0 --allow-privileged=true --cluster-domain cluster.local --kubeconfig /var/lib/rancher/k3s/agent/kubeconfig.yaml --eviction-hard imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --cgroup-driver cgroupfs --root-dir /var/lib/rancher/k3s/agent/kubelet --cert-dir /var/lib/rancher/k3s/agent/kubelet/pki --seccomp-profile-root /var/lib/rancher/k3s/agent/kubelet/seccomp --cni-conf-dir /var/lib/rancher/k3s/agent/etc/cni/net.d --cni-bin-dir /var/lib/rancher/k3s/data/e44f7a46cadac4cec9a759756f2a27fdb25e705a83d8d563207c6a6c5fa368b4/bin --cluster-dns 10.43.0.10 --container-runtime remote --container-runtime-endpoint unix:///run/k3s/containerd/containerd.sock --address 127.0.0.1 --anonymous-auth=false --client-ca-file /var/lib/rancher/k3s/agent/client-ca.pem --hostname-override h20181152922 --cpu-cfs-quota=false --runtime-cgroups /systemd/system.slice --kubelet-cgroups /systemd/system.slice"

When I try an nslookup from busybox:

[root@h20181152922 k3s]# /usr/local/bin/k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup www.google.de
;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local haba.int
nameserver 10.43.0.10
options ndots:5
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

So it seems DNS is not working properly...

@aaliddell (Contributor)

Pings to 10.43.0.0/16 addresses aren't going to respond, due to them being 'virtual' and only really existing within iptables. If your DNS requests are getting to the CoreDNS pod, then cluster networking looks like it's working. Your issue may be related to #53 (how on earth did a DNS problem get issue number 53...)
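
Rather than ping, querying the DNS service itself is a better connectivity test; a minimal sketch from inside the busybox pod:

# this name should always resolve if cluster DNS is healthy
nslookup kubernetes.default.svc.cluster.local 10.43.0.10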

@sahlex commented Mar 13, 2019

In fact, CoreDNS isn't able to reach 1.1.1.1:53. I changed it according to #53 and now it's working!

Thanks!

BTW: you're right on the issue number. What a nice coincidence!

@odensc commented Mar 17, 2019

To anyone running into this issue on Fedora, the proper command to add the iptables rule is:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

and then a firewall-cmd --reload fixed it for me.

Still having issues with DNS resolution, though.

@xykonur commented Mar 22, 2019

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

Might be possible to further narrow down or optimise the /15.
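
One way to narrow that /15 (a hedged sketch, assuming the default k3s pod CIDR 10.42.0.0/16 and service CIDR 10.43.0.0/16):

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.43.0.0/16 -j ACCEPT
firewall-cmd --reload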

@gdhgdhgdh

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

Perfect timing! This worked like a charm for me on CentOS 7. 🍻

@Lunik commented Mar 30, 2019

Getting the same issue with multi-node k3s. With only one node, everything works like a charm.
When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command kubectl run -it --rm --restart=Never busybox --image=busybox sh
When the CoreDNS and busybox pods are not on the same host, they can't talk.
But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them.
k3s cluster launched with these start commands:
server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"
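
A hedged sketch for narrowing this down: k3s' flannel VXLAN backend carries pod-to-pod traffic between nodes over UDP port 8472, which must be allowed between the hosts.

# see which node each pod landed on and its IP
kubectl get pods -o wide
# from the busybox pod, try a pod IP on the other node (substitute the real IP)
kubectl exec -it busybox -- ping -c 3 <pod-ip-on-other-node>
# if that fails, allow VXLAN between the nodes, e.g. with firewalld:
firewall-cmd --permanent --add-port=8472/udp && firewall-cmd --reload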

@deniseschannon

This issue topic is very broad and each person's setup is different and unique. I'd like to close the original issue; if you are still having networking issues, please open a new one.

Ideally the subject should indicate what OS you are using, which version, and something specific about how the networking is broken.

Thanks for understanding!

@Id2ndR commented May 3, 2019

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

Might be possible to further narrow down or optimise the /15.

The narrow solution would be sudo iptables -A KUBE-FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

However, the KUBE-FORWARD chain is rewritten quickly, so the previous command only works briefly, even if you are quick enough. Instead you can use sudo firewall-cmd --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

@Harguer commented May 17, 2019

I had the same error and spent some time trying to resolve it; I even reinstalled my OS and tried different Kubernetes versions. In the end the issue was firewalld.
I disabled it, tried again with a fresh installation, and now it is working fine.

@liyimeng (Contributor, Author) commented Jul 4, 2019

@deniseschannon it seems to be the same issue on k3os.

@adi90x commented Jul 25, 2019

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the interface (bridge) cni0 with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open each port individually rather than accepting everything.

Is this correct for ufw, or do I have it the other way around?

sudo ufw allow in on cni0 from 10.42.0.0/16 comment "K3s rule : https://github.com/rancher/k3s/issues/24#issuecomment-469759329"

@matthewygf

Getting the same issue with multi-node k3s. With only one node, everything works like a charm.
When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command kubectl run -it --rm --restart=Never busybox --image=busybox sh
When the CoreDNS and busybox pods are not on the same host, they can't talk.
But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them.
k3s cluster launched with these start commands:
server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"

@Lunik did you end up finding a solution for this? I am having the same problem.

@devopswise commented Jan 15, 2020

In case you are having a similar issue: I noticed there were rules related to Docker in my chains (I was using containerd). The steps I followed:

  1. Stop the cluster (systemctl stop k3s on the master, systemctl stop k3s-agent on the agents).
  2. Delete all iptables rules in your chains as described here (see the sketch below): https://serverfault.com/a/200658/455081
  3. Start the cluster again.
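
For reference, the flush described in that answer is roughly the following (a hedged sketch; it deletes all rules and resets policies, so review it before running on a remote machine):

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X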

@kbrowder commented Apr 15, 2020

On Fedora 31 I found the simplest thing to do was:

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --reload

(edit: fix random space in -i)

@jbutler992

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

@maci0 commented May 24, 2020

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

There was just a typo in his command: it says - i cni0 instead of -i cni0.

Besides that, it works for me on CentOS 8.

@kbrowder

@maci0, whoops, you're right. I edited my response above; sorry for the delay, @jbutler992.

@adacaccia commented Jun 15, 2020

I would just like to summarize this post by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in a fraction of a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

@robodude666

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.
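
A hedged sketch of choosing non-overlapping ranges at install time (the exact CIDRs below are arbitrary examples):

curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-cidr 10.52.0.0/16 \
  --service-cidr 10.53.0.0/16 \
  --cluster-dns 10.53.0.10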

@elsbrock commented Nov 9, 2020

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.

Thank you, kind sir. That hint saved me a ton of time!

@cakiem8x

I would just like to summarize this post by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in a fraction of a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

Thank you!
