
cluster networking is broken? #24

Closed
liyimeng opened this issue Feb 8, 2019 · 39 comments

@liyimeng (Contributor) commented Feb 8, 2019

The helm install job never succeeds; it seems it is not possible to reach the DNS server.

alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl  get all -n kube-system 
NAME                             READY   STATUS             RESTARTS   AGE
pod/coredns-7748f7f6df-tp7fq     1/1     Running            1          104m
pod/helm-install-traefik-g5rmk   0/1     CrashLoopBackOff   21         104m

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   104m

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   1/1     1            1           104m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-7748f7f6df   1         1         1       104m

NAME                             COMPLETIONS   DURATION   AGE
job.batch/helm-install-traefik   0/1           104m       104m

./k3s kubectl   -n kube-system logs -f pod/helm-install-traefik-g5rmk
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
[main] 2019/02/08 20:48:52 Starting Tiller v2.12.3 (tls=false)
[main] 2019/02/08 20:48:52 GRPC listening on 127.0.0.1:44134
[main] 2019/02/08 20:48:52 Probes listening on :44135
[main] 2019/02/08 20:48:52 Storage driver is Secret
[main] 2019/02/08 20:48:52 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.0.4:39333->10.43.0.10:53: i/o timeout

Verifying with a busybox pod:

alpine:/home/alpine/k3s/dist/artifacts# ./k83s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
ash: ./k83s: not found
alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # 
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue 
    link/ether 32:03:33:52:8c:19 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.6/24 brd 10.42.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3003:33ff:fe52:8c19/64 scope link 
       valid_lft forever preferred_lft forever
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.42.0.6
PING 10.42.0.6 (10.42.0.6): 56 data bytes
64 bytes from 10.42.0.6: seq=0 ttl=64 time=0.109 ms
64 bytes from 10.42.0.6: seq=1 ttl=64 time=0.108 ms
64 bytes from 10.42.0.6: seq=2 ttl=64 time=0.106 ms
^C
--- 10.42.0.6 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.106/0.107/0.109 ms
@ibuildthecloud (Contributor)

@liyimeng can you ensure the br_netfilter module is loaded? The agent is supposed to load this module, but it seems to not always work. I'm troubleshooting that now.
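
For reference, a minimal way to check and load the module manually on the host (a hedged sketch; the persistence path assumes a systemd-based distro):

# check whether br_netfilter is loaded, and load it if not
lsmod | grep br_netfilter || sudo modprobe br_netfilter
# load it on boot as well (systemd-based distros)
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
# make sure bridged traffic is passed through iptables
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1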

@ibuildthecloud (Contributor)

@liyimeng FYI, if you are running in a container you need to bind mount /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro so that modules can be loaded.
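
For illustration, a hypothetical docker run along those lines (the image name and extra flags here are assumptions, not the project's documented invocation):

docker run -d --privileged \
  -v /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro \
  rancher/k3s server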

@liyimeng (Contributor, Author) commented Feb 9, 2019

I do have it loaded:

alpine:/home/alpine/k3s/dist/artifacts# lsmod | grep netfilter
br_netfilter           20480  0
bridge                163840  1 br_netfilter

When I follow the troubleshooting steps described here:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#running-commands-in-a-pod

/ # wget -O- hostnames
Connecting to hostnames (10.43.129.144:80)
hostnames-85bc9c579-rtr4r

-                    100% |*******************************|    26  0:00:00 ETA

/ # nslookup hostnames
Server: 10.43.0.10
Address: 10.43.0.10:53

** server can't find hostnames.default.svc.cluster.local: NXDOMAIN

*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer
*** Can't find hostnames.default.svc.cluster.local: No answer
*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer

/ # nslookup hostnames.default
;; connection timed out; no servers could be reached
/ # cat /etc/hosts

# Kubernetes-managed hosts file.

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.0.7 busybox
/ # arp
10-42-0-9.hostnames.default.svc.cluster.local (10.42.0.9) at 96:12:b9:85:cf:5c [ether] on eth0
10-42-0-8.hostnames.default.svc.cluster.local (10.42.0.8) at 1e:87:4b:df:77:2a [ether] on eth0

? (10.42.0.1) at 4e:b8:bd:7b:10:7b [ether] on eth0
10-42-0-5.kube-dns.kube-system.svc.cluster.local (10.42.0.5) at ba:e7:34:61:0a:bc [ether] on eth0
10-42-0-10.hostnames.default.svc.cluster.local (10.42.0.10) at 4e:2b:cf:de:e2:9e [ether] on eth0

How strange: I can actually reach the hostnames service with wget, but nslookup fails. I guess something is wrong with forwarding packets from pod to service, or vice versa.

Do we have kube-proxy or ipvs to map between services and pods?
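
For reference: in kube-proxy's iptables mode, the service-to-pod mapping can be inspected on the host. A hedged sketch (the KUBE-SVC-... chain name is a generated hash and will differ on every cluster):

# find the entry for the DNS ClusterIP among the service rules
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.43.0.10
# then list the KUBE-SVC-... chain that rule jumps to, to see the pod endpoints
sudo iptables -t nat -L KUBE-SVC-XXXXXXXXXXXXXXXX -n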

@liyimeng (Contributor, Author) commented Feb 9, 2019

OK, I see that we use kube-proxy, or at least iptables, for this. Strangely enough, nslookup works on the host, just not inside the pods!

nslookup www.google.com 10.43.0.10
Server: 10.43.0.10
Address 1: 10.43.0.10

Name: www.google.com
Address 1: 216.58.207.196 arn11s04-in-f4.1e100.net
Address 2: 2a00:1450:400e:809::2004 ams15s32-in-x04.1e100.net
alpine:/home/alpine/k3s/dist/artifacts# nslookup hostnames.default.svc.cluster.local 10.43.0.10
Server: 10.43.0.10
Address 1: 10.43.0.10

nslookup: can't resolve 'hostnames.default.svc.cluster.local': Name does not resolve

@liyimeng (Contributor, Author) commented Feb 9, 2019

BTW, IP forwarding is on:

alpine:/home/alpine/k3s/dist/artifacts# cat /proc/sys/net/ipv4/ip_forward
1

@ibuildthecloud (Contributor)

@liyimeng is there any way I can reproduce your setup?

@liyimeng (Contributor, Author)

@ibuildthecloud Here is what I have done:

  • create a KVM VM
  • install Alpine Linux, likely v3.9.0
  • install necessary software, like git and docker
  • check out latest k3s
  • build and run k3s
  • I then see the helm install job never succeeds; the log says Google is not reachable
  • ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
  • try nslookup www.google.com in the pod; it fails for me.

@aaliddell (Contributor)

I've seen the same issue when installing on an existing system. When running on a clean install, there are no issues.

After some testing, the issue appears to be having existing iptables rules with a default DROP policy on the INPUT chain. After setting the INPUT policy to ACCEPT, the issue appears to be resolved. Therefore, either a note needs to be added to the docs, or k3s needs to set up its own iptables rules on the INPUT chain to ensure that traffic does not hit the default policy, which is usually DROP or REJECT for security reasons.
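
To check whether this applies to a given host, a minimal sketch:

# the first line of output shows the default policy of each chain;
# "policy DROP" or "policy REJECT" on INPUT/FORWARD means unmatched pod traffic is discarded
sudo iptables -L INPUT -n | head -1
sudo iptables -L FORWARD -n | head -1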

@jose-sanchezm commented Mar 5, 2019

In a fresh installation on CentOS 7.5 I'm getting the same issue:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                         READY   STATUS             RESTARTS   AGE
default       ds4m-0                       1/1     Running            0          25m
kube-system   coredns-7748f7f6df-6j8kh     0/1     CrashLoopBackOff   9          25m
kube-system   helm-install-traefik-bprt6   0/1     CrashLoopBackOff   9          25m
# kubectl logs coredns-7748f7f6df-6j8kh --namespace kube-system
E0305 16:36:17.678604       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:19.680598       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
.:53
2019-03-05T16:36:21.676Z [INFO] CoreDNS-1.3.0
2019-03-05T16:36:21.676Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-05T16:36:21.676Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
E0305 16:36:23.686625       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:23.690587       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
# kubectl logs helm-install-traefik-bprt6 --namespace kube-system
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
[main] 2019/03/05 16:35:55 Starting Tiller v2.12.3 (tls=false)
[main] 2019/03/05 16:35:55 GRPC listening on 127.0.0.1:44134
[main] 2019/03/05 16:35:55 Probes listening on :44135
[main] 2019/03/05 16:35:55 Storage driver is Secret
[main] 2019/03/05 16:35:55 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.1.2:48204->10.43.0.10:53: read: no route to host

My firewalld configuration:

# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh
  ports: 4789/udp 6443/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

br_netfilter module is loaded:

# lsmod | grep netfilter
br_netfilter           22256  0 
bridge                151336  2 br_netfilter,ebtable_broute

Which extra rules do I have to configure to get it working?

@aaliddell (Contributor)

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the interface (bridge) cni0 with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open each port individually rather than accepting everything.
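
As a hedged sketch of the more granular variant, allowing only DNS from the pod network instead of everything:

# permit only DNS traffic arriving from pods on the cni0 bridge
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p udp --dport 53 -j ACCEPT
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p tcp --dport 53 -j ACCEPT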

@jose-sanchezm

I've added the rule and CoreDNS logs far fewer errors (although there are still some), but helm-install-traefik keeps crashing with the same error. Do I need another rule for it?

@aaliddell (Contributor)

Does firewalld have a log somewhere of what packets it is blocking or a way to enable such a log? If so, look there to see what might still be getting dropped.

@briandealwis

I'm seeing this with VMware Photon 3.0. Adding @aaliddell's snippet to /etc/systemd/scripts/ip4save has done the trick.
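
For anyone else on Photon, a hedged sketch of what that addition might look like (ip4save uses iptables-save syntax; the surrounding contents of your file will differ):

*filter
# ... existing rules ...
-A INPUT -i cni0 -s 10.42.0.0/16 -j ACCEPT
COMMIT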

@sahlex commented Mar 12, 2019

@briandealwis can you please point out which exact snippet you are referring to?

@briandealwis

This comment:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

@sahlex commented Mar 12, 2019

I'm having the same problems.

From inside a (busybox) pod, the DNS server is configured as 10.43.x.x, but no interface with that address exists. I start the server with

/usr/local/bin/k3s server --cluster-cidr 10.10.0.0/16

without disabling the coredns service, but my machine shows no interface in the 10.43.x.x range:

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.1  netmask 255.255.255.0  broadcast 10.10.0.255
        ether b6:05:a1:65:5e:49  txqueuelen 1000  (Ethernet)
        RX packets 1990  bytes 181178 (176.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1312  bytes 131217 (128.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.42.2.112  netmask 255.255.0.0  broadcast 10.42.255.255
        ether 00:15:5d:01:d2:21  txqueuelen 1000  (Ethernet)
        RX packets 744576  bytes 380910420 (363.2 MiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 71389  bytes 49850951 (47.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.0  netmask 255.255.255.255  broadcast 10.10.0.0
        ether 1e:41:71:d1:92:ff  txqueuelen 0  (Ethernet)
        RX packets 26  bytes 1944 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 888 (888.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Lokale Schleife)
        RX packets 632239  bytes 225761419 (215.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 632239  bytes 225761419 (215.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

@aaliddell (Contributor) commented Mar 12, 2019

The 10.43.0.0/16 range is the default service ClusterIP range, which isn't actually bound to any interface but is instead a 'virtual' ip that is routed by iptables to a pod backing the service: https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables

You can change the service IP range with --service-cidr: #171
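
For example, a hedged sketch with non-default ranges (the particular CIDRs are arbitrary):

k3s server --cluster-cidr 10.10.0.0/16 --service-cidr 10.20.0.0/16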

@sahlex commented Mar 13, 2019

Thanks for your responses!

I added my CIDR network to the firewall (the accept rules show up there).

After uninstalling k3s and reinstalling with

./install-k3s.sh --cluster-cidr=10.10.0.0/16

I still get errors related to DNS.

After startup I tail the CoreDNS logs with kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system:

[root@h20181152922 ~]# kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system
.:53
2019-03-13T07:45:33.023Z [INFO] CoreDNS-1.3.0
2019-03-13T07:45:33.023Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-13T07:45:33.023Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
2019-03-13T07:45:54.026Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:47160->1.1.1.1:53: i/o timeout
2019-03-13T07:45:57.025Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:46236->1.1.1.1:53: i/o timeout

From the server logs:

Mar 13 08:50:47 docker2 k3s: time="2019-03-13T08:50:47.568418341+01:00" level=info msg="Running kubelet --healthz-bind-address 127.0.0.1 --read-only-port 0 --allow-privileged=true --cluster-domain cluster.local --kubeconfig /var/lib/rancher/k3s/agent/kubeconfig.yaml --eviction-hard imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --cgroup-driver cgroupfs --root-dir /var/lib/rancher/k3s/agent/kubelet --cert-dir /var/lib/rancher/k3s/agent/kubelet/pki --seccomp-profile-root /var/lib/rancher/k3s/agent/kubelet/seccomp --cni-conf-dir /var/lib/rancher/k3s/agent/etc/cni/net.d --cni-bin-dir /var/lib/rancher/k3s/data/e44f7a46cadac4cec9a759756f2a27fdb25e705a83d8d563207c6a6c5fa368b4/bin --cluster-dns 10.43.0.10 --container-runtime remote --container-runtime-endpoint unix:///run/k3s/containerd/containerd.sock --address 127.0.0.1 --anonymous-auth=false --client-ca-file /var/lib/rancher/k3s/agent/client-ca.pem --hostname-override h20181152922 --cpu-cfs-quota=false --runtime-cgroups /systemd/system.slice --kubelet-cgroups /systemd/system.slice"

When I try an nslookup from busybox:

[root@h20181152922 k3s]# /usr/local/bin/k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup www.google.de
;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local haba.int
nameserver 10.43.0.10
options ndots:5
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

So it seems DNS is not working properly...

@aaliddell (Contributor)

Pings to 10.43.0.0/16 addresses aren't going to respond, due to them being 'virtual' and only really existing within iptables. If your DNS requests are getting to the CoreDNS pod, then cluster networking looks like it's working. Your issue may be related to #53 (how on earth did a DNS problem get issue number 53...)
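
Rather than ping, querying the DNS service itself is a better connectivity test; a minimal sketch from inside the busybox pod:

# this name should always resolve if cluster DNS is healthy
nslookup kubernetes.default.svc.cluster.local 10.43.0.10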

@sahlex commented Mar 13, 2019

In fact, CoreDNS isn't able to reach 1.1.1.1:53. I changed it according to #53 and now it's working!

Thanks!

BTW: you're right on the issue number. What a nice coincidence!

@odensc commented Mar 17, 2019

To anyone running into this issue on Fedora, the proper command to add the iptables rule is:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

and then a firewall-cmd --reload fixed it for me.

Still having issues with DNS resolution, though.

@xykonur commented Mar 22, 2019

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

Might be possible to further narrow down or optimise the /15.
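
One way to narrow that /15 (a hedged sketch, assuming the default k3s pod CIDR 10.42.0.0/16 and service CIDR 10.43.0.0/16):

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.43.0.0/16 -j ACCEPT
firewall-cmd --reload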

@gdhgdhgdh

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

Perfect timing! This worked like a charm for me on CentOS 7. 🍻

@Lunik commented Mar 30, 2019

Getting the same issue with multi-node k3s. With only one node, everything works like a charm.
When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command kubectl run -it --rm --restart=Never busybox --image=busybox sh
When the CoreDNS and busybox pods are not on the same host, they can't talk.
But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them.
k3s cluster launched with these start commands:
server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"
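
A hedged sketch for narrowing this down: k3s' flannel VXLAN backend carries pod-to-pod traffic between nodes over UDP port 8472, which must be allowed between the hosts.

# see which node each pod landed on and its IP
kubectl get pods -o wide
# from the busybox pod, try a pod IP on the other node (substitute the real IP)
kubectl exec -it busybox -- ping -c 3 <pod-ip-on-other-node>
# if that fails, allow VXLAN between the nodes, e.g. with firewalld:
firewall-cmd --permanent --add-port=8472/udp && firewall-cmd --reload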

@deniseschannon

This issue topic is very broad and each person's setup is different and unique. I'd like to close the original issue; if you are still having networking issues, please open a new one.

Ideally the subject should indicate what OS you are using, which version, and something specific about how the networking is broken.

Thanks for understanding!

@Id2ndR commented May 3, 2019

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

Might be possible to further narrow down or optimise the /15.

The narrow solution would be sudo iptables -A KUBE-FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

However, the KUBE-FORWARD chain is rewritten quickly, so the previous command only works briefly, even if you are quick enough. Instead you can use sudo firewall-cmd --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

@Harguer commented May 17, 2019

I had the same error and spent some time trying to resolve it; I even reinstalled my OS and tried different Kubernetes versions. In the end the issue was firewalld.
I disabled it, tried again with a fresh installation, and now it is working fine.

@liyimeng (Contributor, Author) commented Jul 4, 2019

@deniseschannon it seems to be the same issue on k3os.

@adi90x commented Jul 25, 2019

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the interface (bridge) cni0 with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open each port individually rather than accepting everything.

Is this correct for ufw, or do I have it the other way around?

sudo ufw allow in on cni0 from 10.42.0.0/16 comment "K3s rule : https://github.com/rancher/k3s/issues/24#issuecomment-469759329"

@matthewygf

Getting the same issue with multi-node k3s. With only one node, everything works like a charm.
When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command kubectl run -it --rm --restart=Never busybox --image=busybox sh
When the CoreDNS and busybox pods are not on the same host, they can't talk.
But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them.
k3s cluster launched with these start commands:
server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"

@Lunik did you end up finding a solution for this? I am having the same problem.

@devopswise commented Jan 15, 2020

In case you are having a similar issue: I noticed there were rules related to Docker in my chains (I was using containerd). The steps I followed:

  1. Stop the cluster (systemctl stop k3s on the master, systemctl stop k3s-agent on the agents).
  2. Delete all iptables rules in your chains as described here (see the sketch below): https://serverfault.com/a/200658/455081
  3. Start the cluster again.
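
For reference, the flush described in that answer is roughly the following (a hedged sketch; it deletes all rules and resets policies, so review it before running on a remote machine):

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X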

@kbrowder commented Apr 15, 2020

On Fedora 31 I found the simplest thing to do was:

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --reload

(edit: fix random space in -i)

@jbutler992

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

@maci0 commented May 24, 2020

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

There was just a typo in his command: it says - i cni0 instead of -i cni0.

Besides that, it works for me on CentOS 8.

@kbrowder

@maci0, whoops, you're right. I edited my response above; sorry for the delay, @jbutler992.

@adacaccia commented Jun 15, 2020

I would just like to summarize this post by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in a fraction of a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

@robodude666

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.
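
A hedged sketch of choosing non-overlapping ranges at install time (the exact CIDRs below are arbitrary examples):

curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-cidr 10.52.0.0/16 \
  --service-cidr 10.53.0.0/16 \
  --cluster-dns 10.53.0.10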

@elsbrock commented Nov 9, 2020

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.

Thank you, kind sir. That hint saved me a ton of time!

@cakiem8x

I would just like to summarize this post by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in a fraction of a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

Thank you!
