RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1541

aiyengar2 · 2021-08-05T20:59:38Z

Environmental Info:
RKE2 Version: v1.21.3-rc3+rke2r2

Node(s) CPU architecture, OS, and Version:

Linux arvind-rke2-1 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 3 server nodes. Also reproducible on 3 etcd, 1 controlplane, and 3 worker nodes

Describe the bug:

Steps To Reproduce:

Installed RKE2: curl https://get.rke2.io | INSTALL_RKE2_CHANNEL=testing INSTALL_RKE2_METHOD=tar INSTALL_RKE2_VERSION=v1.21.3-rc3+rke2r2 sh -
Updated the config:

root@arvind-rke2-0:~# cat /etc/rancher/rke2/config.yaml
cni: calico

Other nodes are the same except with server + token fields

Run dig @10.43.0.10 google.com

Expected behavior:

All nodes should resolve the DNS

Actual behavior:

Only one node (the one that rke2-coredns is running on) resolves the DNS

Additional context / logs:

This issue was diagnosed in rancher/rancher#33052 but reproduced independently of Rancher using the above steps.

The text was updated successfully, but these errors were encountered:

aiyengar2 · 2021-08-05T21:01:11Z

As noted in rancher/rancher#33052 (comment), it seems like this was a regression that was broken, fixed, and then broken once more, possibly due to different versions of RKE2

Oats87 · 2021-08-05T21:01:46Z

I debugged this with Arvind and we found interesting behavior where UDP DNS queries are unable to be resolved when transiting via the service IP for CoreDNS i.e. 10.43.0.10. If we directly addressed the coredns pod, we could make our DNS queries with no issue. It did not matter whether we were in or out of a pod, i.e. on the node or not.

The DNS service IP 10.43.0.10 worked when CoreDNS was located on the same node as we were testing on.

This is when using the Calico CNI.

This only occurs on Ubuntu 20.04 in our testing. On my CentOS 7 testing boxes, we did not run into this issue.

Oats87 · 2021-08-05T21:02:54Z

ufw was on but

# ufw status
Status: inactive

For good measure, systemctl disable ufw --now && reboot did not help either.

brandond · 2021-08-05T21:09:15Z

Does it make any difference if you switch the host iptables between legacy/nftables or uninstall the host iptables+nftables so that we use the embedded ones?

aiyengar2 · 2021-08-05T21:22:46Z

Does it make any difference if you switch the host iptables between legacy/nftables or uninstall the host iptables+nftables so that we use the embedded ones?

Not sure about this. cc: @Oats87

However, I was able to test this on a v1.21.2+rke2r1 cluster and verify that this issue still exists in that version, so #1541 (comment) is not accurate.

aiyengar2 · 2021-08-05T21:33:32Z

In a v1.21.3-rc3+rke2r2 cluster with two Ubuntu 18.04 nodes (as opposed to 20.04 listed above), I was able to reproduce this same behavior on the nodes.

$ uname -a
Linux arvind-ubuntu-1804-0 4.15.0-144-generic #148-Ubuntu SMP Sat May 8 02:33:43 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

aiyengar2 · 2021-08-05T21:45:20Z

Without specifying cni: calico in the RKE2 cluster (v1.21.3-rc3+rke2r2), the dig call worked perfectly fine on all nodes.

Seems like this is definitely related to Calico, as indicated on the ticket title.

manuelbuil · 2021-08-06T17:38:02Z

Editing comment. Things seem to work when the pod is the client. The problem comes when the host tries to access the service. This is also happening on v1.21.3+rke2r1

manuelbuil · 2021-08-06T19:01:04Z

When tracking the packet, I see it going through the correct iptables of kube-proxy:

-A KUBE-SERVICES -d 10.43.0.10/32 -p udp -m comment --comment "kube-system/rke2-coredns-rke2-coredns:udp-53 cluster IP" -m udp --dport 53 -j KUBE-SVC-YFPH5LFNKP7E3G4L

-A KUBE-SVC-YFPH5LFNKP7E3G4L -m comment --comment "kube-system/rke2-coredns-rke2-coredns:udp-53" -j KUBE-SEP-F54GWJZTXPXAPHRS

-A KUBE-SEP-F54GWJZTXPXAPHRS -p udp -m comment --comment "kube-system/rke2-coredns-rke2-coredns:udp-53" -m udp -j DNAT --to-destination 10.42.182.4:53

I can see the packet leaving the node:

18:48:21.980669 IP 10.0.10.14.24169 > 10.0.10.10.4789: VXLAN, flags [I] (0x08), vni 4096
IP 10.42.222.64.60066 > 10.42.182.4.53: 61063+ [1au] A? google.com. (51)

And I can see the packet reaching to the other node (the one where coredns is):

IP 10.42.222.64.55933 > 10.42.182.4.53: 5083+ [1au] A? google.com. (51)
18:49:56.672731 IP 10.0.10.14.58192 > 10.0.10.10.4789: VXLAN, flags [I] (0x08), vni 4096

Then the packet disappears

manuelbuil · 2021-08-06T19:10:49Z

Sniffing a packet targeting the service. Node with coredns, interface eth0:

19:04:16.499057 IP 10.0.10.14.49959 > 10.0.10.10.4789: VXLAN, flags [I] (0x08), vni 4096
IP 10.42.222.64.51417 > 10.42.182.4.53: 29795+ [1au] A? google.com. (51)

Sniffing a packet targeting the pod implementing the service. Node with coredns, interface eth0:

19:04:11.194801 IP 10.0.10.14.34353 > 10.0.10.10.4789: VXLAN, flags [I] (0x08), vni 4096
IP 10.42.222.64.46410 > 10.42.182.4.53: 14722+ [1au] A? google.com. (51)
19:04:11.195126 IP 10.0.10.10.51238 > 10.0.10.14.4789: VXLAN, flags [I] (0x08), vni 4096
IP 10.42.182.4.53 > 10.42.222.64.46410: 14722* 1/0/1 A 142.250.178.142 (77)

Sniffing a packet targeting the service. Node with coredns, interface vxlan.calico: nothing
Sniffing a packet targeting the pod implementing the service. Node with coredns, interface vxlan.calico:

19:03:32.327538 IP 10.42.222.64.44687 > 10.42.182.4.53: 19939+ [1au] A? google.com. (51)
19:03:32.328436 IP 10.42.182.4.53 > 10.42.222.64.44687: 19939 1/0/1 A 142.250.178.142 (65)

manuelbuil · 2021-08-09T10:38:09Z

After looking at different things I noticed that when accessing the service directly to the pod, we see this:

06:29:06:4b:55:9a > 06:47:af:1e:bf:ca, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 64, id 21070, offset 0, flags [none], proto UDP (17), length 129)
    10.0.10.14.50078 > 10.0.10.10.4789: [udp sum ok] VXLAN, flags [I] (0x08), vni 4096

But if we access the service via the clusterIP, we see this:

06:29:06:4b:55:9a > 06:47:af:1e:bf:ca, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 64, id 16380, offset 0, flags [none], proto UDP (17), length 129)
    10.0.10.14.7103 > 10.0.10.10.4789: [bad udp cksum 0xbd67 -> 0x4e7f!] VXLAN, flags [I] (0x08), vni 4096

Note the bad udp cksum.

After investigating a bit, I read that this is a known kernel bug that was fixed in 5.7. Apparently, the kernel driver miscalculates the checksum when the vxlan offloading is on if the packet is natted, which is our case when accessing the service via the ClusterIP. Centos and RHEL 8 have backported the fix but not Ubuntu, that's why we only see it in Ubuntu (note that Ubuntu 20 uses 5.4.0). This is the kernel fix: torvalds/linux@ea64d8d.

Manual fix:
Disable the vxlan offloading in the vxlan interface for all nodes: sudo ethtool -K vxlan.calico tx-checksum-ip-generic off. I tested and it works :).

Calico's recommended fix:
Calico includes a env variable that when passed to the agent, disables the feature that creates this problem (MASQFullyRandom): projectcalico/calico#3145 (comment). Needs to be tested

TO DO:

Follow the Calico recommendation and disable the MASQFullyRandom feature. Check that the problem does not appear
Fully understand what are the disadvantages of disabling MASQFullyRandom compared to disabling the vxlan offloading
Check if SUSE backported this fix to 15 sp3
Find a way to limit the impact of this by only disabling MASQFullyRandom or vxlan offloading in Ubuntu systems (and suse?)

manuelbuil · 2021-08-09T14:15:06Z

Disabling MASQFullyRandom feature does not help. Asking Tigera, perhaps something else must be change. Note that there is a recent PR to fix this on Calico but it does not seem enabled in our version ==> projectcalico/felix#2811

manuelbuil · 2021-08-09T14:59:48Z

Same issue in opensuse SP3:

10.0.10.9.26831 > 10.0.10.7.4789: [bad udp cksum 0x69a5 -> 0xc8d6!] VXLAN, flags [I] (0x08), vni 4096

Fixed after running sudo ethtool -K vxlan.calico tx-checksum-ip-generic off

vadorovsky · 2021-08-10T10:14:27Z

@manuelbuil Are you sure that the kernel commit you linked is the only one?

It's applied in SLE 15 SP3 / Leap 15.3 already:
SUSE/kernel@3dc74ef

and you seem to have issues on SLE/openSUSE anyway.

manuelbuil · 2021-08-10T10:49:11Z

@manuelbuil Are you sure that the kernel commit you linked is the only one?

It's applied in SLE 15 SP3 / Leap 15.3 already:
SUSE/kernel@3dc74ef

and you seem to have issues on SLE/openSUSE anyway.

I got the link from projectcalico/calico#3145 (comment).

I reported some issues in openSUSE but they were related to a dirty env. Once I freshly deployed, I was able to see the same problem as in Ubuntu

This fixes rancher/rke2#1541 even for kernel version > 5.7 Signed-off-by: Manuel Buil <mbuil@suse.com>

rancher-max · 2021-08-23T15:27:00Z

Reopening for testing in rke2

manuelbuil · 2021-08-23T16:56:05Z

@rancher-max apart from doing the dig @10.43.0.10 www.google.com in all nodes, verify that kubectl get felixconfigurations.crd.projectcalico.org default -o yaml gives you this spec:

spec:
  bpfLogLevel: ""
  featureDetectOverride: ChecksumOffloadBroken=true
  logSeverityScreen: Info
  reportingInterval: 0s
  vxlanEnabled: true

We are only passing featureDetectOverride: ChecksumOffloadBroken=true and the rest of parameters should be filled by the operator

rancher-max · 2021-08-23T20:43:15Z

Leaving this open to validate on 1.22 release line, but confirmed working in v1.21.3-rc7+rke2r2

Validated the dig command works on all nodes, and felixconfigurations are set as mentioned above. Also confirmed running sudo ethtool -k vxlan.calico | grep tx-checksum-ip-generic on all nodes returns expected tx-checksum-ip-generic: off.

galal-hussein · 2021-09-22T22:37:42Z

Validated on master commit 09bb5c2

Install rke2 server on three nodes
configure rke2 server to run calico as its cni
run the dig command on all nodes

# dig @10.43.0.10 google.com

; <<>> DiG 9.16.1-Ubuntu <<>> @10.43.0.10 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5294
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 6c2344a672e1a9de (echoed)
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		30	IN	A	142.250.69.206

;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Wed Sep 22 22:35:47 UTC 2021
;; MSG SIZE  rcvd: 77

make sure that calico is configured correctly

# kubectl get felixconfigurations.crd.projectcalico.org default -o yaml
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: rke2-calico
    meta.helm.sh/release-namespace: kube-system
    projectcalico.org/metadata: '{"uid":"9f051b21-813b-475b-9615-c23692d89279","generation":1,"creationTimestamp":"2021-09-22T22:32:15Z","managedFields":[{"manager":"helm","operation":"Update","apiVersion":"crd.projectcalico.org/v1","time":"2021-09-22T22:32:15Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}},"f:spec":{".":{},"f:featureDetectOverride":{}}}}]}'
  creationTimestamp: "2021-09-22T22:32:15Z"
  generation: 3
  labels:
    app.kubernetes.io/managed-by: Helm
  name: default
  resourceVersion: "968"
  uid: 9f051b21-813b-475b-9615-c23692d89279
spec:
  bpfLogLevel: ""
  featureDetectOverride: ChecksumOffloadBroken=true
  logSeverityScreen: Info
  reportingInterval: 0s
  vxlanEnabled: true

strelok899 · 2024-02-11T17:41:48Z

i have the opposite issue.
the fix gone to the helm and i trying to disable it to have the hardware offload as i have limited resources and my kube-proxy crashing on lack of cpu issues.

how can i make the offloading work?

brandond · 2024-02-11T20:24:39Z

Enabling hardware offload will not address issues with insufficient CPU resources. Also, please don't revive old resolved issues to ask unrelated questions, open a new issue or discussion.

aiyengar2 mentioned this issue Aug 5, 2021

[Monitoring V2] multi-node cluster only registers 1 node for kube-etcd, kube-proxy -> many targets down in prometheus for RKE2 cluster using calico rancher/rancher#33052

Closed

aiyengar2 changed the title ~~RKE2 cluster seems to have DNS issues~~ RKE2 cluster seems to have issues with pods communicating across nodes Aug 5, 2021

Oats87 changed the title ~~RKE2 cluster seems to have issues with pods communicating across nodes~~ RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod Aug 5, 2021

cwayne18 assigned manuelbuil Aug 9, 2021

cwayne18 added area/cni by-2.6-TP kind/bug Something isn't working labels Aug 9, 2021

vadorovsky self-assigned this Aug 20, 2021

manuelbuil added a commit to manuelbuil/rke2-charts that referenced this issue Aug 23, 2021

Pass featureDetectOverride

dd10151

This fixes rancher/rke2#1541 even for kernel version > 5.7 Signed-off-by: Manuel Buil <mbuil@suse.com>

manuelbuil mentioned this issue Aug 23, 2021

Pass featureDetectOverride rancher/rke2-charts#136

Merged

vadorovsky closed this as completed in rancher/rke2-charts@2acb4d1 Aug 23, 2021

rancher-max reopened this Aug 23, 2021

This was referenced Aug 23, 2021

Update calico chart #1692

Merged

[Release 1.21] RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1693

Closed

manuelbuil mentioned this issue Aug 23, 2021

Update Calico chart version rancher/kontainer-driver-metadata#653

Merged

fapatel1 modified the milestones: v1.22.0+rke2r1, v1.22.2+rke2r1 Aug 23, 2021

manuelbuil mentioned this issue Aug 24, 2021

bad udp cksum when using vxlan projectcalico/calico#4865

Closed

galal-hussein closed this as completed Sep 22, 2021

manuelbuil mentioned this issue Jan 31, 2022

k3s on rhel 8 network/dns probleme and metrics not work k3s-io/k3s#5013

Closed

maxpain mentioned this issue Jun 16, 2022

VXLAN: bad UDP checksums kubernetes-sigs/kubespray#8992

Closed

brandond mentioned this issue Dec 15, 2022

[BUG] CIS scan on k3s clusters running for too long before it gets completed. rancher/rancher#39839

Closed

ejweber mentioned this issue Aug 21, 2023

[BUG] Failed Statefulset Pod Creation with RWX Workload on Longhorn v1.3.3 and SLES 15 SP5 longhorn/longhorn#6494

Closed

brownz11 mentioned this issue Sep 13, 2023

The pods on Kubernetes cannot access the pod IP addresses on other's nodes. projectcalico/calico#7991

Closed

manuelbuil mentioned this issue Jan 3, 2024

Cannot access application running on a pod of another Node #5164

Closed

rancher locked and limited conversation to collaborators Feb 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1541

RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1541

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

Oats87 commented Aug 5, 2021

Oats87 commented Aug 5, 2021

brandond commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

manuelbuil commented Aug 6, 2021 •

edited

Loading

manuelbuil commented Aug 6, 2021 •

edited

Loading

manuelbuil commented Aug 6, 2021

manuelbuil commented Aug 9, 2021 •

edited

Loading

manuelbuil commented Aug 9, 2021

manuelbuil commented Aug 9, 2021 •

edited

Loading

vadorovsky commented Aug 10, 2021

manuelbuil commented Aug 10, 2021

rancher-max commented Aug 23, 2021

manuelbuil commented Aug 23, 2021

rancher-max commented Aug 23, 2021

galal-hussein commented Sep 22, 2021

strelok899 commented Feb 11, 2024

brandond commented Feb 11, 2024

RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1541

RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod #1541

Comments

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

Oats87 commented Aug 5, 2021

Oats87 commented Aug 5, 2021

brandond commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

aiyengar2 commented Aug 5, 2021

manuelbuil commented Aug 6, 2021 • edited Loading

manuelbuil commented Aug 6, 2021 • edited Loading

manuelbuil commented Aug 6, 2021

manuelbuil commented Aug 9, 2021 • edited Loading

manuelbuil commented Aug 9, 2021

manuelbuil commented Aug 9, 2021 • edited Loading

vadorovsky commented Aug 10, 2021

manuelbuil commented Aug 10, 2021

rancher-max commented Aug 23, 2021

manuelbuil commented Aug 23, 2021

rancher-max commented Aug 23, 2021

galal-hussein commented Sep 22, 2021

strelok899 commented Feb 11, 2024

brandond commented Feb 11, 2024

manuelbuil commented Aug 6, 2021 •

edited

Loading

manuelbuil commented Aug 6, 2021 •

edited

Loading

manuelbuil commented Aug 9, 2021 •

edited

Loading

manuelbuil commented Aug 9, 2021 •

edited

Loading