PMTU is wrong when using GRE+IPsec with some Linux kernel versions #5922

Closed
tnqn opened this issue Jan 25, 2024 · 0 comments
Labels
area/transit/encapsulation Issues or PRs related to encapsulation. area/transit/encryption Issues or PRs related to transit encryption (IPSec, SSL). kind/bug Categorizes issue or PR as related to a bug.

Comments

tnqn (Member) commented Jan 25, 2024

Describe the bug

While testing #5880 on a non-kind testbed, ping requests sized to the maximum MTU were always dropped, as shown below. The error output shows that the sender received responses indicating a path MTU of 1284 instead of 1362.

=== RUN   TestIPSec/testIPSecPSKAuth/testIPSecTunnelConnectivity
    ipsec_test.go:139: Executing ping tests across Nodes: 'jenkins-antrea-e2e-for-pull-request-2600-md-0-58bcf48f8xzxmnqgp' <-> 'jenkins-antrea-e2e-for-pull-request-2600-sfnlq'
    connectivity_test.go:76: Waiting for Pods to be ready and retrieving IPs
    connectivity_test.go:90: Retrieved all Pod IPs: map[connectivity-testipsec-psk-p5d8h:IPv4(192.168.1.5),IPstrings(192.168.1.5) connectivity-testipsec-psk-q7tsb:IPv4(192.168.0.2),IPstrings(192.168.0.2)]
    connectivity_test.go:99: Ping mesh test between all Pods
    connectivity_test.go:114: Ping 'testipsec-yson617z/connectivity-testipsec-psk-p5d8h' -> 'testipsec-yson617z/connectivity-testipsec-psk-q7tsb': ERROR (error when running ping command 'ping -c 5 -s 1334 -M do -4 192.168.0.2': command terminated with exit code 1 - stdout: PING 192.168.0.2 (192.168.0.2) 1334(1362) bytes of data.
        
        --- 192.168.0.2 ping statistics ---
        5 packets transmitted, 0 received, 100% packet loss, time 4079ms
        
         - stderr: )
    connectivity_test.go:114: Ping 'testipsec-yson617z/connectivity-testipsec-psk-q7tsb' -> 'testipsec-yson617z/connectivity-testipsec-psk-p5d8h': ERROR (error when running ping command 'ping -c 5 -s 1334 -M do -4 192.168.1.5': command terminated with exit code 1 - stdout: PING 192.168.1.5 (192.168.1.5) 1334(1362) bytes of data.
        From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 1284)
        
        --- 192.168.1.5 ping statistics ---
        5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4103ms
        
         - stderr: ping: local error: message too long, mtu=1284
        ping: local error: message too long, mtu=1284
        ping: local error: message too long, mtu=1284
        ping: local error: message too long, mtu=1284
        )
    ipsec_test.go:151: Found 1 'up' SecurityAssociation(s) for Node 'jenkins-antrea-e2e-for-pull-request-2600-md-0-58bcf48f8xzxmnqgp', certificate auth: false
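
The ping size in the log follows from header arithmetic: `-s 1334` sets the ICMP payload, and ping reports `1334(1362)` because the 8-byte ICMP header and the 20-byte IPv4 header (assuming no IP options) are added on top. A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from the Antrea test code.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest `ping -s` value whose IPv4 packet still fits within `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size_for_payload(payload: int) -> int:
    """Total IPv4 packet size produced by `ping -s payload`."""
    return payload + ICMP_HEADER + IPV4_HEADER


print(ping_payload_for_mtu(1362))     # 1334, the size used by the test
print(ping_payload_for_mtu(1284))     # 1256, the largest size the reported PMTU allows
print(packet_size_for_payload(1334))  # 1362
```

With the reported PMTU of 1284, any `ping -M do` payload above 1256 bytes is rejected, which matches the "message too long, mtu=1284" errors above.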

However, the MTU of every Pod-facing interface in the datapath (antrea-gw0 and both ends of the Pod's veth pair) is correctly configured as 1362:

# In the root namespace
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc prio state UP group default qlen 1000
    link/ether 00:50:56:b1:14:04 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.186/24 brd 172.19.0.255 scope global dynamic eth0
       valid_lft 3675sec preferred_lft 3675sec
    inet 172.19.0.66/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:feb1:1404/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:d1:29:f5:a1:8d brd ff:ff:ff:ff:ff:ff
5: antrea-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1362 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 82:1c:06:3f:92:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global antrea-gw0
       valid_lft forever preferred_lft forever
    inet6 fe80::801c:6ff:fe3f:9230/64 scope link
       valid_lft forever preferred_lft forever
6: antrea-egress0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether aa:cf:40:9c:85:9b brd ff:ff:ff:ff:ff:ff
7: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
8: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether 7e:83:ca:76:00:52 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7c83:caff:fe76:52/64 scope link
       valid_lft forever preferred_lft forever
11: connecti-0c7f0e@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1362 qdisc noqueue master ovs-system state UP group default
    link/ether 52:f7:78:24:ed:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::50f7:78ff:fe24:eda1/64 scope link
       valid_lft forever preferred_lft forever

# In the Pod namespace
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
3: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
4: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
5: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1362 qdisc noqueue state UP group default
    link/ether 1e:72:06:36:a2:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.2/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1c72:6ff:fe36:a2d6/64 scope link
       valid_lft forever preferred_lft forever
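
To compare these values mechanically, the interface header lines can be parsed for their `mtu` field. The sketch below assumes the `ip addr` text format shown above; `interface_mtus` and the regular expression are illustrative helpers, not part of Antrea. The sample input is copied from the root-namespace listing.

```python
import re

# Matches header lines such as
# "11: connecti-0c7f0e@if5: <BROADCAST,...> mtu 1362 qdisc ..."
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>\s+mtu\s+(?P<mtu>\d+)"
)


def interface_mtus(ip_addr_output: str) -> dict:
    """Map interface name -> MTU for every interface header line."""
    mtus = {}
    for line in ip_addr_output.splitlines():
        match = LINK_RE.match(line)
        if match:
            mtus[match.group("name")] = int(match.group("mtu"))
    return mtus


SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc prio state UP group default qlen 1000
    link/ether 00:50:56:b1:14:04 brd ff:ff:ff:ff:ff:ff
5: antrea-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1362 qdisc noqueue state UNKNOWN group default qlen 1000
11: connecti-0c7f0e@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1362 qdisc noqueue master ovs-system state UP group default
"""

mtus = interface_mtus(SAMPLE)
print(mtus["eth0"] - mtus["antrea-gw0"])  # 80: gap between the uplink and Pod MTUs
print(mtus["antrea-gw0"] - 1284)          # 78: how far the reported PMTU falls short
```

The 80-byte gap is presumably the headroom reserved for GRE and IPsec encapsulation. The reported PMTU is a further 78 bytes lower, which is an observation about the numbers rather than a diagnosis.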

The route cache in the Pod namespace contains an entry stating that the PMTU is 1284:

# ip route show cache
192.168.1.5 via 192.168.0.1 dev eth0
    cache expires 167sec mtu 1284
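
A check like this could be automated by extracting the cached PMTU per destination from the command output. This is a rough sketch that assumes the two-line layout shown above (destination on an unindented line, `mtu` on the following indented line); `cached_pmtus` is a hypothetical helper name.

```python
import re

MTU_RE = re.compile(r"\bmtu\s+(\d+)")


def cached_pmtus(route_cache_output: str) -> dict:
    """Map destination -> cached PMTU from `ip route show cache` text."""
    pmtus = {}
    current_dst = None
    for line in route_cache_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new cache entry; its first field
            # is the destination address.
            current_dst = line.split()[0]
        match = MTU_RE.search(line)
        if match and current_dst is not None:
            pmtus[current_dst] = int(match.group(1))
    return pmtus


SAMPLE = (
    "192.168.1.5 via 192.168.0.1 dev eth0\n"
    "    cache expires 167sec mtu 1284\n"
)

print(cached_pmtus(SAMPLE))  # {'192.168.1.5': 1284}
```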

An ICMP unreachable response can be captured on antrea-gw0:

# tcpdump -i antrea-gw0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on antrea-gw0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:14:22.552160 IP 192.168.0.1 > 192.168.0.2: ICMP 192.168.1.5 unreachable - need to frag (mtu 1284), length 556
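
The "need to frag (mtu 1284)" text corresponds to an ICMP Destination Unreachable message (type 3) with code 4, whose header carries the next-hop MTU in its last two bytes, as described in RFC 1191. The sketch below decodes that 8-byte header. The sample bytes are constructed by hand for illustration (with a zero placeholder checksum) and are not taken from the capture.

```python
import struct

ICMP_DEST_UNREACH = 3  # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4   # ICMP code: fragmentation needed and DF set


def next_hop_mtu(icmp_message: bytes):
    """Return the next-hop MTU from a 'fragmentation needed' message, else None."""
    if len(icmp_message) < 8:
        return None
    # Header layout: type (1), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hand-built header for illustration only; the checksum is left as zero.
header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, 1284)
print(next_hop_mtu(header))  # 1284
```

The receiving host stores this value as the PMTU for the destination, which is how the 1284 entry ends up in the Pod's route cache.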

The Node's kernel version is 5.4.0-156-generic #173-Ubuntu.
I can't reproduce the issue on a kind cluster with kernel version 5.4.0-169-generic #187-Ubuntu.
I suspect a PMTU calculation bug was fixed between the two versions, but I haven't found which commit is responsible.
Searching for "mtu" in http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.4.0-169.187/changelog turns up quite a few commits related to MTU fixes.
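
The two data points above can be recorded in a small helper that classifies a kernel release string relative to them. This is only a sketch: the helper names are hypothetical, it encodes nothing beyond the two releases observed in this report, and it assumes Ubuntu-style release strings such as `5.4.0-156-generic`.

```python
import re


def parse_ubuntu_kernel(release: str) -> tuple:
    """Parse '5.4.0-156-generic' into (5, 4, 0, 156)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


# The only two releases tested in this report.
KNOWN_BAD = parse_ubuntu_kernel("5.4.0-156-generic")
KNOWN_GOOD = parse_ubuntu_kernel("5.4.0-169-generic")


def pmtu_issue_status(release: str) -> str:
    """Classify a release against the two observations in this report."""
    version = parse_ubuntu_kernel(release)
    if version == KNOWN_BAD:
        return "reproduced"
    if version == KNOWN_GOOD:
        return "not reproduced"
    if KNOWN_BAD < version < KNOWN_GOOD:
        return "unknown: suspected fix lies in this range"
    return "untested"


print(pmtu_issue_status("5.4.0-156-generic"))  # reproduced
print(pmtu_issue_status("5.4.0-160-generic"))  # unknown: suspected fix lies in this range
```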

I may not investigate the root cause further, given that it works fine on newer kernels. I created this issue for tracking and knowledge sharing. As a workaround, TestIPSec will have to stop validating connectivity with maximum-MTU pings.

tnqn added the area/transit/encapsulation, area/transit/encryption, and kind/bug labels Jan 25, 2024
tnqn closed this as completed Jan 26, 2024