
Fix incorrect MTU configurations #5880

Merged · 2 commits merged into antrea-io:main from fix-wg-mtu on Jan 26, 2024
Conversation

@hjiajing (Contributor) commented Jan 15, 2024:

The commit fixes 3 incorrect MTU configurations:

  1. When using the WireGuard encryption mode, the Pod eth0's MTU was not correct. The MTU deducted the Geneve overhead (because the default tunnel type is Geneve) when it should have deducted the WireGuard overhead, as traffic is encrypted rather than encapsulated.
  2. When using the GRE tunnel type, the Pod eth0's MTU was not correct. The actual overhead is 14 bytes outer MAC, 20 bytes outer IP, and 8 bytes GRE header (4-byte standard header + 4-byte key field), summing to 42 bytes.
  3. When enabling WireGuard for Multi-cluster, the MTU of all Pod interfaces and the WireGuard interface was reduced by 130 bytes (50 for Geneve + 80 for WireGuard); however, cross-cluster traffic sent from Pods was not forwarded by the WireGuard interface. This is because traffic originating from Pods is encapsulated on the gateway Node, and it is the encapsulated packet that gets encrypted. If the WireGuard interface is set with the same MTU as the Pod interfaces, the encapsulated packet exceeds the WireGuard interface's MTU.

Fixes #5868
Fixes #5913
Fixes #5914
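The three deductions above can be pictured as one consolidated calculation. The sketch below is a minimal illustration, not Antrea's actual CalculateMTUDeduction: the GRE (42) and WireGuard (80) figures come from this PR's description, while the Geneve, VXLAN, and IPsec constants and all names are assumed defaults for the example.

```go
package main

import "fmt"

// Overhead values in bytes. GRE and WireGuard follow the PR description;
// Geneve/VXLAN/IPsec are common defaults assumed for this sketch.
const (
	geneveOverhead    = 50 // 14 outer MAC + 20 outer IP + 8 UDP + 8 Geneve
	vxlanOverhead     = 50 // 14 outer MAC + 20 outer IP + 8 UDP + 8 VXLAN
	greOverhead       = 42 // 14 outer MAC + 20 outer IP + 8 GRE (4 header + 4 key)
	wireGuardOverhead = 80 // WireGuard encryption overhead
	ipsecESPOverhead  = 38 // ESP header/trailer estimate (assumption)
)

// podMTU derives a single deduction value from the encap and encryption
// settings, then applies it to the transport interface MTU.
func podMTU(transportMTU int, tunnelType, encryptionMode string, multiclusterWG bool) int {
	var deduction int
	switch {
	case encryptionMode == "wireGuard":
		// Fix 1: traffic is only encrypted, not encapsulated.
		deduction = wireGuardOverhead
	case tunnelType == "gre":
		// Fix 2: GRE overhead is 42 bytes, not the Geneve default.
		deduction = greOverhead
	case tunnelType == "vxlan":
		deduction = vxlanOverhead
	default:
		deduction = geneveOverhead
	}
	if encryptionMode == "ipsec" {
		deduction += ipsecESPOverhead
	}
	if multiclusterWG {
		// Fix 3: cross-cluster traffic is encapsulated on the gateway Node
		// and then encrypted, so Pods must deduct both overheads.
		deduction += wireGuardOverhead
	}
	return transportMTU - deduction
}

func main() {
	fmt.Println(podMTU(1500, "geneve", "none", false)) // 1450
	fmt.Println(podMTU(1500, "gre", "none", false))    // 1458
	fmt.Println(podMTU(1500, "", "wireGuard", false))  // 1420
	fmt.Println(podMTU(1500, "geneve", "none", true))  // 1370
}
```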

@luolanzone luolanzone added this to the Antrea v1.15 release milestone Jan 16, 2024
@hjiajing hjiajing linked an issue Jan 16, 2024 that may be closed by this pull request
@luolanzone (Contributor) commented:

@hjiajing the sign-off info is missing; the DCO check failed.

@luolanzone luolanzone added the action/backport Indicates a PR that requires backports. label Jan 17, 2024
@tnqn (Member) left a comment:

  1. Please check the usage of the calculation's result globally so the change doesn't cause new problems.
  2. If possible, we could add a validation to the existing WireGuard e2e test to check that a packet of MTU size does not get dropped.
  3. Link to the original issue.

Comment on lines 271 to 274

if nc.TrafficEncryptionMode == TrafficEncryptionModeWireGuard {
	mtuDeduction = WireGuardOverhead
} else if nc.TrafficEncryptionMode == TrafficEncryptionModeIPSec {
	mtuDeduction = IPSecESPOverhead

@tnqn (Member) commented:

This handling will cause some other issues:

  1. The comment on NetworkConfig.MTUDeduction says it doesn't count IPsec and WireGuard overhead:
     MTUDeduction only counts IPv4 tunnel overhead, no IPsec and WireGuard overhead.
  2. mc_route_controller.go will deduct the overhead a second time:
     MTU: controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction - config.WireGuardOverhead,
  3. getInterfaceMTU will deduct the overhead a second time:
     if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeIPSec {
         mtu -= config.IPSecESPOverhead
     }

@hjiajing (Contributor, Author) replied:

Thanks for the comment. I removed the return value of CalculateMTUDeduction; when MTUs need to be calculated, the caller now uses networkConfig.MTUDeduction directly. And when the TrafficEncryptionMode is WireGuard, it deducts the WireGuard overhead.

@tnqn (Member) left a comment:

  1. If possible, we could add a validation to the existing WireGuard e2e test to check that a packet of MTU size does not get dropped.
  2. Link to the original issue.

We need both unit tests and e2e tests to prove the fix works as expected and to avoid another regression in the future.

Comment on lines 1197 to 1203

if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard {
	mtu -= config.WireGuardOverhead
} else {
	mtu -= i.networkConfig.MTUDeduction
	if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeIPSec {
		mtu -= config.IPSecESPOverhead
	}

@tnqn (Member) commented:

I feel it's still a bit unconsolidated to spread the deduction logic across several places, which is error-prone and caused the previous bug. Could we take all cases into consideration in CalculateMTUDeduction and use the result everywhere? By all cases, I mean WireGuard only, WireGuard + encap (the multi-cluster case), IPsec, and overlay.

@hjiajing (Contributor, Author) replied:

Sure, I will refactor this part into one MTU calculator for all scenarios.

@hjiajing (Contributor, Author) replied:

I added a new variable, EncryptionDeduction, to NetworkConfig, and the MTU now deducts both TunnelDeduction and EncryptionDeduction.

@tnqn (Member) replied:

Why don't we just calculate a single deduction value?

@hjiajing (Contributor, Author) replied:

TunnelDeduction is also used when calculating the multi-cluster Pod MTU, so I divided the deduction into two parts.

@tnqn (Member) replied:

Discussed with @hjiajing; he will change to a single deduction value.

@hjiajing hjiajing force-pushed the fix-wg-mtu branch 3 times, most recently from 9aff2f1 to b2c145b Compare January 21, 2024 10:05
@hjiajing (Contributor, Author) commented:

/test-e2e

(Two more identical /test-e2e comments followed.)

// When Multi-cluster Gateway is enabled, we need to reduce MTU for potential cross-cluster traffic.
if nc.TrafficEncapMode.SupportsEncap() || nc.EnableMulticlusterGW {
	if nc.TunnelType == ovsconfig.VXLANTunnel {
		mtuDeduction = vxlanOverhead
		nc.MTUDeduction = vxlanOverhead

@luolanzone (Contributor) commented:

IPsec works together with these encap tunnel modes, so I think this should be "nc.MTUDeduction += vxlanOverhead". cc @xliuxu

I have a question regarding IPsec with Encap mode and Multicluster enabled: have we verified this kind of traffic before for MC?

A Contributor commented:

+1 to @luolanzone for the IPsec case.

@hjiajing (Contributor, Author) replied:

Fixed; changed = to +=.

// Example stdout:
// 22: vxlan-29acf3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
// link/ether be:41:93:69:87:02 brd ff:ff:ff:ff:ff:ff link-netns cni-320ae61f-5e51-d123-0169-9f1807390500
fields := strings.Fields(strings.Split(strings.TrimSpace(stdout), "\n")[0])
A Contributor commented:

It might be simpler to get the MTU from the sysfs file directly: cat /sys/class/net/<interface_name>/mtu
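In Go, the suggested sysfs read could look like the sketch below. The function name and error handling are illustrative, not Antrea's actual test helper; it only assumes the standard Linux /sys/class/net layout.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// interfaceMTU reads an interface's MTU straight from sysfs, avoiding the
// need to parse `ip link` output (which the original test did).
func interfaceMTU(name string) (int, error) {
	data, err := os.ReadFile("/sys/class/net/" + name + "/mtu")
	if err != nil {
		return 0, err
	}
	// The file contains the MTU as a decimal number with a trailing newline.
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

func main() {
	mtu, err := interfaceMTU("lo") // loopback exists on any Linux host
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(mtu)
}
```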

@hjiajing (Contributor, Author) replied:

Agreed; done.

@hjiajing hjiajing force-pushed the fix-wg-mtu branch 2 times, most recently from c2c3d32 to 0cbb17f Compare January 22, 2024 07:15
@hjiajing (Contributor, Author) commented:

/test-e2e

pkg/agent/config/node_config_test.go — resolved
@@ -129,7 +129,7 @@ func NewMCDefaultRouteController(
controller.wireGuardConfig = &config.WireGuardConfig{
	Port: multiclusterConfig.WireGuard.Port,
	Name: multiclusterWireGuardInterface,
-	MTU:  controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction - config.WireGuardOverhead,
+	MTU:  controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction,
A Member commented:

Isn't it just controller.networkConfig.InterfaceMTU?

@hjiajing (Contributor, Author) replied:

Fixed.

@@ -2481,6 +2481,47 @@ func (data *TestData) GetTransportInterface() (string, error) {
return "", fmt.Errorf("no interface was assigned with Node IP %s", nodeIP)
}

func (data *TestData) GetPodInterfaceName(nodeName string, podName string) (string, error) {
A Member commented:

I don't mean checking the MTU in e2e. The calculation has already been validated in unit tests; a mistake in the calculation can be caught by a unit test. What we really need is to ensure the calculation works by testing real traffic, which was not covered before.

My suggestion for the e2e: just change the existing connectivity test to verify that an MTU-size packet can be forwarded. The test should fail without the fix and succeed with the fix. For example, in runPingMesh, change the packet size to MTU-28 and specify -M do to set the Don't Fragment bit.
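The MTU-28 figure comes from IPv4 framing: an ICMP echo whose payload is MTU minus the 20-byte IPv4 header and 8-byte ICMP header fills the interface MTU exactly. A small sketch of that arithmetic (the helper name is illustrative, not Antrea's):

```go
package main

import "fmt"

// pingPayloadSize returns the "-s" value that makes an IPv4 ICMP echo packet
// exactly the interface MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
// Combined with "-M do" (Don't Fragment), the ping fails if any hop's MTU is
// configured too small for the packet.
func pingPayloadSize(mtu int) int {
	return mtu - 28
}

func main() {
	// For a Geneve-deducted Pod MTU of 1450, the e2e would run, per Pod
	// pair (peer address illustrative):
	//   ping -c 3 -M do -s 1422 <peer-pod-ip>
	fmt.Println(pingPayloadSize(1450)) // 1422
}
```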

@hjiajing (Contributor, Author) replied:

Sure, I will add this part.

@hjiajing (Contributor, Author) replied:

E2E test added.

@@ -205,7 +205,7 @@ func run(o *Options) error {
	IPsecConfig: config.IPsecConfig{
		AuthenticationMode: ipsecAuthenticationMode,
	},
-	EnableMulticlusterGW: enableMulticlusterGW,
+	MulticlusterConfig:   o.config.Multicluster,
A Member commented:

  1. Do not just replace the original attribute with a new one; EnableMulticlusterGW should not have been removed.
  2. It's inconsistent to use o.config.Multicluster as the source of whether the multicluster gateway is enabled. If you look at how enableMulticlusterGW is calculated, there is another condition involved.

I believe we should keep passing enableMulticlusterGW, and add a parsed nc.MulticlusterConfig.TrafficEncryptionMode.

@hjiajing (Contributor, Author) replied:

Sorry, I removed this line by mistake; added it back.

@hjiajing hjiajing force-pushed the fix-wg-mtu branch 2 times, most recently from d8ce863 to c8bc079 Compare January 22, 2024 17:38
@hjiajing (Contributor, Author) commented:

/test-e2e

@luolanzone (Contributor) left a comment:

Please also check the golangci-lint failures.

pkg/agent/config/node_config.go — outdated, resolved
test/e2e/connectivity_test.go — outdated, resolved (3 threads)
test/e2e/mtu_test.go — outdated, resolved
@hjiajing (Contributor, Author) commented:

/test-e2e

@hjiajing hjiajing force-pushed the fix-wg-mtu branch 2 times, most recently from e751c22 to d8dbf0b Compare January 23, 2024 07:05
@tnqn (Member) commented Jan 24, 2024:

@antoninbas could you take a look at the PR?

antoninbas previously approved these changes Jan 24, 2024.

@antoninbas (Contributor) left a comment:

LGTM

}
if nc.TrafficEncryptionMode == TrafficEncryptionModeWireGuard {
	// When WireGuard is enabled, cross-node traffic will only be encrypted, just reduce MTU for encryption.
	nc.MTUDeduction = WireGuardOverhead

A Contributor commented:

Do you want to deduct 60 bytes when IPv6 is disabled and 80 bytes when IPv6 is enabled?

See https://lists.zx2c4.com/pipermail/wireguard/2017-December/002201.html
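Per the linked mailing-list discussion, WireGuard's per-packet overhead depends on the outer IP family: 60 bytes over IPv4 and 80 bytes over IPv6, so deducting the larger value only when IPv6 is enabled avoids wasting 20 bytes on IPv4-only clusters. A minimal sketch of that selection (names are illustrative, not Antrea's):

```go
package main

import "fmt"

// WireGuard adds 60 bytes of overhead when the outer packet is IPv4 and
// 80 bytes when it is IPv6 (per the zx2c4 mailing-list figures).
const (
	wireGuardOverheadIPv4 = 60
	wireGuardOverheadIPv6 = 80
)

// wireGuardOverhead picks the deduction: if IPv6 is enabled anywhere on the
// path, the worst-case 80-byte value keeps one MTU valid for both families.
func wireGuardOverhead(ipv6Enabled bool) int {
	if ipv6Enabled {
		return wireGuardOverheadIPv6
	}
	return wireGuardOverheadIPv4
}

func main() {
	fmt.Println(wireGuardOverhead(false)) // 60
	fmt.Println(wireGuardOverhead(true))  // 80
}
```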

A Member replied:

Thanks for the info; will add it.

A Member replied:

Done.

@tnqn (Member) commented Jan 25, 2024:

/test-e2e

@tnqn (Member) commented Jan 25, 2024:

/test-e2e

ci/jenkins/test-vmc.sh — outdated, resolved
Comment on lines 120 to 122

if t.Failed() {
	time.Sleep(15 * time.Minute)
}

A Member commented:

For testing only.

@tnqn (Member) commented Jan 25, 2024:

/test-e2e

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn (Member) commented Jan 25, 2024:

/test-e2e
/test-ipv6-e2e
/test-ipv6-only-e2e
/test-windows-e2e
/test-flexible-ipam-e2e
/test-multicluster-e2e
/test-multicast-e2e

@tnqn (Member) commented Jan 25, 2024:

The ipv4 e2e failed, but IPsec's MTU may have been wrong from the beginning; this needs more investigation.

It may be related to an MTU calculation bug in specific versions; created #5922.

@tnqn tnqn mentioned this pull request Jan 25, 2024
@tnqn (Member) commented Jan 25, 2024:

/test-conformance
/test-networkpolicy

@tnqn tnqn mentioned this pull request Jan 25, 2024
@tnqn (Member) commented Jan 25, 2024:

/test-windows-e2e
/test-multicluster-e2e

@tnqn (Member) commented Jan 26, 2024:

/test-multicluster-e2e
/test-windows-containerd-e2e

@hjiajing (Contributor, Author) commented:

/test-multicluster-e2e

@tnqn tnqn merged commit ce46eb1 into antrea-io:main Jan 26, 2024
53 of 58 checks passed
tnqn added a commit to tnqn/antrea that referenced this pull request Jan 26, 2024
The commit fixes 3 incorrect MTU configurations:

1. When using the WireGuard encryption mode, the Pod eth0's MTU was not
correct. The MTU deducted the Geneve overhead (because the default tunnel
type is Geneve) when it should have deducted the WireGuard overhead, as
traffic is encrypted rather than encapsulated.

2. When using the GRE tunnel type, the Pod eth0's MTU was not correct.
The actual overhead is 14 bytes outer MAC, 20 bytes outer IP, and 8 bytes
GRE header (4-byte standard header + 4-byte key field), summing to 42 bytes.

3. When enabling WireGuard for Multi-cluster, the MTU of all Pod
interfaces and the WireGuard interface was reduced by 130 bytes (50 for
Geneve + 80 for WireGuard); however, cross-cluster traffic sent from
Pods was not forwarded by the WireGuard interface. This is because traffic
originating from Pods is encapsulated on the gateway Node, and it is the
encapsulated packet that gets encrypted. If the WireGuard interface
is set with the same MTU as the Pod interfaces, the encapsulated packet
exceeds the WireGuard interface's MTU.

Signed-off-by: Jiajing Hu <hjiajing@vmware.com>
Signed-off-by: Quan Tian <qtian@vmware.com>
Co-authored-by: Quan Tian <qtian@vmware.com>
tnqn added a commit that referenced this pull request Jan 26, 2024 (same commit message as above)
tnqn added a commit to tnqn/antrea that referenced this pull request Mar 25, 2024 (same commit message as above)
tnqn added a commit that referenced this pull request Mar 26, 2024 (same commit message as above)
Labels: action/backport (Indicates a PR that requires backports), action/release-note (Indicates a PR that should be included in release notes)
Projects: none yet
Participants: 5