Implement configurable Network Instance MTU #3991

Merged
merged 1 commit into from
Jun 25, 2024
83 changes: 83 additions & 0 deletions docs/APP-CONNECTIVITY.md
Expand Up @@ -322,3 +322,86 @@ propagated by DHCP to connected applications, unless network instance is air-gap
(without uplink) or the uplink is app-shared (not management) and does not have a default
route of its own. In both cases, it is possible to enforce default route propagation
by configuring a static default route for the network instance.

### Network Instance MTU

The user can adjust the Maximum Transmission Unit (MTU) size of the network instance
bridge and all application interfaces connected to it.
MTU determines the largest IP packet that the network instance is allowed to carry.
A smaller MTU value is often used to avoid packet fragmentation when some form of packet
encapsulation is being applied, while a larger MTU reduces the overhead associated with
packet headers, improves network efficiency, and increases throughput by allowing more
data to be transmitted in each packet (Ethernet frames carrying more than 1500 bytes of
payload are known as jumbo frames).

EVE uses the L3 MTU, meaning the value does not include the L2 header size (e.g., Ethernet
header or VLAN tag size). The value is a 16-bit unsigned integer, representing the MTU size
in bytes. The minimum accepted value for the MTU is 1280, which is the minimum link MTU
needed to carry an IPv6 packet (see RFC 8200, "IPv6 minimum link MTU"). If the MTU for
a network instance is not defined (zero value), EVE will set the default MTU size of 1500
bytes.
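
The rules above (zero means "use the default", values below 1280 are rejected) can be sketched as follows. This is only an illustration: `effectiveMTU` and both constants are hypothetical names and are not taken from the EVE code base.

```go
package main

import "fmt"

const (
	defaultMTU uint16 = 1500 // applied when the configured MTU is zero (undefined)
	minMTU     uint16 = 1280 // IPv6 minimum link MTU (RFC 8200)
)

// effectiveMTU is a hypothetical helper that maps the configured L3 MTU
// to the value that would actually be applied, or returns an error if
// the configured value is below the accepted minimum.
func effectiveMTU(configured uint16) (uint16, error) {
	if configured == 0 {
		return defaultMTU, nil
	}
	if configured < minMTU {
		return 0, fmt.Errorf("MTU %d is below the minimum of %d", configured, minMTU)
	}
	return configured, nil
}

func main() {
	for _, mtu := range []uint16{0, 1280, 9000, 576} {
		got, err := effectiveMTU(mtu)
		if err != nil {
			fmt.Printf("configured=%d: rejected (%v)\n", mtu, err)
			continue
		}
		fmt.Printf("configured=%d: effective=%d\n", mtu, got)
	}
}
```

Because the value is a 16-bit unsigned integer, no explicit upper bound check is needed in this sketch; the type itself caps the MTU at 65535.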

On the host side, EVE sets the MTU on the bridge and the app VIFs. On the guest (application)
side, the responsibility for setting the MTU lies either with EVE or with the user/app,
depending on the network instance type (local or switch), the app type (VM or container)
and the type of interfaces used (virtio or something else).

#### Container App VIF MTU

For container applications running inside an EVE-created shim-VM, EVE initializes the MTU
of interfaces during the shim-VM boot. MTUs of all interfaces are passed to the VM via kernel
boot arguments (/proc/cmdline). The init script parses out these values and applies them
to application interfaces (excluding direct assignments).
Furthermore, interfaces connected to local network instances will have their MTUs
automatically updated using DHCP if there is a change in the MTU configuration. To update
the MTU of interfaces connected to switch network instances, the user may run an external
DHCP server in the network and publish MTU changes via DHCP option 26 (the DHCP client
run by EVE inside the shim-VM will pick them up and apply them).
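
To illustrate how the MTUs travel through the kernel command line, here is a minimal Go sketch that extracts them again. The `mtu=<v1>,<v2>,...` format matches what the zedmanager change in this PR appends to `ExtraArgs`; the function `parseMTUArg` is hypothetical, and the real parsing is done by the shim-VM init script rather than by Go code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMTUArg is a hypothetical helper that looks for an "mtu=" argument
// in a kernel command line and returns the comma-separated MTU values in
// interface order. It returns nil if the argument is not present.
func parseMTUArg(cmdline string) ([]uint16, error) {
	for _, field := range strings.Fields(cmdline) {
		if !strings.HasPrefix(field, "mtu=") {
			continue
		}
		value := strings.TrimPrefix(field, "mtu=")
		var mtus []uint16
		for _, s := range strings.Split(value, ",") {
			// MTU is a 16-bit unsigned integer.
			n, err := strconv.ParseUint(s, 10, 16)
			if err != nil {
				return nil, fmt.Errorf("invalid MTU %q: %w", s, err)
			}
			mtus = append(mtus, uint16(n))
		}
		return mtus, nil
	}
	return nil, nil
}

func main() {
	// Example command line; everything except "mtu=" is made up.
	cmdline := "console=hvc0 root=/dev/vda mtu=1500,1400,9000"
	mtus, err := parseMTUArg(cmdline)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, mtu := range mtus {
		fmt.Printf("interface %d: MTU %d\n", i, mtu)
	}
}
```

The position of each value in the list corresponds to the position of the adapter in the app's adapter list, so the consumer has to apply them in the same order.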

#### VM App VIF MTU

In the case of VM applications, it is mostly the responsibility of the app/user to set
and keep the MTUs up-to-date. When the device provides HW-assisted virtualization capabilities,
EVE (with the kvm or kubevirt hypervisor) connects the VM to network instances using
para-virtualized virtio interfaces, which allow the MTU value to be propagated from the host
to the guest.
If the virtio driver used by the app supports MTU propagation, the initial MTU values
will be set using virtio (regardless of the network instance type).

To determine whether the virtio driver used by an app supports MTU propagation, the user
must check whether the `VIRTIO_NET_F_MTU` feature flag is reported as `1`.
Given that:

```c
#define VIRTIO_NET_F_MTU 3
```

Check the feature flag with (replace `enp1s0` with your interface name):

```sh
# character positions of "cut -c" start at 1, hence bit 3 is the 4th character
cat /sys/class/net/enp1s0/device/features | cut -c 4
1 # if not supported, prints 0 instead
```
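
The same check can be done programmatically. The sketch below assumes, as the shell example does, that the sysfs `features` file is a string of `0`/`1` characters with feature bit 0 first. The function name `supportsMTUPropagation` and the interface name are illustrative only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// virtioNetFMTU mirrors VIRTIO_NET_F_MTU (feature bit 3).
const virtioNetFMTU = 3

// supportsMTUPropagation is a hypothetical helper that inspects the content
// of /sys/class/net/<ifname>/device/features and reports whether the
// character at index 3 (the 4th character) is '1'.
func supportsMTUPropagation(features string) bool {
	features = strings.TrimSpace(features)
	return len(features) > virtioNetFMTU && features[virtioNetFMTU] == '1'
}

func main() {
	ifName := "enp1s0" // replace with your interface name
	data, err := os.ReadFile("/sys/class/net/" + ifName + "/device/features")
	if err != nil {
		fmt.Println("cannot read virtio features:", err)
		return
	}
	fmt.Println("MTU propagation supported:", supportsMTUPropagation(string(data)))
}
```

Unlike `cut -c`, Go string indexing starts at 0, so the feature bit number can be used directly as the index.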

Please note that the Xen VIF driver does not support MTU propagation from host to guest.

To support MTU changes at run time for interfaces connected to local network instances,
a VM app can run a DHCP client and receive the latest MTU via DHCP option 26.
For switch network instances, the user can run their own external DHCP server in the network
with the MTU option configured.

With Kubevirt, changing the MTU after the VMI is deployed is not possible. This is because
the bridge and the (virtio) TAP created by Kubevirt to connect the pod interface (VETH) with
the VMI interface are fully managed by Kubevirt, which lacks the ability to detect and apply
MTU changes.
This means that even if the app updates the MTU on its side (e.g. using DHCP), the path MTU may
differ because the connection between the VMI and the underlying Pod will continue using
the old MTU value.

#### Network Instance MTU vs. Network Adapter MTU

Please note that application traffic leaving or entering the device via a network
adapter associated with the network instance is additionally limited by the MTU value
of the adapter, configured within the NetworkConfig object. If the configured network
instance MTU differs from the network adapter MTU, EVE will flag the network instance
with an error and use the adapter's MTU for the network instance instead (to prevent
traffic from being dropped or fragmented inside EVE).
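
The fallback behavior described above can be sketched as follows. This is a simplified illustration and not the actual `checkNetworkInstanceMTUConflicts` implementation (which this excerpt does not show): `resolveNIMTU` is a hypothetical name, and both arguments are assumed to be effective values with defaults already applied.

```go
package main

import "fmt"

// resolveNIMTU is a hypothetical helper. Given the effective MTU of the
// network instance and of its network adapter, it returns the MTU to
// apply. On a mismatch it falls back to the adapter's MTU and returns
// an error describing the conflict.
func resolveNIMTU(niMTU, adapterMTU uint16) (uint16, error) {
	if niMTU != adapterMTU {
		return adapterMTU, fmt.Errorf(
			"network instance MTU (%d) differs from network adapter MTU (%d); using %d",
			niMTU, adapterMTU, adapterMTU)
	}
	return niMTU, nil
}

func main() {
	mtu, err := resolveNIMTU(9000, 1500)
	if err != nil {
		// The network instance would be flagged with this error.
		fmt.Println("MTU conflict:", err)
	}
	fmt.Println("applied MTU:", mtu)
}
```

An air-gapped network instance has no adapter, so a check like this would not apply to it.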
8 changes: 7 additions & 1 deletion pkg/pillar/cmd/zedmanager/handledomainmgr.go
Expand Up @@ -7,6 +7,8 @@ import (
"errors"
"fmt"
"runtime"
"strconv"
"strings"

"github.com/lf-edge/eve/pkg/pillar/types"
)
Expand Down Expand Up @@ -124,10 +126,14 @@ func MaybeAddDomainConfig(ctx *zedmanagerContext,
}
if ns != nil {
adapterCount := len(ns.AppNetAdapterList)

dc.VifList = make([]types.VifConfig, adapterCount)
mtuStrList := make([]string, adapterCount)
for i, adapter := range ns.AppNetAdapterList {
dc.VifList[i] = adapter.VifInfo.VifConfig
mtuStrList[i] = strconv.Itoa(int(adapter.MTU))
}
if dc.IsOCIContainer() && adapterCount > 0 {
dc.ExtraArgs += " mtu=" + strings.Join(mtuStrList, ",")
}
}
log.Functionf("MaybeAddDomainConfig done for %s", key)
1 change: 1 addition & 0 deletions pkg/pillar/cmd/zedrouter/appnetwork.go
Expand Up @@ -95,6 +95,7 @@ func (z *zedrouter) prepareConfigForVIFs(config types.AppNetworkConfig,
adapterStatus.Mac = z.generateAppMac(adapterNum, status, netInstStatus)
}
adapterStatus.HostName = config.Key()
adapterStatus.MTU = netInstStatus.MTU
guestIP, err := z.lookupOrAllocateIPv4ForVIF(
netInstStatus, *adapterStatus, status.UUIDandVersion.UUID)
if err != nil {
92 changes: 60 additions & 32 deletions pkg/pillar/cmd/zedrouter/networkinstance.go
Expand Up @@ -66,7 +66,8 @@ func (z *zedrouter) getNIBridgeConfig(
MACAddress: status.BridgeMac,
IPAddress: ipAddr,
Uplink: z.getNIUplinkConfig(status),
IPConflict: status.IPConflict,
IPConflict: status.IPConflictErr.HasError(),
MTU: status.MTU,
}
}

Expand All @@ -88,6 +89,7 @@ func (z *zedrouter) getNIUplinkConfig(
LogicalLabel: port.Logicallabel,
IfName: ifName,
IsMgmt: port.IsMgmt,
MTU: port.MTU,
DNSServers: types.GetDNSServers(*z.deviceNetworkStatus, ifName),
NTPServers: types.GetNTPServers(*z.deviceNetworkStatus, ifName),
}
Expand All @@ -96,74 +98,102 @@ func (z *zedrouter) getNIUplinkConfig(
// Update NI status and set interface name of the selected uplink
// referenced by a logical label.
func (z *zedrouter) setSelectedUplink(uplinkLogicalLabel string,
status *types.NetworkInstanceStatus) (waitForUplink bool, err error) {
status *types.NetworkInstanceStatus) error {
if status.PortLogicalLabel == "" {
// Air-gapped
status.SelectedUplinkLogicalLabel = ""
status.SelectedUplinkIntfName = ""
return false, nil
return nil
}
status.SelectedUplinkLogicalLabel = uplinkLogicalLabel
if uplinkLogicalLabel == "" {
status.SelectedUplinkIntfName = ""
// This is potentially a transient state, wait for DPC update
// and uplink probing eventually finding a suitable uplink port.
return true, fmt.Errorf("no selected uplink port")
return fmt.Errorf("no selected uplink port")
}
ports := z.deviceNetworkStatus.GetPortsByLogicallabel(uplinkLogicalLabel)
switch len(ports) {
case 0:
err = fmt.Errorf("label of selected uplink (%s) does not match any port (%v)",
err := fmt.Errorf("label of selected uplink (%s) does not match any port (%v)",
uplinkLogicalLabel, ports)
// Wait for DPC update
return true, err
return err
case 1:
if ports[0].InvalidConfig {
return false, fmt.Errorf("port %s has invalid config: %s", ports[0].Logicallabel,
return fmt.Errorf("port %s has invalid config: %s", ports[0].Logicallabel,
ports[0].LastError)
}
// Selected port is OK
break
default:
err = fmt.Errorf("label of selected uplink matches multiple ports (%v)", ports)
return false, err
// Note: soon we will support NI with multiple ports.
err := fmt.Errorf("label of selected uplink matches multiple ports (%v)", ports)
return err
}
ifName := ports[0].IfName
status.SelectedUplinkIntfName = ifName
ifIndex, exists, _ := z.networkMonitor.GetInterfaceIndex(ifName)
if !exists {
// Wait for uplink interface to appear in the network stack.
return true, fmt.Errorf("missing uplink interface '%s'", ifName)
return fmt.Errorf("missing uplink interface '%s'", ifName)
}
if status.IsUsingUplinkBridge() {
_, ifMAC, _ := z.networkMonitor.GetInterfaceAddrs(ifIndex)
status.BridgeMac = ifMAC
}
return false, nil
return nil
}

// This function is called on DPC update or when UplinkProber changes uplink port
// selected for network instance.
func (z *zedrouter) doUpdateNIUplink(uplinkLogicalLabel string,
status *types.NetworkInstanceStatus, config types.NetworkInstanceConfig) {
waitForUplink, err := z.setSelectedUplink(uplinkLogicalLabel, status)
if err != nil {

// Update association between the NI and the selected device port.
uplinkErr := z.setSelectedUplink(uplinkLogicalLabel, status)
if uplinkErr == nil && status.UplinkErr.HasError() {
// Uplink issue was resolved.
status.UplinkErr.ClearError()
z.publishNetworkInstanceStatus(status)
}
if uplinkErr != nil &&
uplinkErr.Error() != status.UplinkErr.Error {
// New uplink issue arose or the error has changed.
z.log.Errorf("doUpdateNIUplink(%s) for %s failed: %v", uplinkLogicalLabel,
status.UUID, err)
status.SetErrorNow(err.Error())
status.UUID, uplinkErr)
status.UplinkErr.SetErrorNow(uplinkErr.Error())
z.publishNetworkInstanceStatus(status)
}

// Re-check MTUs between the NI and the port.
fallbackMTU, mtuErr := z.checkNetworkInstanceMTUConflicts(config, status)
if mtuErr == nil && status.MTUConflictErr.HasError() {
// MTU conflict was resolved.
status.MTUConflictErr.ClearError()
if config.MTU == 0 {
status.MTU = types.DefaultMTU
} else {
status.MTU = config.MTU
}
z.publishNetworkInstanceStatus(status)
return
}
if mtuErr != nil &&
mtuErr.Error() != status.MTUConflictErr.Error {
// New MTU conflict arose or the error has changed.
z.log.Error(mtuErr)
status.MTUConflictErr.SetErrorNow(mtuErr.Error())
status.MTU = fallbackMTU
z.publishNetworkInstanceStatus(status)
}

// Apply uplink/MTU changes in the network stack.
if status.Activated {
z.doUpdateActivatedNetworkInstance(config, status)
}
if status.WaitingForUplink && !waitForUplink {
status.WaitingForUplink = false
status.ClearError()
if config.Activate && !status.Activated {
z.doActivateNetworkInstance(config, status)
z.checkAndRecreateAppNetworks(status.UUID)
}
if config.Activate && !status.Activated && status.EligibleForActivate() {
z.doActivateNetworkInstance(config, status)
z.checkAndRecreateAppNetworks(status.UUID)
}
z.publishNetworkInstanceStatus(status)
}
Expand All @@ -175,7 +205,7 @@ func (z *zedrouter) doActivateNetworkInstance(config types.NetworkInstanceConfig
z.runCtx, config, z.getNIBridgeConfig(status))
if err != nil {
z.log.Errorf("Failed to activate network instance %s: %v", status.UUID, err)
status.SetErrorNow(err.Error())
status.ReconcileErr.SetErrorNow(err.Error())
z.publishNetworkInstanceStatus(status)
return
}
Expand Down Expand Up @@ -203,7 +233,7 @@ func (z *zedrouter) doInactivateNetworkInstance(status *types.NetworkInstanceSta
niRecStatus, err := z.niReconciler.DelNI(z.runCtx, status.UUID)
if err != nil {
z.log.Errorf("Failed to deactivate network instance %s: %v", status.UUID, err)
status.SetErrorNow(err.Error())
status.ReconcileErr.SetErrorNow(err.Error())
z.publishNetworkInstanceStatus(status)
return
}
Expand All @@ -221,7 +251,7 @@ func (z *zedrouter) doUpdateActivatedNetworkInstance(config types.NetworkInstanc
if err != nil {
z.log.Errorf("Failed to update activated network instance %s: %v",
status.UUID, err)
status.SetErrorNow(err.Error())
status.ReconcileErr.SetErrorNow(err.Error())
z.publishNetworkInstanceStatus(status)
return
}
Expand Down Expand Up @@ -314,17 +344,16 @@ func (z *zedrouter) checkAllNetworkInstanceIPConflicts() {
continue
}
conflictErr := z.checkNetworkInstanceIPConflicts(niConfig)
if conflictErr == nil && niStatus.IPConflict {
if conflictErr == nil && niStatus.IPConflictErr.HasError() {
// IP conflict was resolved.
niStatus.IPConflictErr.ClearError()
if niStatus.Activated {
// Local NI was initially activated prior to the IP conflict.
// Subsequently, when the IP conflict arose, it was almost completely
// un-configured (only preserving app VIFs) to keep device connectivity
// unaffected. Now, it can be restored to full functionality.
z.log.Noticef("Updating NI %s (%s) now that IP conflict "+
"is not present anymore", niConfig.UUID, niConfig.DisplayName)
niStatus.IPConflict = false
niStatus.ClearError()
// This also publishes the new status.
z.doUpdateActivatedNetworkInstance(*niConfig, &niStatus)
} else {
Expand All @@ -338,11 +367,10 @@ func (z *zedrouter) checkAllNetworkInstanceIPConflicts() {
z.handleNetworkInstanceCreate(nil, niConfig.Key(), *niConfig)
}
}
if conflictErr != nil && !niStatus.IPConflict {
if conflictErr != nil && !niStatus.IPConflictErr.HasError() {
// New IP conflict arose.
z.log.Error(conflictErr)
niStatus.IPConflict = true
niStatus.SetErrorNow(conflictErr.Error())
niStatus.IPConflictErr.SetErrorNow(conflictErr.Error())
z.publishNetworkInstanceStatus(&niStatus)
if niStatus.Activated {
// Local NI is already activated. Instead of removing it and halting