Implement configurable Network Instance MTU #3991

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

milan-zededa
Contributor

Users can now set the MTU for a network instance bridge and all application interfaces connected to it.

The MTU determines the largest IP packet that the network instance is allowed to carry. This does not include the L2 header size (e.g. the Ethernet header or a VLAN tag). The value is a 16-bit unsigned integer, representing the MTU size in bytes. The minimum accepted value for the MTU is 1280 (RFC 8200, "IPv6 minimum link MTU"). If not defined (zero value), EVE will set the MTU to the default value of 1500 bytes.
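As a rough illustration of these rules, the defaulting and range check could be sketched as follows. This is Python rather than the Go used by EVE, the function and constant names are made up for the example, and the upper bound assumes the field is 16 bits wide.

```python
DEFAULT_MTU = 1500   # applied when the MTU is left undefined (zero value)
MIN_MTU = 1280       # IPv6 minimum link MTU (RFC 8200)
MAX_MTU = 0xFFFF     # assumption: the value is stored in 16 bits


def effective_mtu(configured: int) -> int:
    """Return the MTU to apply for a configured value (hypothetical helper)."""
    if configured == 0:
        return DEFAULT_MTU
    if not MIN_MTU <= configured <= MAX_MTU:
        raise ValueError(f"MTU {configured} is outside {MIN_MTU}..{MAX_MTU}")
    return configured
```

With this sketch, `effective_mtu(0)` falls back to 1500, while a value such as 1279 is rejected.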

On the host side, EVE sets the MTU on the interfaces. On the guest (app) side, the responsibility to set the MTU lies either with EVE or with the user/app, depending on the network instance type (local or switch), the app type (VM or container) and the type of interfaces used (virtio or something else).

For container applications running inside an EVE-created shim-VM, EVE initializes the MTU of interfaces during shim-VM boot. MTUs of all interfaces are passed to the VM via kernel boot arguments (/proc/cmdline). The init script parses out these values and applies them to application interfaces (excluding direct assignments).
Furthermore, interfaces connected to local network instances will have their MTUs automatically updated using DHCP if there is a change in MTU configuration. To update the MTU of interfaces connected to switch network instances, the user may run an external DHCP server in the network and publish MTU changes via DHCP option 26 (the DHCP client run by EVE inside the shim-VM will pick it up and apply it).
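To make the boot-time step more concrete, here is a minimal Python sketch of pulling per-interface MTUs out of a kernel command line. The `example_mtu.<ifname>=<bytes>` argument syntax and the function name are invented for this illustration; the description above does not specify the format EVE uses, and the real init script is not written in Python.

```python
def parse_mtu_args(cmdline: str, prefix: str = "example_mtu.") -> dict:
    """Collect per-interface MTUs from a kernel command line string.

    The "example_mtu.<ifname>=<bytes>" syntax is hypothetical, chosen only
    for this sketch; it is not the format EVE actually uses.
    """
    mtus = {}
    for token in cmdline.split():
        if not token.startswith(prefix) or "=" not in token:
            continue  # unrelated kernel argument
        key, _, value = token.partition("=")
        ifname = key[len(prefix):]
        if ifname and value.isdecimal():
            mtus[ifname] = int(value)
    return mtus
```

On a running Linux system the input would come from reading `/proc/cmdline`, and each parsed value could then be applied with a command such as `ip link set dev <ifname> mtu <bytes>`.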

In the case of VM applications, it is mostly the responsibility of the app/user to set the MTUs and keep them up-to-date.
When the device provides HW-assisted virtualization capabilities, EVE (with the kvm or kubevirt hypervisor) connects VMs with network instances using para-virtualized virtio interfaces, which allow the MTU value to be propagated from the host to the guest. If the virtio driver used by the app supports MTU propagation (the VIRTIO_NET_F_MTU feature flag is set), the initial MTU values will be set using virtio (regardless of the network instance type).
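One way to check for the flag from inside a Linux guest is to inspect the virtio device's `features` attribute in sysfs, which lists feature bits as a string of `0` and `1` characters starting with bit 0. The sketch below assumes that layout and that the attribute is reachable under a path such as `/sys/class/net/<ifname>/device/features`; the helper name is illustrative. VIRTIO_NET_F_MTU is feature bit 3 in the virtio specification.

```python
VIRTIO_NET_F_MTU = 3  # feature bit number from the virtio specification


def mtu_feature_set(features: str) -> bool:
    """Check bit 3 in a features string of '0'/'1' characters, bit 0 first."""
    bits = features.strip()
    return len(bits) > VIRTIO_NET_F_MTU and bits[VIRTIO_NET_F_MTU] == "1"
```

The function only interprets the string; reading the sysfs file is left to the caller so that the check can be exercised without a virtio device present.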

To support MTU updates for interfaces connected to local network instances, the app can run a DHCP client and receive the latest MTU via DHCP option 26. For switch network instances, the user can run their own external DHCP server in the network with the MTU option configured.
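For reference, DHCP option 26 (Interface MTU, RFC 2132) is a four-byte option: the code 26, a length of 2, and the MTU as a 16-bit unsigned integer in network byte order. The following Python sketch encodes and decodes it; the function names are illustrative and not tied to any particular DHCP server or client.

```python
import struct

OPT_INTERFACE_MTU = 26  # DHCP "Interface MTU" option code (RFC 2132)


def encode_mtu_option(mtu: int) -> bytes:
    """Build the option bytes: code, length (always 2), big-endian uint16 MTU."""
    if not 68 <= mtu <= 0xFFFF:
        raise ValueError("MTU must be within the RFC 2132 bounds (68..65535)")
    return struct.pack("!BBH", OPT_INTERFACE_MTU, 2, mtu)


def decode_mtu_option(raw: bytes) -> int:
    """Parse the first four bytes of raw as an Interface MTU option."""
    code, length, mtu = struct.unpack("!BBH", raw[:4])
    if code != OPT_INTERFACE_MTU or length != 2:
        raise ValueError("not an Interface MTU option")
    return mtu
```

Note that RFC 2132 permits values down to 68, whereas the network instance MTU described here has a minimum of 1280.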

For other hypervisors, DHCP-based MTU propagation is also available but other options are limited:

  • Xen's VIF driver does not support MTU propagation from host to guest.
  • With Kubernetes, the MTU value (initially) set on the VETH connecting the pod with a network instance is propagated further into the VM by kubevirt+virtio. However, kubevirt lacks the capability to detect MTU changes and propagate them to the VM.

Please note that application traffic leaving or entering the device via a network adapter associated with the network instance is additionally limited by the MTU value of the adapter, configured within the NetworkConfig object. If the configured network instance MTU differs from the network adapter MTU, EVE will flag the network instance with an error and use the adapter's MTU for the network instance instead (to prevent traffic from being dropped or fragmented inside EVE).
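The fallback rule can be summarized in a few lines of Python. This is only a sketch of the behavior as described in this paragraph, with an invented function name; it assumes both inputs are already effective values (defaults applied) and does not reflect how the check is structured in the EVE source.

```python
from typing import Optional, Tuple


def resolve_ni_mtu(ni_mtu: int, adapter_mtu: int) -> Tuple[int, Optional[str]]:
    """Return (MTU to apply, error text or None) for a network instance."""
    if ni_mtu != adapter_mtu:
        # Mismatch: report it, but keep traffic flowing with the adapter's MTU.
        err = (f"network instance MTU {ni_mtu} differs from adapter MTU "
               f"{adapter_mtu}; using {adapter_mtu}")
        return adapter_mtu, err
    return ni_mtu, None
```

Returning the error alongside the chosen MTU mirrors the idea that the mismatch is reported without preventing the network instance from operating.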

A significant part of this commit is also a refactoring of network instance error management. There are different kinds of errors that an NI can be flagged with. Some of those errors are critical and prevent the NI from being created, while others can be ignored to some extent or might be transient. It is difficult to manage all these possible error scenarios with only one error attribute in NetworkInstanceStatus. Therefore, I have split the error field into multiple attributes, one for each kind of error. This significantly simplifies the error management while adding only a few new fields to the structure.
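The idea of one attribute per error kind can be illustrated with a small Python dataclass. The class and field names below are hypothetical stand-ins chosen for this sketch; they are not the actual fields added to NetworkInstanceStatus, which is a Go structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NIStatusSketch:
    """Hypothetical stand-in for a status struct with one field per error kind."""
    validation_err: str = ""    # illustrative: critical, blocks NI creation
    create_err: str = ""        # illustrative: critical, creation failed
    mtu_conflict_err: str = ""  # illustrative: reported, NI keeps running

    def blocking_errors(self) -> List[str]:
        """Errors that would prevent the NI from being created."""
        return [e for e in (self.validation_err, self.create_err) if e]

    def all_errors(self) -> List[str]:
        """Every error currently set, critical or not."""
        extra = [self.mtu_conflict_err] if self.mtu_conflict_err else []
        return self.blocking_errors() + extra
```

With separate fields, a transient or non-critical error can be set and cleared on its own without overwriting a critical one, which a single shared error attribute cannot express.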


codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 17.51%. Comparing base (2c5fb18) to head (18b9bad).
Report is 56 commits behind head on master.

Current head 18b9bad differs from pull request most recent head 9ac2b93

Please upload reports for the commit 9ac2b93 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3991   +/-   ##
=======================================
  Coverage   17.51%   17.51%           
=======================================
  Files           3        3           
  Lines         805      805           
=======================================
  Hits          141      141           
  Misses        629      629           
  Partials       35       35           


Contributor

@eriknordmark eriknordmark left a comment

LGTM but two questions to answer

@milan-zededa
Contributor Author

but two questions to answer

I have submitted the answers - are they not visible? Or maybe you have submitted two more questions which I cannot see (maybe sent from a phone?).

@milan-zededa
Contributor Author

This failed in all smoke tests:

        FAIL: ../eclient/testdata/metadata.txt:26: command failure

Will investigate once artifacts are available...

Signed-off-by: Milan Lenco <milan@zededa.com>
Contributor

@eriknordmark eriknordmark left a comment

LGTM

@eriknordmark eriknordmark merged commit 1b13ae7 into lf-edge:master Jun 25, 2024
27 of 35 checks passed