This repository has been archived by the owner on Dec 20, 2022. It is now read-only.

Flow Control and QoS configuration for RoCE fabrics

Yuval Degani edited this page Oct 4, 2018 · 1 revision

InfiniBand networks are inherently lossless: they use link-level flow control to ensure that packets are not dropped within the fabric. RoCE (RDMA over Converged Ethernet) implements the InfiniBand protocol over a standard Ethernet/IP network, which can be lossy. Because packet loss degrades RoCE performance, it is recommended to enable some form of flow control within your fabric.

For detailed information, refer to the document "Network Considerations for Global Pause, PFC and QoS with Mellanox Switches and Adapters".

Flow Control (Global Pause)

"Global Pause" is the simplest mode of flow control for achieving a lossless Ethernet fabric.

Background:

The Ethernet standard (802.3) is unreliable (or "lossy") by design. In its basic form, it gives no guarantee that packets reach their destinations; that responsibility is left to upper-layer protocols (e.g. TCP).

Later on, the IEEE 802.3x (Annex 31B of 802.3) flow control standard was defined for applications that cannot rely on upper-layer protocols for reliability. It lets a receiver report its buffer state (e.g. imminent overflow) back to the sender.

The pause action (XOFF) is a control frame sent by the receiver to alert the sender that the receive buffer is under pressure and about to overflow. The sender responds by stopping the transmission of new packets until the receiver is ready to accept them again. The pause frame carries a timeout value: the sender stays paused until the timeout expires or until an XON control message (a pause frame with a timeout of zero) is received.
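To get a feel for how long a single pause frame can hold off a sender, the timeout can be converted to wall-clock time. The sketch below assumes the encoding from IEEE 802.3 Annex 31B, which this page does not spell out: the timeout is a 16-bit count of pause quanta, and one quantum is 512 bit times. The function name `pause_duration_ns` is hypothetical, and the result is truncated to whole nanoseconds.

```shell
# pause_duration_ns QUANTA LINK_GBPS
# Prints how long a sender stays paused, in whole nanoseconds.
# One pause quantum is 512 bit times; at N Gb/s a bit time lasts 1/N ns.
pause_duration_ns() {
    echo $(( $1 * 512 / $2 ))
}

pause_duration_ns 65535 10     # longest possible pause on a 10 Gb/s link
pause_duration_ns 65535 100    # longest possible pause on a 100 Gb/s link
```

Under these assumptions the longest pause is about 3.4 ms at 10 Gb/s and about 0.34 ms at 100 Gb/s, so a congested receiver has to keep refreshing the pause for as long as its buffer stays full.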

Configuring global pause on network adapters:

Check the current pause settings of the interface:

    ethtool -a <interface name>

For example:

    $ ethtool -a ens2
    Pause parameters for ens2:
    Autonegotiate:      off
    RX:         off
    TX:         off

If RX and TX are off, enable them and check the settings again:

    $ ethtool -A ens2 rx on tx on

    $ ethtool -a ens2
    Pause parameters for ens2:
    Autonegotiate:      off
    RX:         on
    TX:         on
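The check-then-enable steps above can be wrapped in a small helper when several hosts or interfaces need the same treatment. This is a minimal sketch, not part of the original procedure: `pause_state` and `ensure_pause` are hypothetical names, the parsing assumes the `ethtool -a` output format shown above, and `ethtool -A` typically needs root privileges.

```shell
# Read `ethtool -a` output on stdin and print the pause state,
# for example "rx=off tx=off".
pause_state() {
    awk '
        $1 == "RX:" { rx = $2 }
        $1 == "TX:" { tx = $2 }
        END { printf "rx=%s tx=%s\n", rx, tx }
    '
}

# Enable global pause on one interface, but only if it is not already on.
ensure_pause() {
    iface=$1
    state=$(ethtool -a "$iface" | pause_state)
    if [ "$state" != "rx=on tx=on" ]; then
        ethtool -A "$iface" rx on tx on
    fi
}
```

With these definitions loaded, `ensure_pause ens2` performs the same two steps as the manual commands above. Settings applied with `ethtool` are generally not kept across reboots, so the call may need to go into whatever mechanism your distribution uses for persistent network configuration.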

After enabling flow control on the adapters, the switch(es) must be configured to match. On Mellanox switches, you can run the following commands on individual ports or on a range of ports:

    mellanox-switch [standalone: master] (config) # interface ethernet 1/1-1/32 flowcontrol receive on force
    mellanox-switch [standalone: master] (config) # interface ethernet 1/1-1/32 flowcontrol send on force

If you are using a switch from another vendor, refer to its documentation for enabling global pause (IEEE 802.3x port-based flow control).
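Once both the adapters and the switch are configured, one way to confirm from the host that pause frames are actually being exchanged is to look at the adapter statistics under load. The following is a sketch under assumptions: the helper name `pause_counters` is hypothetical, and the statistic names reported by `ethtool -S` differ between drivers, so the filter simply keeps every counter whose name contains "pause".

```shell
# Read `ethtool -S` output on stdin and print only the pause-related
# counters as "name: value" lines.
pause_counters() {
    awk 'tolower($0) ~ /pause/ { print $1, $2 }'
}
```

A typical use is `ethtool -S ens2 | pause_counters`, run before and after a traffic burst. Counters that stay at zero under congestion may indicate that flow control is not active on one side of the link.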

Advanced forms of flow control

In more complex environments, global pause alone may not be enough: reaching maximum performance can require more advanced configurations and other flow control options.

Refer to the document "Recommended Network Configuration Examples for RoCE Deployment" for detailed configuration recipes that match various production use cases.