
A VMA Basic Sockperf test Examples

NirNitzani edited this page Oct 26, 2017 · 2 revisions

VMA performance can be validated with Sockperf in a number of working modes. The main use cases are Ping-Pong and Underload.
Ping-Pong test: a packet (Ping) with a specified message size is sent to the server side and sent back (Pong) to the client. The test result is the time from sending the packet until getting it back, divided by two.
Underload test: measures the latency of individual packets under a load of millions of packets per second (each packet is sent on schedule, without waiting for the reply to the previous packet).
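The Ping-Pong arithmetic can be illustrated with a minimal Python sketch. The function names and the timestamps are hypothetical and exist only for illustration; this is not Sockperf's own code, only the "round trip divided by two" calculation described above.

```python
from statistics import mean


def one_way_latency_usec(send_ts_usec, recv_ts_usec):
    """Ping-Pong latency for one message: half of the round-trip time."""
    round_trip = recv_ts_usec - send_ts_usec
    return round_trip / 2.0


def average_latency_usec(samples):
    """Average latency over (send, receive) timestamp pairs, in usec."""
    return mean(one_way_latency_usec(s, r) for s, r in samples)


# Hypothetical timestamps in microseconds (not measured data).
samples = [(0.0, 2.0), (10.0, 13.0), (20.0, 24.0)]
print(average_latency_usec(samples))  # prints 1.5
```

The three round trips above take 2, 3 and 4 usec, so the per-message latencies are 1.0, 1.5 and 2.0 usec and the average is 1.5 usec.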

More on Sockperf can be found here: Sockperf Wiki. Best results are achieved by running the latest VMA version on a tuned machine with the right VMA_SPEC. More on that can be found here: VMA Performance Tuning Guide

Notes:

  • The NUMA node used is the one closest to the NIC
  • The cores used are the most optimized ones on this machine
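One possible way to find the NUMA node closest to the NIC, and the cores that belong to it, is to read Linux sysfs. The sketch below assumes a Linux host that exposes `class/net/<iface>/device/numa_node` and `devices/system/node/node<N>/cpulist` under `/sys`; the function names and the `sysfs_root` parameter are hypothetical helpers, not part of VMA or Sockperf.

```python
from pathlib import Path


def nic_numa_node(iface, sysfs_root="/sys"):
    """Return the NUMA node of the NIC as reported by sysfs.

    The kernel reports -1 when it has no NUMA information for the device.
    """
    path = Path(sysfs_root) / "class" / "net" / iface / "device" / "numa_node"
    return int(path.read_text().strip())


def node_cpus(node, sysfs_root="/sys"):
    """Expand a node's cpulist (for example "12-14,19") into core numbers."""
    path = (Path(sysfs_root) / "devices" / "system" / "node"
            / f"node{node}" / "cpulist")
    cpus = []
    for part in path.read_text().strip().split(","):
        if not part:
            continue  # empty cpulist
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


# Example use (the interface name is an assumption, replace it with yours):
#   node = nic_numa_node("eth0")
#   print(node, node_cpus(node))
```

The resulting node number can be passed to `numactl --cpunodebind`, and cores from the returned list to `taskset -c`, as in the commands below. Which of those cores perform best still has to be determined per machine.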

Ping-Pong TCP 14Bytes

  • First machine (Server side):
    $ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp

  • Second machine (Client side):
    $ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp

Underload TCP 64Bytes

  • First machine (Server side):
    $ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 17,21,19 sockperf sr --msg-size 64 --ip 5.5.1.1 --port 19142 --tcp

  • Second machine (Client side):
    $ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 17,21,19 sockperf ul --time 4 --msg-size 64 --ip 5.5.1.1 --port 19142 --tcp

Example of Sockperf output using VMA (Ping-Pong TCP 14B)

VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec                           Latency                    [VMA_SPEC]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
VMA INFO: Tx QP WRE                      256                        [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching             4                          [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE                      256                        [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching             4                          [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops                  -1                         [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll  256                        [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams                0                          [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec)             -1                         [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force           Enabled                    [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio           1                          [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS                 1                          [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec)       100                        [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation       Disabled                   [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count               128                        [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation         Disabled                   [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full               Disabled                   [VMA_CQ_KEEP_QP_FULL]
VMA INFO: Avoid sys-calls on tcp fd      Enabled                    [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity       0                          [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode                    Single                     [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type              2 (Huge Pages)             [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec                           Latency                    [VMA_SPEC]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
VMA INFO: Tx QP WRE                      256                        [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching             4                          [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE                      256                        [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching             4                          [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops                  -1                         [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll  256                        [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams                0                          [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec)             -1                         [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force           Enabled                    [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio           1                          [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS                 1                          [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec)       100                        [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation       Disabled                   [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count               128                        [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation         Disabled                   [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full               Disabled                   [VMA_CQ_KEEP_QP_FULL]
VMA INFO: Avoid sys-calls on tcp fd      Enabled                    [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity       0                          [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode                    Single                     [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type              2 (Huge Pages)             [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec                           Latency                    [VMA_SPEC]
sockperf: == version #2.8-0.git3dd5971d7d7a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)  
[ 0] IP = 11.4.3.3        PORT = 19140 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.100 sec; SentMessages=1492229; ReceivedMessages=1492228
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=4.000 sec; SentMessages=1455879; ReceivedMessages=1455879
sockperf: ====> avg-lat=  1.359 (std-dev=0.031)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.359 usec
sockperf: Total 1455879 observations; each percentile contains 14558.79 observations
sockperf: ---> <MAX> observation =    6.271
sockperf: ---> percentile 99.999 =    2.085
sockperf: ---> percentile 99.990 =    1.569
sockperf: ---> percentile 99.900 =    1.463
sockperf: ---> percentile 99.000 =    1.428
sockperf: ---> percentile 90.000 =    1.396
sockperf: ---> percentile 75.000 =    1.378
sockperf: ---> percentile 50.000 =    1.359
sockperf: ---> percentile 25.000 =    1.338
sockperf: ---> <MIN> observation =    1.253

The most interesting lines in the output

  • VMA and OFED versions - confirm that you are using the correct versions and that VMA is active
  • VMA_SPEC - the latency spec is usually needed when measuring latency
  • Average latency: Summary: Latency is 1.359 usec
  • Maximum latency: ---> <MAX> observation = 6.271
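When many runs are compared, these lines can be pulled out of the output automatically. The following Python sketch assumes the log format shown in the example output above (the sample lines are copied from it); other Sockperf or VMA versions may print these lines differently. The function name `parse_sockperf` and the result keys are hypothetical.

```python
import re

# Lines copied from the example output above.
SAMPLE = """\
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Spec                           Latency                    [VMA_SPEC]
sockperf: Summary: Latency is 1.359 usec
sockperf: ---> <MAX> observation =    6.271
sockperf: ---> percentile 99.999 =    2.085
sockperf: ---> percentile 50.000 =    1.359
sockperf: ---> <MIN> observation =    1.253
"""


def parse_sockperf(text):
    """Collect version, spec, average, min/max and percentiles from a log."""
    result = {"percentiles": {}}
    for line in text.splitlines():
        m = re.search(r"VMA_VERSION: (\S+)", line)
        if m:
            result["vma_version"] = m.group(1)
            continue
        m = re.search(r"Spec\s+(\S+)\s+\[VMA_SPEC\]", line)
        if m:
            result["vma_spec"] = m.group(1)
            continue
        m = re.search(r"Summary: Latency is ([\d.]+) usec", line)
        if m:
            result["avg_usec"] = float(m.group(1))
            continue
        m = re.search(r"<(MAX|MIN)> observation =\s*([\d.]+)", line)
        if m:
            result[m.group(1).lower() + "_usec"] = float(m.group(2))
            continue
        m = re.search(r"percentile ([\d.]+) =\s*([\d.]+)", line)
        if m:
            result["percentiles"][float(m.group(1))] = float(m.group(2))
    return result


stats = parse_sockperf(SAMPLE)
print(stats["vma_version"], stats["vma_spec"])
print(stats["avg_usec"], stats["max_usec"], stats["min_usec"])
```

If the VMA banner lines are missing from a log, the `vma_version` key is absent from the result, which is a quick hint that VMA was not loaded for that run.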