librdmacm/cmtime: Rework of cmtime example #1448

shefty · 2024-04-08T21:01:34Z

There's a need to analyze and improve the CM flow. To support that, significantly update the cmtime test to provide additional data and make it more suitable to be extended.

Changes include:

- Code cleanups and restructuring.
- Reduce test overhead.
- Replace rdma_cm management of QPs with direct QP manipulation, to get better timings of the cost of creating and modifying QPs.
- Expand timing to include other steps.
- Provide additional data as output.
- Switch to improved time stamp calls.
- Enable execution of CM paths without interacting with HW QPs.

New output looks like this:

Client warmup
        Creating IDs
        Binding addresses
        Resolving addresses
        Resolving routes
        Creating QPs
        Allocating verbs resources
        Modify QPs to INIT
        Connecting
        Disconnecting
        Destroying QPs
        Destroying IDs
Connect (1000) QPs test
        Creating IDs
        Binding addresses
        Resolving addresses
        Resolving routes
        Creating QPs
        Modify QPs to INIT
        Connecting
        Disconnecting
        Destroying QPs
        Destroying IDs
step              us/conn    sum(us)    max(us)    min(us)  total(us)   avg/iter
full connect :          0          0     888620     449073     889795        889
create id    :          1       1111          9          0       1173          1
bind addr    :          1       1451         73          1       1487          1
resolve addr :         15      15068         42          5       1537          1
resolve route:      78510   78510725     204775         97     206685        206
create qp    :        118     118524        251        111     118595        118
init qp attr :          0        550          2          0          0          0
init qp      :        117     117412        218        114     118060        118
rtr qp attr  :          0        611          1          0          0          0
rtr qp       :        122     122798        397        114          0          0
rts qp attr  :          0        510          1          0          0          0
rts qp       :         80      80296        108         73          0          0
cm connect   :     221221  221221333     438878       1508     442226        442
establish    :          1       1661          8          1          0          0
disconnect   :      81566   81566044    6392013       3400    6393192       6393
destroy id   :          1       1552         89          1       1588          1
destroy qp   :        588     588545       1084        554     588582        588
Connect (1000) test - no QPs
        Creating IDs
        Binding addresses
        Resolving addresses
        Resolving routes
        Creating QPs
        Modify QPs to INIT
        Connecting
        Disconnecting
        Destroying QPs
        Destroying IDs
step              us/conn    sum(us)    max(us)    min(us)  total(us)   avg/iter
full connect :          0          0     221590     214196     222545        222
create id    :          0        898          4          0        951          0
bind addr    :          1       1431        103          1       1459          1
resolve addr :         10      10818         25          3       1839          1
resolve route:     101123  101123659     204358       3613     205703        205
create qp    :          0         42          1          0         87          0
init qp attr :          0        471          2          0          0          0
init qp      :          0         33          1          0        559          0
rtr qp attr  :          0        500          2          0          0          0
rtr qp       :          0         28          1          0          0          0
rts qp attr  :          0        370          1          0          0          0
rts qp       :          0         34          7          0          0          0
cm connect   :       5936    5936575       8439       3152      11911         11
establish    :          2       2475         30          1          0          0
disconnect   :       8251    8251319       9008       5193      10659         10
destroy id   :          1       1537         65          1       1564          1
destroy qp   :          0         28          1          0         72          0

Put while loop code on separate line to avoid hiding it. Signed-off-by: Sean Hefty <shefty@nvidia.com>

The cleanup_nodes() and show_perf() functions walk the nodes array; however, only the client allocates the array. The structure of the code suggests that the server will also walk the nodes array. The server does not, but only because it enters an infinite loop and doesn't reach those calls, otherwise it would crash. Restructure the calls to make it obvious that only the client uses the nodes array. Separate the node allocation from initialization to align with how destruction must be handled. Signed-off-by: Sean Hefty <shefty@nvidia.com>

rdma_freeaddrinfo() accepts a null parameter. Remove duplicate null checks prior to calling rdma_freeaddrinfo(). Signed-off-by: Sean Hefty <shefty@nvidia.com>

Move the duplicate error messages printed after each get_rdma_addr() call into the function itself. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Move the get_rdma_addr() calls, which allocate rdma_addrinfo, into main(), so that each one pairs with its rdma_freeaddrinfo() call. This restructuring makes the initialization and cleanup code easier to follow. Signed-off-by: Sean Hefty <shefty@nvidia.com>

This function just creates an event channel and prints a couple of error messages on failure. It has no concept of being the 'first' event channel, except by the caller's use. Replace direct calls to rdma_create_event_channel() in the example code with this call to make use of the existing error messages. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Limit the number of connections that the server will process to the input connection count. This will allow the server to pre-allocate all necessary structures during initialization, removing malloc calls from the connection handling path. It will also allow the server to time events on its side of the connection. As part of this update, threads spawned to handle CM events will exit after processing the correct number of events. Once all CM events have been handled, any thread waiting to process more events will also exit. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Improve code readability. Remove unneeded check for client in client only executed code. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Simplify and generalize the work queue abstraction. Add helper to initialize a work_queue. Add thread tracking to the work queue, with a common work item callback handler. These changes merge most of the CM request and disconnect event handling into a common work queue abstraction. Further simplify the work queue by replacing the double-linked list with a single-linked list implementation to reduce overhead. Signed-off-by: Sean Hefty <shefty@nvidia.com>
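As an illustration of the work queue described above, here is a minimal sketch of a singly-linked queue with a per-item callback and a worker loop. It is not the code from the patch: the names `wq_init`, `wq_insert`, `wq_remove`, `wq_stop`, `wq_thread`, `counted_item`, and `count_handler` are all hypothetical, and the stop-and-drain behavior is an assumption about how waiting threads might be told to exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct work_item {
	struct work_item *next;                  /* single forward link */
	void (*handler)(struct work_item *item); /* common callback */
};

struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work_item *head;
	struct work_item *tail;
	bool stop;
};

void wq_init(struct work_queue *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->head = NULL;
	wq->tail = NULL;
	wq->stop = false;
}

/* Append an item at the tail and wake one waiting thread. */
void wq_insert(struct work_queue *wq, struct work_item *item)
{
	assert(item->handler != NULL);
	item->next = NULL;
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = item;
	else
		wq->head = item;
	wq->tail = item;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Block until an item is available; return NULL once stopped and drained. */
struct work_item *wq_remove(struct work_queue *wq)
{
	struct work_item *item;

	pthread_mutex_lock(&wq->lock);
	while (!wq->head && !wq->stop)
		pthread_cond_wait(&wq->cond, &wq->lock);
	item = wq->head;
	if (item) {
		wq->head = item->next;
		if (!wq->head)
			wq->tail = NULL;
	}
	pthread_mutex_unlock(&wq->lock);
	return item;
}

/* Tell every waiting thread to exit once the queue is empty. */
void wq_stop(struct work_queue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = true;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Thread entry point: run each item's handler until stopped and drained. */
void *wq_thread(void *arg)
{
	struct work_queue *wq = arg;
	struct work_item *item;

	while ((item = wq_remove(wq)) != NULL)
		item->handler(item);
	return NULL;
}

/* Example item: embeds work_item as its first member and counts calls. */
struct counted_item {
	struct work_item item;
	int *counter;
};

void count_handler(struct work_item *item)
{
	/* Valid because item is the first member of counted_item. */
	struct counted_item *ci = (struct counted_item *)item;

	(*ci->counter)++;
}
```

A caller would pass `wq_thread` to `pthread_create()` for each worker and call `wq_stop()` once no more events are expected. With a single forward link, an insert touches only the tail and a remove touches only the head, so a back pointer is not needed for FIFO use.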

Replace dynamic memory allocations during connection setup with a pre-allocated node array, similar to the client's behavior. This reduces per connection overhead, plus will allow the client and server to share more code in subsequent patches. All rdma_cm_id's will have their context set to reference a node. Signed-off-by: Sean Hefty <shefty@nvidia.com>
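The pre-allocation idea can be sketched as follows. This is only an illustration under stated assumptions: `struct fake_id` is a stand-in for `struct rdma_cm_id` (of which only the `context` pointer is modeled), and `node_pool`, `pool_alloc`, `pool_claim`, and `pool_free` are hypothetical names, not functions from cmtime. The sketch is not thread-safe; a real server handling events on several threads would need to protect `used`.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct rdma_cm_id; only the context pointer is modeled. */
struct fake_id {
	void *context;
};

struct node {
	struct fake_id *id;
	int index;
};

struct node_pool {
	struct node *nodes;
	int count; /* fixed at startup from the connection count */
	int used;  /* number of nodes handed out so far */
};

/* One allocation up front; nothing is allocated per connection later. */
int pool_alloc(struct node_pool *pool, int count)
{
	assert(count > 0);
	pool->nodes = calloc((size_t)count, sizeof(*pool->nodes));
	if (!pool->nodes)
		return -1;
	pool->count = count;
	pool->used = 0;
	return 0;
}

/* Hand out the next free node and point the id's context back at it. */
struct node *pool_claim(struct node_pool *pool, struct fake_id *id)
{
	struct node *n;

	if (pool->used == pool->count)
		return NULL; /* connection limit reached */
	n = &pool->nodes[pool->used];
	n->index = pool->used++;
	n->id = id;
	id->context = n;
	return n;
}

void pool_free(struct node_pool *pool)
{
	free(pool->nodes);
	pool->nodes = NULL;
	pool->count = 0;
	pool->used = 0;
}
```

Because each id's context references its node, an event handler can recover the per-connection state from the id without a lookup.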

Time the disconnect, ID destruction, and QP destruction steps separately. This change adjusts the cleanup on the server side, so that it can be timed as well. Signed-off-by: Sean Hefty <shefty@nvidia.com>

A follow-on patch will introduce a connection warmup flow. Restructure the server operation to start the listen separately from establishing connections. Signed-off-by: Sean Hefty <shefty@nvidia.com>

The warmup iteration will be used to allocate verbs objects prior to running the timed tests. The warmup will go through the same client/server paths, which requires resetting several variables used to track the test state. Signed-off-by: Sean Hefty <shefty@nvidia.com>

To time QP operations separately from CM calls, create and modify QPs by calling verbs directly rather than through the rdma_cm APIs. This also allows the test to reuse verbs objects, such as the PD and CQ, which are created with the first QP during test warmup. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Improve the readability of the output. Report the average of the per-node timings. The test takes time stamps for each step of the connection process for every connection being established. Every connection that is established is tracked as a 'node'. Calculate and report the average of the per-node timings. This is useful on the server, which cannot time a loop of operations the way the client does. Signed-off-by: Sean Hefty <shefty@nvidia.com>
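The per-node statistics can be computed along these lines. This is a sketch, not the test's actual code: `step_time`, `step_stats`, and `summarize` are hypothetical names, and treating the per-connection average as the sum of node durations divided by the node count is an assumption. That assumption is at least consistent with the sample output above, where a "create id" sum of 1111 us over 1000 connections is reported as 1 us/conn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start and end time stamps, in microseconds, for one step on one node. */
struct step_time {
	uint64_t start_us;
	uint64_t end_us;
};

struct step_stats {
	uint64_t sum_us;
	uint64_t max_us;
	uint64_t min_us;
	uint64_t avg_us; /* sum_us / node count, truncated */
};

/* Reduce one step's per-node time stamps to sum, max, min and average. */
struct step_stats summarize(const struct step_time *times, size_t count)
{
	struct step_stats s = { 0, 0, UINT64_MAX, 0 };
	size_t i;

	assert(count > 0);
	for (i = 0; i < count; i++) {
		uint64_t d = times[i].end_us - times[i].start_us;

		s.sum_us += d;
		if (d > s.max_us)
			s.max_us = d;
		if (d < s.min_us)
			s.min_us = d;
	}
	s.avg_us = s.sum_us / count;
	return s;
}
```

Keeping a start and end stamp per node means the statistics do not depend on timing a loop, which is what makes the same calculation usable on the server.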

Use a thread-safe high-resolution timer for timestamps. This update simplifies timers from a timeval struct to an integer. Signed-off-by: Sean Hefty <shefty@nvidia.com>

In addition to reporting times for individual steps of the connect process, time and report the time to establish the full connection, from start to finish. Signed-off-by: Sean Hefty <shefty@nvidia.com>

This will allow testing the CM protocol and handling without HW delays introduced by allocating and modifying QPs. Signed-off-by: Sean Hefty <shefty@nvidia.com>

Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Fix formatting around while loops

92b8abb

Put while loop code on separate line to avoid hiding it. Signed-off-by: Sean Hefty <shefty@nvidia.com>

shefty force-pushed the cmtime branch 2 times, most recently from e1b05f4 to 39c9ac2 on April 8, 2024 23:37

shefty added 18 commits April 8, 2024 17:09

librdmacm/examples: Remove unnecessary checks before rdma_freeaddrinfo()

da498e6

rdma_freeaddrinfo() accepts a null parameter. Remove duplicate null checks prior to calling rdma_freeaddrinfo(). Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/examples: Move error message for rdma_getaddrinfo to helper

b84e8ff

Move the duplicate error messages printed after each get_rdma_addr() call into the function itself. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Add helper function to identify client

2a46ec8

Improve code readability. Remove unneeded check for client in client only executed code. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Separate cleanup of QPs and IDs

ef14aa4

Time the disconnect, ID destruction, and QP destruction steps separately. This change adjusts the cleanup on the server side, so that it can be timed as well. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Separate server listen and connect handling

9f3a6b4

A follow-on patch will introduce a connection warmup flow. Restructure the server operation to start the listen separately from establishing connections. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Use clock_gettime instead of gettimeofday

d05f337

Use a thread-safe high-resolution timer for timestamps. This update simplifies timers from a timeval struct to an integer. Signed-off-by: Sean Hefty <shefty@nvidia.com>
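A minimal sketch of an integer time stamp built on clock_gettime() is shown below. The commit message does not say which clock ID is used, so `CLOCK_MONOTONIC` is an assumption, and `now_us` is a hypothetical helper name rather than a function from cmtime.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return the current monotonic time as a single integer, in microseconds. */
uint64_t now_us(void)
{
	struct timespec ts;
	int ret;

	ret = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(ret == 0);
	(void)ret; /* avoid an unused warning when NDEBUG is defined */
	return (uint64_t)ts.tv_sec * 1000000ULL + (uint64_t)ts.tv_nsec / 1000;
}
```

With an integer time stamp, the duration of a step is a plain subtraction, `end - start`, with no need to combine the seconds and microseconds fields of a timeval.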

librdmacm/cmtime: Report time for entire connect process

d9385b5

In addition to reporting times for individual steps of the connect process, time and report the time to establish the full connection, from start to finish. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Add test that skips QP operations

2dfdf32

This will allow testing the CM protocol and handling without HW delays introduced by allocating and modifying QPs. Signed-off-by: Sean Hefty <shefty@nvidia.com>

librdmacm/cmtime: Update man page to reflect latest updates

212da56

Signed-off-by: Sean Hefty <shefty@nvidia.com>

shefty force-pushed the cmtime branch 4 times, most recently from f90176a to 212da56 on April 12, 2024 18:55

rleon merged commit 3be5661 into linux-rdma:master Apr 15, 2024
14 checks passed
