-
Notifications
You must be signed in to change notification settings - Fork 700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librdmacm/cmtime: Rework of cmtime example #1448
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Put while loop code on separate line to avoid hiding it. Signed-off-by: Sean Hefty <shefty@nvidia.com>
shefty
force-pushed
the
cmtime
branch
2 times, most recently
from
April 8, 2024 23:37
e1b05f4
to
39c9ac2
Compare
The cleanup_nodes() and show_perf() functions walk the nodes array; however, only the client allocates the array. The structure of the code suggests that the server will also walk the nodes array. The server does not, but only because it enters an infinite loop and doesn't reach those calls, otherwise it would crash. Restructure the calls to make it obvious that only the client uses the nodes array. Separate the node allocation from initialization to align with how destruction must be handled. Signed-off-by: Sean Hefty <shefty@nvidia.com>
rdma_freeaddrinfo() accepts a null parameter. Remove duplicate null checks prior to calling rdma_freeaddrinfo(). Signed-off-by: Sean Hefty <shefty@nvidia.com>
Remove duplicate error messages printed after get_rdma_addr() into the function. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Move the get_rdma_addr() calls, which allocate rdma_addrinfo, into main(), so that it can pair with the rdma_freeaddrinfo() call. This restructure makes the initialization and cleanup code easier to follow. Signed-off-by: Sean Hefty <shefty@nvidia.com>
This function just creates an event channel and prints a couple of error messages on failure. It has no concept of being the 'first' event channel, except by the caller's use. Replace direct calls to rdma_create_event_channel() in the example code with this call to make use of the existing error messages. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Limit the number of connections that the server will process to the input connection count. This will allow the server to pre- allocate all necessary structures during initialization, removing malloc calls from the connection handling path. It will also allow the server to time events on its side of the connection. As part of this update, threads spawned to handle CM events will exit after processing the correct number of events. Once all CM events have been handled, any thread waiting to process more events will also exit. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Improve code readability. Remove unneeded check for client in client only executed code. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Simplify and generalize the work queue abstraction. Add helper to initialize a work_queue. Add thread tracking to the work queue, with a common work item callback handler. These changes merge most of the CM request and disconnect event handling into a common work queue abstraction. Further simplify the work queue by replacing the double-linked list with a single-linked list implementation to reduce overhead. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Replace dynamic memory allocations during connection setup with a pre-allocated node array, similar to the client's behavior. This reduces per connection overhead, plus will allow the client and server to share more code in subsequent patches. All rdma_cm_id's will have their context set to reference a node. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Time the destruction of disconnect, destroy ID, and destroy QPs separately. This change adjusts the cleanup on the server side, so that it can be timed as well. Signed-off-by: Sean Hefty <shefty@nvidia.com>
A follow on patch will introduce a connection warmup flow. Restructure the server operation to start the listen separately from establishing connections. Signed-off-by: Sean Hefty <shefty@nvidia.com>
The warmup iteration will be used to allocate verb objects prior to running the timed tests. The warmup will go through the same client/server paths, which requires reseting several variables used to track the test state. Signed-off-by: Sean Hefty <shefty@nvidia.com>
In order to time QP operations separate from CM calls, create and modify QPs by calling verbs directly, versus through the rdma cm APIs. This also allows the test to reuse verbs objects, such as the PD and CQ, which are created with the first QP during test warmup. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Improve the readability of the output. Include reporting the average of per node timing. The test takes time stamps for each step of the connection process for every connection being established. Every connection that is established is tracked as a 'node'. Calculate and report the average of per node timings. This is useful on the server where it cannot time iterating over a loop of operations. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Use thread-safe high-resolution timer for timestamps. This updates simplifies timers from a timeval struct to an integer. Signed-off-by: Sean Hefty <shefty@nvidia.com>
In addition to reporting times for individual steps of the connect process, time and report the time to establish the full connection, from start to finish. Signed-off-by: Sean Hefty <shefty@nvidia.com>
This will allow testing the CM protocol and handling without HW delays introduced by allocating and modifying QPs. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
shefty
force-pushed
the
cmtime
branch
4 times, most recently
from
April 12, 2024 18:55
f90176a
to
212da56
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There's a need to analyze and improve the CM flow. To support that, significantly update the cmtime test to provide additional data and make it more suitable to be extended.
Changes include: Code cleanups and restructuring. Reduce test overhead. Replace rdma_cm management of QPs with direct QP manipulation, to get better timings of the cost of creating and modifying QPs. Expand timing to include other steps. Provide additional data as output. Switch to improved time stamp calls. Enable execution of CM paths without interacting with HW QPs.
New output looks like this: