librdmacm/cmtime: Add multi-thread support #1451
Merged
+767 −388
Conversation
Add a command line option that suspends the executing thread for a given number of microseconds instead of calling a QP operation. This provides a rough simulation of the HW interactions to create and modify a QP without allocating the HW resources. The purpose of this change is to allow testing the CM behavior at scales that are impractical or impossible given system constraints. Signed-off-by: Sean Hefty <shefty@nvidia.com>
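The commit above describes replacing a QP operation with a timed sleep. A minimal sketch of that idea, assuming illustrative names (`mimic_qp_delay_us`, `modify_qp` are not the actual cmtime symbols):

```c
/* Sketch: substitute a timed sleep for the real QP operation so the CM
 * path can be exercised at scales where HW resources are unavailable.
 * Names here are illustrative, not the actual cmtime implementation. */
#include <stdint.h>
#include <unistd.h>

static uint32_t mimic_qp_delay_us;  /* set from the command line; 0 = real QP op */

static int modify_qp(void)
{
	if (mimic_qp_delay_us) {
		/* Roughly simulate the HW interaction without allocating
		 * HW resources. */
		usleep(mimic_qp_delay_us);
		return 0;
	}
	/* ... the real QP modify path would go here ... */
	return 0;
}

static int run_once(void)
{
	mimic_qp_delay_us = 5;	/* pretend the QP modify takes ~5 us */
	return modify_qp();
}
```

This keeps the event and work-queue machinery identical whether real or simulated QPs are used; only the innermost operation changes.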
Restructure the code to move the client and server code from main into isolated functions. A subsequent patch will add more operations to both. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Add a simple socket connection to sync the client and server during testing. When scaling up tests, CM issues are exposed in unexpected ways. :) Yes, the problems are being exposed, but earlier and in ways that are more difficult to analyze. Example: A client finishes connecting by sending an RTU message. It then immediately starts disconnecting, which sends DREQs. The problem is that the number of messages seen by the server is now twice the number of connections. The result is that DREQs get dropped, which causes retries, and the client and server fall out of sync. So, although the test shows scaling for connecting, it starts to break when disconnecting. The actual synchronization during the test is added as a separate change. This patch just lays the foundation. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Replace the started != completed check with a simpler check that completed != iterations to perform. This removes the started variable. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Instead of using a hidden global variable that's set to the iteration parameter, use the iter parameter directly and pass it through to sub-functions. This makes the code easier to read. Note that the global connections variable is still needed after this patch in order to track asynchronous events that need to be processed. However, its use is reduced. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Update the server, so that it processes events using a separate thread, similar to the client. This allows the main thread to continue as a control thread, which is then used to synchronize the testing with the client. By synchronizing the tests, the server can now know what events it expects to receive. This enables the server to track the start and end of the connect and disconnect phases. Prior to adding synchronization, disconnect events could have been generated before all connections completed. (For example, the RTU from the client was lost, requiring that it be retried. During that window, the client could start the disconnect process. This made the number of events that the server would actually see unknown in the case packets were lost.) Update the code to track the start and end of the phases. That data feeds into the control thread determining when to sync. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Isolate the work queue abstraction from the cmtime. Define a work_item structure to allow individual work items to invoke separate callbacks. Add support for multiple threads. Move the code from the cmtime source into the common source for future reusability. This change will enable cmtime to use a single work queue to process different steps in the connection process. Having a single work queue will limit the number of threads that the test will need to spawn. Signed-off-by: Sean Hefty <shefty@nvidia.com>
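The generalized work queue described in this commit can be sketched roughly as follows; the structure and function names are illustrative, not the exact rdma-core `common` code:

```c
/* Rough sketch of a work queue where each work_item carries its own
 * callback and multiple threads can drain a shared list. Names are
 * hypothetical, not the actual rdma-core abstraction. */
#include <pthread.h>
#include <stddef.h>

struct work_item {
	struct work_item *next;
	void (*work)(struct work_item *item);	/* per-item callback */
};

struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work_item *head, **tail;
};

static void wq_init(struct work_queue *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->head = NULL;
	wq->tail = &wq->head;
}

static void wq_insert(struct work_queue *wq, struct work_item *item,
		      void (*work)(struct work_item *))
{
	item->next = NULL;
	item->work = work;
	pthread_mutex_lock(&wq->lock);
	*wq->tail = item;
	wq->tail = &item->next;
	pthread_cond_signal(&wq->cond);	/* wake one worker thread */
	pthread_mutex_unlock(&wq->lock);
}

static int wq_hits;

static void wq_count_cb(struct work_item *item)
{
	(void) item;
	wq_hits++;
}

/* Single-threaded demo: insert two items, then drain the list and
 * invoke each item's callback, as a worker thread would. */
static int wq_demo(void)
{
	struct work_queue wq;
	struct work_item a, b;
	struct work_item *item;

	wq_init(&wq);
	wq_insert(&wq, &a, wq_count_cb);
	wq_insert(&wq, &b, wq_count_cb);
	while ((item = wq.head)) {
		wq.head = item->next;
		item->work(item);
	}
	return wq_hits;
}
```

Giving each item its own callback is what lets a single queue carry heterogeneous steps (QP create, address resolve, route resolve, QP modify), so the test needs only one thread pool instead of one per step.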
To simplify reading the code, move the trivial event handler functions into the main event handler call. This helps make it clear that the event thread should just process the event directly, rather than queuing to a worker thread. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Have the client use a work queue to multi-thread the QP creation, address resolution, route resolution, and QP modify operations. Signed-off-by: Sean Hefty <shefty@nvidia.com>
The test tries to check for connection errors and continue if one occurs. However, the client and server must be in sync regarding the number of connections. Plus, since the goal of the test is to provide accurate timings of CM steps, if an error occurs, the timings are thrown off. Finally, the error handling is insufficient for what it attempts to do. Just exit the test on error after printing a message, and avoid giving a reader the idea that errors are handled properly. Signed-off-by: Sean Hefty <shefty@nvidia.com>
After the client sends a disconnect request to the server, have it wait for the server to sync using the OOB mechanism. As the number of connections to test gets close to 1000, it is common for the DREP to fail to make it back in response to the DREQ. The result is that the DREQ must time out completely before the client can proceed. Note that this appears to be exposing undesirable behavior from the kernel CM regarding duplicate DREQ handling. However, because the timeouts are so long, it impacts the ability to execute the test to collect connection setup timings. Signed-off-by: Sean Hefty <shefty@nvidia.com>
A subsequent patch will enable multiple threads to update the completed counters in parallel. Switch to using atomics. Signed-off-by: Sean Hefty <shefty@nvidia.com>
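The counter change can be illustrated with C11 atomics; the names below are hypothetical stand-ins for the cmtime counters:

```c
/* Sketch: replace a plain counter with a C11 atomic so multiple worker
 * threads can update it in parallel. Names are illustrative. */
#include <stdatomic.h>

static atomic_int completed;

static void complete_one(void)
{
	/* Safe to call from any worker thread concurrently. */
	atomic_fetch_add(&completed, 1);
}

static int all_done(int iter)
{
	return atomic_load(&completed) >= iter;
}

/* Single-threaded demo of the counter reaching the iteration target. */
static int atomic_demo(void)
{
	for (int i = 0; i < 4; i++)
		complete_one();
	return all_done(4);
}
```

`atomic_fetch_add` gives the read-modify-write atomicity that a plain (or `volatile`) increment lacks once several threads touch the same counter.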
Once multi-thread support is added to cmtime, the use_qpn and disc_events variables will be accessed by multiple threads. Use atomic variables instead of volatiles. Signed-off-by: Sean Hefty <shefty@nvidia.com>
Add a command line argument to set the number of threads assigned to a work queue. Signed-off-by: Sean Hefty <shefty@nvidia.com>
When multiple work items are added to the work queue, it's possible that only a single thread is signaled to process the work. The result is that the other threads continue to wait on the condition. Since the test inserts work using a loop, this occurs frequently, and the behavior appears as a single-threaded app, even if multiple threads were requested. After removing a work item, if other work items remain, signal the condition to wake up another thread. Signed-off-by: Sean Hefty <shefty@nvidia.com>
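The chained wake-up fix can be shown with a minimal, self-contained producer/consumer sketch (names hypothetical): the consumer re-signals the condition whenever work remains after its own dequeue, so a burst of inserts eventually wakes every waiting thread.

```c
/* Self-contained sketch of the chained wake-up: after taking one unit
 * of work, re-signal the condition if work remains so another sleeping
 * thread also runs. Names are illustrative. */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending;

static void produce_burst(int n)
{
	pthread_mutex_lock(&lock);
	pending += n;
	/* A producer signal may wake only one of several sleepers. */
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

static void consume_one(void)
{
	pthread_mutex_lock(&lock);
	while (!pending)
		pthread_cond_wait(&cond, &lock);
	pending--;
	if (pending)			/* the fix: wake the next worker too */
		pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

/* Single-threaded demo: a burst of 3 is fully drained. */
static int burst_demo(void)
{
	produce_burst(3);
	consume_one();
	consume_one();
	consume_one();
	return pending;
}
```

An alternative would be `pthread_cond_broadcast()` on insert, but re-signaling on dequeue wakes threads only while work actually remains, avoiding a thundering herd.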
Remove the ':' separators to make it easier to import the data into Excel. Move the avg/iter value forward, since it's the more useful value to examine. Include headers with some test details. Signed-off-by: Sean Hefty <shefty@nvidia.com>
To provide a reasonable baseline, add an option to connect over TCP sockets instead of RDMA. (The author was unable to find a suitable connection rate test for sockets. However, since this is just to provide something to compare against, the code is fairly simple.) Signed-off-by: Sean Hefty <shefty@nvidia.com>
Added a patch that tests TCP socket connection rate, which can be used as a baseline.
The primary objective of this series is to add multi-thread support to the cmtime app. Multi-threading is added by generalizing the work queue abstraction used by cmtime. (If there's a usable version of a work queue suitable for rdma-core use, I can switch to that if needed.) The series also adds:
The final version of cmtime could use a review to verify that it is capturing the data as intended.