Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librdmacm/cmtime: Add multi-thread support #1451

Merged
merged 17 commits into from
May 8, 2024
Merged

Conversation

shefty
Copy link
Contributor

@shefty shefty commented Apr 23, 2024

The primary objective of this series is to add multi-thread support to the cmtime app. Multi-threading is added by abstracting and generalizing the work queue abstraction used by cmtime. (If there's a usable version of a work queue suitable for rdma-core use, I can switch to that if needed.) The series also adds:

  • Ability to 'mimic' QP operations, by replacing QP calls with sleeps. This allows testing at scales where QP resources may not be available.
  • Add out-of-band synchronization mechanism to enable more precise timing of different testing steps.
  • Remove incomplete, complex, and unnecessary error handling. Just abort test on unexpected errors.

The final version of cmtime could use a review to verify that it is capturing the data as intended.

shefty added 16 commits April 12, 2024 12:03
Add a command line option that suspends the executing
thread for a given number of microseconds instead of calling
a QP operation.  This provides a rough simulation of the HW
interactions to create and modify a QP without allocating
the HW resources.  The purpose of this change is to allow
testing the CM behavior at scales that are impractical or
impossible given system constraints.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Restructure the code to move client and server code from
main into isolated function.  A subsequent patch will add
more operations to both.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Add a simple socket connection to sync the client and server
during testing.

When scaling up tests, CM issues are exposed in ways that are
unexpected.  :)  Yes, the problems are being exposed, but
earlier and in more difficult ways to analyze.

Example: A client finishes connecting by sending an RTU
message.  It then immediately starts disconnecting.  This sends
DREQs.  The problem is that the number of messages seen
by the server is now twice as many as the number of
connections.  The result is that DREQs get dropped, which
results in retries, and the client and server move out of sync.
So, although the test shows scaling for connecting, it starts
to break when disconnecting.

The actual synchronization during the test is added as a separate
change.  This patch just lays the foundation.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Replace the started != completed syntax with a simpler check that
completed != iterations to perform.  This removes the started
variable.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Instead of using a hidden global variable that's set to the
iteration parameter, use the iter parameter directly and pass
it through to sub-functions.  This makes the code easier to
read.

Note that the global connections variable is still needed after
this patch in order to track asynchronous events that need to
be processed.  However, its use is reduced.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Update the server, so that it processes events using a
separate thread, similar to the client.  This allows the main
thread to continue as a control thread, which is then used
to synchronize the testing with the client.

By synchronizing the the tests, the server can now know what
events it expects to receive.  This enables the server to
track the start and end of the connect and disconnect phases.
Prior to adding synchronization, disconnect events could have
been generated before all connections completed.  (For example.
the RTU from the client was lost, requiring that it be retried.
During that window, the client could start the disconnect
process.  This made the number of events that the server would
actually see unknown in the case packets were lost.)

Update the code to track the start and end of the phases.  That
data feeds into the control thread determining when to sync.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Isolate the work queue abstraction from the cmtime.  Define
a work_item structure to allow individual work items to
invoke separate callbacks.  Add support for multiple threads.
Move the code from the cmtime source into the common source
for future reusability.

This change will enable cmtime to use a single work queue
to process different steps in the connection process.  Having
a single work queue will limit the number of threads that
the test will need to spawn.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
To simplify reading the code, move the trivial event handler
functions into the main event handler call.  This helps
make it clear that the event thread should just process the
event directly, rather than queuing to a worker thread.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Have the client use a work queue to multi-thread the QP
creation, address resolution, route resolution, and QP
modify operations.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
The test tries to check for connection errors and continue if
one occurs.  However, the client and server must be in synch
regarding the number of connections.  Plus, since the goal of
the test is provide accurate timings of CM steps, if an error
occurs, the timings are thrown off.  Finally, the error handling
is insufficient for what it attempts to do.

Just exit the test on error after printing a message, and avoid
giving a reader the idea that errors are handled properly.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
After the client sends a disconnect request to the server,
have it wait for the server to sync using the OOB mechanism.
As the number of connections to test gets close to 1000, it's
frequent that the DREP is not making it back to the DREQ.
The result is that the DREQ must time out completely before
the client can proceed.

Note that this appears to be exposing undesirable behavior
from the kernel CM regarding duplicate DREQ handling.
However, because the timeouts are so long, it impacts the
ability to execute the test to collect connection setup
timings.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
A subsequent patch will enable multiple threads to
update the completed counters in parallel.  Switch
to using atomics.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Once multi-thread support is added to cmtime, the use_qpn
and disc_events variables will be access by multiple threads.
Use an atomic variable instead of volatiles.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Add a command line argument to set the number of threads
assigned to a work queue.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
When multiple work items are added to the work queue, it's
possible that only a single thread is signaled to process
the work.  The result is that the other threads continue
to wait on the condition.  Since the test inserts work using
a loop, this occurs frequently, and the behavior appears as
a singled thread app, even if multiple threads were
requested.

After removing a work item, if other work items remain,
signal the condition to wake up another thread.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Remove : to make it easier to import the data into excel.
Move the avg/iter forward, since it's the more useful value
to examine.  Include headers with some test details.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
To provide a reasonable baseline, add an option to connect over
tcp sockets instead of RDMA.  (The author was unable to find a
suitable connection rate test for sockets.  However, since this
is just to provide something to compare against, the code is
fairly simple.)

Signed-off-by: Sean Hefty <shefty@nvidia.com>
@shefty
Copy link
Contributor Author

shefty commented Apr 25, 2024

Added a patch that tests tcp socket connection rate, which can be used as a baseline.

@rleon rleon merged commit de9dc13 into linux-rdma:master May 8, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants