Data race detected in libfabric shm provider with multi-threaded client-server setup #10528
Comments
@piotrchmiel Thanks for reporting! Any chance you have an existing reproducer you could send so I don't have to try to implement it?
@aingerson I prepared a reproducer:
I'm using clang 19.1.2 (https://github.com/llvm/llvm-project/tree/llvmorg-19.1.2). ThreadSanitizer logs from the reproducer:
The reproducer also shows another issue that appears more often:
Describe the bug
When using the shm provider in a simple setup with one client thread and one server thread within a single process, a data race is detected when the program is compiled with clang-19 and run with ThreadSanitizer. The client performs one fi_send and the server performs one fi_recv, with a message size of 1000 bytes. The data race is reported between the fi_cq_read operation on the server side and the fi_send operation on the client side.
To Reproduce
Observe that ThreadSanitizer reports a data race during execution
Expected behavior
No data race should be detected when performing simple send and receive operations between a client and server thread within the same process using the shm provider.
Output
WARNING: ThreadSanitizer: data race (pid=543759)
Server thread:
Read of size 8 at 0x7febb28f3000 by thread T4 (mutexes: write M0, write M1, write M2):
#0 strncmp /home/piotrchmiel/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:487:3 (test+0x9d66d)
#1 smr_name_compare /home/piotrchmiel/test/third_party/libfabric/prov/shm/src/smr_util.c:351:9 (libfabric.so.1+0xe5a33)
Client thread:
Previous write of size 8 at 0x7febb28f3000 by main thread:
#0 memcpy /home/piotrchmiel/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (test+0x8e9de)
#1 smr_send_name /home/piotrchmiel/test/third_party/libfabric/prov/shm/src/smr_ep.c:206:2 (libfabric.so.1+0xdcfcd)
Environment:
Additional context
The data race is on a single memory location: it is read by strncmp in smr_name_compare (during fi_cq_read on the server side) and written by memcpy in smr_send_name (during fi_send on the client side).
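For illustration only, here is a minimal sketch of the access pattern ThreadSanitizer is flagging, written with plain pthreads and no libfabric calls. One thread copies a name into a shared buffer with memcpy while another compares it with strncmp. All names here (`shared_region`, `publish_name`, `name_matches`, `run_handshake`, `DEMO_NAME_LEN`) are hypothetical and are not taken from the shm provider. The sketch orders the two accesses with a release/acquire flag, which is one common way to make such a handoff race-free; whether the provider should do something similar, or whether the report is benign, is not something this sketch can answer.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define DEMO_NAME_LEN 64 /* hypothetical buffer size, not libfabric's */

/* Hypothetical stand-in for a region shared between two threads. */
struct shared_region {
    char name[DEMO_NAME_LEN];
    atomic_int name_ready; /* 0 = name not yet published, 1 = published */
};

/* Writer side: copy the name, then publish it with a release store. */
static void publish_name(struct shared_region *r, const char *name)
{
    size_t len = strlen(name);

    if (len >= DEMO_NAME_LEN)
        len = DEMO_NAME_LEN - 1;
    memcpy(r->name, name, len);
    r->name[len] = '\0';
    atomic_store_explicit(&r->name_ready, 1, memory_order_release);
}

/* Reader side: returns -1 if the name is not published yet,
 * otherwise 1 on a match and 0 on a mismatch. The acquire load
 * orders the strncmp after the writer's memcpy. */
static int name_matches(struct shared_region *r, const char *expected)
{
    if (!atomic_load_explicit(&r->name_ready, memory_order_acquire))
        return -1;
    return strncmp(r->name, expected, DEMO_NAME_LEN) == 0;
}

struct reader_args {
    struct shared_region *region;
    const char *expected;
    int result;
};

/* Polls until the name is published, then records the comparison. */
static void *reader_thread(void *arg)
{
    struct reader_args *a = arg;
    int rc;

    while ((rc = name_matches(a->region, a->expected)) < 0)
        sched_yield();
    a->result = rc;
    return NULL;
}

/* Runs one writer (the calling thread) against one reader thread.
 * Returns 1 if the reader saw a matching name, 0 if it did not,
 * and -1 if the reader thread could not be created. */
static int run_handshake(const char *published, const char *expected)
{
    struct shared_region region;
    struct reader_args args = { &region, expected, -1 };
    pthread_t reader;

    memset(region.name, 0, sizeof(region.name));
    atomic_init(&region.name_ready, 0);

    if (pthread_create(&reader, NULL, reader_thread, &args) != 0)
        return -1;
    publish_name(&region, published);
    assert(pthread_join(reader, NULL) == 0);
    return args.result;
}
```

Calling `run_handshake("peer-a", "peer-a")` from a `main` built with `-fsanitize=thread -pthread` should run without a report. Removing the `name_ready` check in `name_matches`, so that strncmp can run concurrently with memcpy, gives an unsynchronized read/write pair of the same kind as the one in the trace above.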