Describe the bug
fi_rdm_tagged_peek has a race condition in its cleanup path: the process intermittently crashes with a segmentation fault while closing the endpoint.
To Reproduce
Build libfabric with the UCX provider, then run:
server_cmd: fi_rdm_tagged_peek -p ucx -E
client_cmd: fi_rdm_tagged_peek -p ucx -E server_address
Expected behavior
The test passes successfully.
Output
Server output:
server_cmd: /path_to_fabtests_install/fi_rdm_tagged_peek -p "ucx" -E
server_stdout: |
Sending 10 tagged messages
Waiting for messages to complete
[node:3176869:0:3176869] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:3176869) ====
0 0x0000000000012cf0 __funlockfile() :0
1 0x0000000000033210 ucp_ep_destroy_base() ???:0
2 0x000000000004b3ee ucp_worker_discard_uct_ep_progress() ???:0
3 0x000000000004b4b5 ucp_worker_destroy() ???:0
4 0x00000000000ca7fa ucx_ep_close() ucx_ep.c:0
5 0x0000000000404081 fi_close() /path_to_libfabric_install/include/rdma/fabric.h:632
6 0x0000000000404081 ft_close_fids() /path_to_libfabric_source/fabtests/common/shared.c:1792
7 0x0000000000404b6a ft_free_res() /path_to_libfabric_source/fabtests/common/shared.c:1862
8 0x0000000000401bfa main() /hpath_to_libfabric_source/fabtests/functional/rdm_tagged_peek.c:364
9 0x0000000000401bfa main() /path_to_libfabric_source/fabtests/functional/rdm_tagged_peek.c:365
10 0x000000000003ad85 __libc_start_main() ???:0
11 0x000000000040203e _start() ???:0
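Reading the backtrace from the bottom up, fi_close() on the endpoint (called from ft_close_fids()) enters ucx_ep_close(), which calls ucp_worker_destroy(). The fault occurs in ucp_ep_destroy_base() while the worker is progressing discarded UCT endpoints. A fault at address 0x8 is the classic signature of reading a field at byte offset 8 through a NULL struct pointer. That hints, but does not prove, that some endpoint-related pointer had already been cleared when the worker was torn down. The sketch below only illustrates that address arithmetic. The struct and field names are hypothetical and do not reflect UCX's actual layout.

```python
import ctypes


class HypotheticalEp(ctypes.Structure):
    # Hypothetical layout for illustration only; this is NOT UCX's real struct.
    _fields_ = [
        ("flags", ctypes.c_uint64),   # occupies bytes 0-7
        ("worker", ctypes.c_void_p),  # begins at byte 8
    ]


def faulting_address(base: int, field_offset: int) -> int:
    """Address touched when reading base->field: the pointer value plus the field offset."""
    return base + field_offset


NULL = 0
addr = faulting_address(NULL, HypotheticalEp.worker.offset)
print(hex(addr))  # prints 0x8
```

A read of `worker` through a NULL `HypotheticalEp *` would therefore fault at 0x8, which matches the address in the signal report.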
Client output:
client_cmd: /path_to_fabtests_install/fi_rdm_tagged_peek -p "ucx" -E server_address
client_stdout: |
Peek for a bad msg
Peek w/ claim for a bad msg
Peek msg 1
Receive msg 1
Peek w/ claim msg 2
Receive claimed msg 2
Peek & discard msg 3
Checking to see if msg 3 was discarded
Peek w/ claim msg 4
Claim and discard msg 4
Receive msg 5
Receive msg 6
Receive msg 10
Receive msg 9
Receive msg 8
Receive msg 7
Environment:
Rocky Linux 8.7
Additional context
The failure is intermittent, consistent with a race condition; no case is known to fail 100% of the time.
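Because there is no deterministic reproducer, a loop that reruns the server/client pair and counts failures can help estimate how often the crash occurs. This is a minimal sketch under stated assumptions. It assumes both fabtests binaries are on PATH, that server and client run on the same node (so `server_address` is that node's host name), and that a fixed delay is long enough for the server to start listening. The helper names `run_once` and `count_failures` are hypothetical.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def run_once(server_cmd: Sequence[str], client_cmd: Sequence[str],
             startup_delay: float, timeout: float) -> Tuple[int, Optional[int], str]:
    """Run one server/client pair; return (server rc, client rc, server output)."""
    server = subprocess.Popen(server_cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
    time.sleep(startup_delay)  # crude wait for the server to start listening
    try:
        client_rc: Optional[int] = subprocess.run(
            client_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        client_rc = None  # run() has already killed the hung client
    try:
        server_out, _ = server.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        server.kill()  # a hung server also counts as a failure (rc -9 on POSIX)
        server_out, _ = server.communicate()
    return server.returncode, client_rc, server_out


def count_failures(server_cmd: Sequence[str], client_cmd: Sequence[str],
                   trials: int, startup_delay: float = 1.0,
                   timeout: float = 60.0) -> int:
    """Repeat the pair `trials` times and return how many runs failed.

    A run fails if either side exits non-zero or times out. On POSIX, a
    process terminated by a signal such as SIGSEGV reports a negative rc.
    """
    failures = 0
    for trial in range(trials):
        server_rc, client_rc, server_out = run_once(
            server_cmd, client_cmd, startup_delay, timeout)
        if server_rc != 0 or client_rc != 0:
            failures += 1
            print(f"trial {trial}: server rc={server_rc}, client rc={client_rc}")
            print(server_out)
    return failures
```

For example, `count_failures(["fi_rdm_tagged_peek", "-p", "ucx", "-E"], ["fi_rdm_tagged_peek", "-p", "ucx", "-E", "server_address"], trials=100)` would rerun the reproduction 100 times. The fixed `startup_delay` is a simplification. A client that starts before the server is listening may also be counted as a failure, so each reported run should be confirmed by checking the printed server output for the segfault backtrace.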
Revert the UCX test disable from #10124 once this is resolved.