DDT shows message queues with pml ob1 but not with ucx and yalla. #6464
Comments
Thanks for the report. The message queues are accessed by the debugger via the PML base lists, which the ucx and yalla PMLs do not use. @jladd-mlnx @yosefe is this an intentional design or something that was simply overlooked?
@ggouaillardet mca_pml_base_request_t has some overhead (for memory consumption and initialization) which we wanted to avoid with MXM and UCX; we needed only the base request.
@bartoldeman @ggouaillardet Are there some DDT hooks/macros we can add to pml ucx to make it work with DDT, without using the PML base lists and structures?
DDT uses the dll in
@yosefe maybe
@bartoldeman Yes, this might be easier. We will check this out.
Just an FYI: a rather common use of DDT is to attach to a running program that has encountered an issue.
@yosefe Is this still an issue with newer UCX on master?
Hi, is this still open? I'd like to take a shot at this.
Yes, it's still open.
@Ajax-Light As you investigate this, please remember that Open MPI
Background information
We found that with the yalla and ucx PMLs, the message queue display in DDT no longer works, and we need to fall back to the ob1 PML (see the command-line sketch below). This seems to be an acceptable workaround for now, but with newer Open MPI releases removing the openib BTL, does that mean we will need to use TCP/IP for debugging, or perhaps the new, still somewhat experimental UCT BTL?
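For reference, the workaround is just standard MCA component selection on the mpirun command line; a minimal sketch (the `deadlock_ring` binary is the test case described under "Details of the problem" below):

```shell
# Workaround: force the ob1 PML so that DDT can display the message queues.
mpirun --mca pml ob1 -np 4 ./deadlock_ring

# With the ucx (or yalla) PML, the DDT message-queue view stays empty.
mpirun --mca pml ucx -np 4 ./deadlock_ring
```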
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
Tested with Open MPI 2.1.1 (patched to work with DDT in general) with the yalla and ob1 PMLs, and with 3.1.1 and 3.1.2 with the ucx and ob1 PMLs. Tested with DDT (Arm Forge) 7.1, 18.2, and 18.3.
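As a side note, which PML components a given build provides can be checked with `ompi_info`; a small sketch (the `pml_base_verbose` parameter name is from memory and may vary between releases):

```shell
# List the PML components built into this Open MPI installation.
ompi_info | grep "MCA pml"

# Print PML selection details at run time to confirm which PML is actually used.
mpirun --mca pml_base_verbose 10 -np 2 ./deadlock_ring
```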
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
It was compiled from a source tarball.
Please describe the system on which you are running
Details of the problem
We expect to see what is shown in the first screenshot (with ob1), but see the second with ucx and yalla.
The test case is a simple MPI deadlock program compiled with `mpicc -g deadlock_ring.c -o deadlock_ring`. The compiler (GCC 5.4.0, 7.3.0, and Intel 2016 update 4 were tried) does not matter.
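The original `deadlock_ring.c` is not attached to this issue; the following is a hypothetical minimal reproducer in the same spirit, in which every rank blocks in MPI_Recv before any matching send is posted, so DDT should show one pending receive per rank:

```c
/* Hypothetical reconstruction of a deadlock-ring test case (not the original
 * source from this report). Build with: mpicc -g deadlock_ring.c -o deadlock_ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Every rank waits here for a message from its left neighbour, but the
     * matching MPI_Send below is never reached on any rank: a classic ring
     * deadlock, leaving one pending receive per rank in the message queues. */
    MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&rank, 1, MPI_INT, right, 0, MPI_COMM_WORLD);

    printf("rank %d received %d\n", rank, token);
    MPI_Finalize();
    return 0;
}
```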