DDT shows message queues with pml ob1 but not with ucx and yalla. #6464

Open · bartoldeman opened this issue Mar 5, 2019 · 11 comments
bartoldeman commented Mar 5, 2019

Background information

We found that with the yalla and ucx PMLs, the message queue display in DDT no longer works, and we need to fall back to the ob1 PML. That is an acceptable workaround for now, but with newer Open MPI releases removing the openib BTL, does that mean we will need to use TCP/IP for debugging, or perhaps the new, still somewhat experimental, UCT BTL?

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

Tested with Open MPI 2.1.1 (patched to work with DDT in general) with the yalla and ob1 PMLs, and with 3.1.1 and 3.1.2 with the ucx and ob1 PMLs. Tested with DDT (Arm Forge) 7.1, 18.2, and 18.3.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

It was compiled from a source tarball.

Please describe the system on which you are running

  • Operating system/version: CentOS 7.4
  • Computer hardware: x86_64 Skylake SP and Broadwell.
  • Network type: Mellanox ConnectX-5 InfiniBand.

Details of the problem

We expect to see what is shown in the first screenshot (with ob1), but we see the second with ucx and yalla.
The test case is a simple MPI deadlock program compiled with mpicc -g deadlock_ring.c -o deadlock_ring. The compiler does not matter (we used GCC 5.4.0, GCC 7.3.0, and Intel 2016 update 4).

[Screenshot 1, 2019-03-05 15:05: DDT message queue view with pml ob1 (queues populated)]
[Screenshot 2, 2019-03-05 15:00: DDT message queue view with pml ucx/yalla (queues empty)]

/******************************************************************************
Complex deadlock bug (loop over all ranks).

Solutions:
  MPI_Sendrecv(Arr, N, MPI_INT, rank_next, tag, Arr, N, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
  MPI_Sendrecv_replace(Arr, N, MPI_INT, rank_next, tag, rank_prev, tag, MPI_COMM_WORLD, &status);

******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

// Try 1 and 10000
#define N 10000

int main (int argc, char *argv[])
{
  int numtasks, rank, tag=0, rank_prev, rank_next;
  int Arr[N];
  MPI_Status status;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Task %d starting...\n",rank);

  // Neighboring tasks:
  rank_prev = rank - 1;
  rank_next = rank + 1;
  // Imposing periodic boundaries on ranks:
  if (rank_prev < 0)
    rank_prev = numtasks - 1;
  if (rank_next == numtasks)
    rank_next = 0;


  // With a synchronous send, every rank blocks in MPI_Ssend waiting for a
  // matching receive that has not been posted yet, so the ring deadlocks.
  MPI_Ssend(Arr, N, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
  MPI_Recv(Arr, N, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);


  printf ("Finished\n");

  MPI_Finalize();
  return 0;
}

ggouaillardet (Contributor) commented:

Thanks for the report.

The message queues are accessed by the debugger via the mca_pml_base_send_requests and mca_pml_base_recv_requests free lists. These are correctly used by pml/ob1 and pml/cm, but pml/ucx and pml/yalla simply ignore them, which is why DDT shows only empty message queues.
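
For context, here is a much-simplified, illustrative sketch of why the two groups of PMLs behave differently (these are not the actual Open MPI headers; the struct and field names are stand-ins): ob1/cm requests embed the base PML request and are drawn from the two free lists the debugger DLL walks, while ucx/yalla use only a bare MPI request, so those lists stay empty.

/* Illustrative sketch only -- not the real Open MPI definitions. */

/* The kind of bookkeeping the debugger needs to reconstruct a queue entry. */
typedef struct {
    int    req_peer;   /* source/destination rank */
    int    req_tag;    /* message tag             */
    void  *req_comm;   /* communicator            */
    void  *req_addr;   /* user buffer             */
    size_t req_count;  /* element count           */
} pml_base_request_sketch_t;

/* pml/ob1 and pml/cm: the PML-specific request embeds the base request and
 * is allocated from the shared free lists (mca_pml_base_send_requests /
 * mca_pml_base_recv_requests), which the debugger DLL can walk. */
typedef struct {
    pml_base_request_sketch_t req_base;  /* visible to the debugger */
    /* ... PML-specific scheduling state ... */
} pml_ob1_request_sketch_t;

/* pml/ucx and pml/yalla: only a bare MPI request object is used and nothing
 * is ever put on the base free lists, so the debugger sees empty queues. */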

@jladd-mlnx @yosefe is this an intentional design decision or something that was simply overlooked?

yosefe (Contributor) commented Mar 6, 2019

@ggouaillardet mca_pml_base_request_t has some overhead (memory consumption and initialization) that we wanted to avoid with MXM and UCX; we needed only the base ompi_request_t.

yosefe (Contributor) commented Mar 6, 2019

@bartoldeman @ggouaillardet Are there any DDT hooks/macros we could add to pml/ucx to make it work with DDT without using the PML base lists and structures?

ggouaillardet (Contributor) commented:

DDT uses the DLL in ompi/debuggers to access the internals of an MPI application, including the message queues. You could extend it to access the UCX internals, but that would be quite some work IMHO.
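
To give an idea of the shape of that work, here is a rough sketch. The real DLL implements the mqs_* entry points (for example mqs_setup_operation_iterator() and mqs_next_operation()); the types below are stand-ins so the sketch compiles on its own, and read_next_ucx_entry() is purely hypothetical.

/* Rough sketch of a UCX-aware message-queue iterator; stand-in types only. */
#include <stddef.h>

typedef struct { void *image; void *ucx_worker_addr; } dbg_process_t;   /* stand-in for mqs_process           */
typedef struct { int peer; int tag; size_t length; }   dbg_pending_op_t; /* stand-in for mqs_pending_operation */

enum { DBG_PENDING_SENDS, DBG_PENDING_RECEIVES, DBG_UNEXPECTED_MESSAGES };

/* Hypothetical helper: read UCX's internal expected/unexpected matching
 * queues out of the debugged process's memory.  This is the "quite some
 * work" part: it needs knowledge of UCX internal data structures, which
 * are not a stable, public ABI. */
static int read_next_ucx_entry(dbg_process_t *proc, int op_class,
                               dbg_pending_op_t *op)
{
    (void)proc; (void)op_class; (void)op;
    return 0;  /* 0 = no more pending operations */
}

/* Shape of the iterator the debugger would call repeatedly per op class. */
int sketch_next_operation(dbg_process_t *proc, int op_class,
                          dbg_pending_op_t *op)
{
    return read_next_ucx_entry(proc, op_class, op);
}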

bartoldeman (Author) commented:

@yosefe Maybe mca_pml_base_request_t could be used only if some non-default MCA parameter is set, to avoid the overhead in the common, non-debugged case?
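
A standalone sketch of that opt-in idea, assuming a hypothetical MCA parameter named pml_ucx_message_queue_debug (the name and the env-var lookup are stand-ins; real code would register the flag through Open MPI's MCA variable system):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool message_queue_debug;  /* stands in for the MCA parameter */

static void pml_ucx_sketch_init(void)
{
    /* Stand-in for MCA parameter registration: honour
     * OMPI_MCA_pml_ucx_message_queue_debug=1 in the environment. */
    const char *v = getenv("OMPI_MCA_pml_ucx_message_queue_debug");
    message_queue_debug = (v != NULL && v[0] == '1');
}

static void alloc_send_request(void)
{
    if (message_queue_debug) {
        /* Slow path: allocate from the PML base free list and fill in the
         * mca_pml_base_request_t fields so the debugger can see the request. */
        printf("debug-friendly request\n");
    } else {
        /* Fast path: bare MPI request, no extra initialization. */
        printf("lightweight request\n");
    }
}

int main(void)
{
    pml_ucx_sketch_init();
    alloc_send_request();
    return 0;
}

With such a switch, only debug runs would pay the memory and initialization cost mentioned above.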

yosefe (Contributor) commented Mar 8, 2019

@bartoldeman Yes, this might be easier. We will check this out.

yosefe self-assigned this Mar 8, 2019

rhc54 (Contributor) commented Mar 8, 2019

Just an FYI: a rather common use of DDT is to attach to a running program that encountered an issue.

gpaulsen (Member) commented Jul 1, 2021

@yosefe Is this still an issue with newer UCX on master?

Ajax-Light commented:

Hi, is this still open? I'd like to take a shot at it.

jsquyres (Member) commented:

Yes, it's still open.

gpaulsen (Member) commented Feb 1, 2022

@Ajax-Light As you investigate this, please remember that the Open MPI v5.0.x and master branches no longer have the MPIR debugger interface. I'm not sure whether DDT yet supports the PMIx tools debugger interface, but if it doesn't, please take a look at the shim library https://github.com/openpmix/mpir-to-pmix-guide, which allows parallel debuggers that support MPIR to work with the PMIx tools debugger interface.
