LibMP is a lightweight messaging library built on top of LibGDSync APIs, developed as a technology demonstrator to easily deploy the GPUDirect Async technology in applications. Main LibMP features are:
- Thin layer on top of IB Verbs, LibGDSync
- MPI is used as an out-of-band mechanism to distribute the process info in order to establish IB connections
- MPI is never used during actual communications
- Point-to-point and one-sided communications, no collectives
- No tags, no wildcards, no data types
- Can easily combine GPUDirect Async with GPUDirect RDMA
Basic LibMP requirements are:
- OpenMPI v1.10 or newer
- Mellanox OFED (MOFED) 4.0 or newer
- Mellanox Connect-IB, ConnectX-4 HCAs or newer
- LibGDSync
To use GPUDirect Async in combination with GPUDirect RDMA:
- OpenMPI with CUDA support
- A recent CUDA Toolkit is required, minimally 8.0
- A recent display driver, i.e. r361, r367 or later, is required
- The Mellanox OFED GPUDirect RDMA kernel module, https://github.com/Mellanox/nv_peer_memory, is required to allow the HCA to access the GPU memory.
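The requirements above can be spot-checked before building. The following is only a sketch and is not part of LibMP: the `check` helper is hypothetical, and it assumes the usual tool names (`nvcc`, `nvidia-smi`, `ofed_info`, `ompi_info`) are on your PATH and that the GPUDirect RDMA module is loaded under the name `nv_peer_mem`. It reports what it finds and does not compare versions for you.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print the last line of its output,
# or "not found" if the command is missing, fails, or prints nothing.
check() {
    label=$1; shift
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
        printf '%-24s %s\n' "$label" "$(printf '%s\n' "$out" | tail -n 1)"
    else
        printf '%-24s %s\n' "$label" "not found"
    fi
}

# CUDA Toolkit (8.0 or newer is required)
check "CUDA Toolkit" nvcc --version
# Display driver (r361, r367 or later)
check "NVIDIA driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Mellanox OFED (4.0 or newer)
check "MOFED" ofed_info -s
# OpenMPI built with CUDA support (the value should end in "true")
check "OpenMPI CUDA support" sh -c 'ompi_info --parsable --all | grep mpi_built_with_cuda_support:value'
# GPUDirect RDMA kernel module (assumed to be loaded as nv_peer_mem)
check "nv_peer_mem module" sh -c 'lsmod | grep nv_peer_mem'
```

Any line reporting "not found" points at a missing or unloaded component; version numbers still need to be compared by hand against the minimums listed above.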
Use the scripts/env_setup.sh file to set the MPI_PATH, CUDA_PATH, LIBGDSYNC_PATH and LIBMP_PATH environment variables, which are useful for both LibMP and LibGDSync.
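As an illustration, the four variables might be set as in the sketch below. Every path is a placeholder to replace with your own install locations. The `PATH` and `LD_LIBRARY_PATH` lines are an assumption about what a typical setup needs and are not taken from the actual env_setup.sh, as is the idea that each variable points at an install prefix containing `bin` and `lib` directories.

```shell
#!/bin/sh
# Placeholder install locations: adjust every path to your system.
export MPI_PATH="${MPI_PATH:-/usr/local/openmpi}"
export CUDA_PATH="${CUDA_PATH:-/usr/local/cuda}"
export LIBGDSYNC_PATH="${LIBGDSYNC_PATH:-$HOME/libgdsync}"
export LIBMP_PATH="${LIBMP_PATH:-$HOME/libmp}"

# Assumption: expose the MPI and CUDA tools and all shared libraries.
export PATH="$MPI_PATH/bin:$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_PATH/lib:$CUDA_PATH/lib64:$LIBGDSYNC_PATH/lib:$LIBMP_PATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Source the file (`. ./env_setup.sh`) rather than executing it, so that the variables persist in the current shell.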
Use the build.sh script to build LibMP.
In the scripts folder:
- wrapper.sh: sample script with some topology examples
- test.sh: sample script to test all libmp examples and benchmarks
You need to create your own hostfile inside the scripts directory.
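A minimal sketch of creating such a hostfile in the standard OpenMPI format (one host per line, with an optional `slots=` count) is shown below. The host names `node01` and `node02` are hypothetical, and the file name `scripts/hostfile` is an assumption; check test.sh for the name it actually expects.

```shell
#!/bin/sh
# Assumed location and name of the hostfile; override HOSTFILE if needed.
HOSTFILE="${HOSTFILE:-scripts/hostfile}"
mkdir -p "$(dirname "$HOSTFILE")"

# node01 and node02 are hypothetical host names; slots is the number of
# processes OpenMPI may launch on each host.
cat > "$HOSTFILE" <<'EOF'
node01 slots=2
node02 slots=2
EOF
```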
COMM is an additional library built on top of LibMP. With COMM you can easily deploy LibMP in your applications; the pingpong is an example of COMM usage.
We created a new repository here in order to collect in a single project all the components of the GPUDirect Async technology. In this repo you can find several scripts useful to configure, build and run all the GPUDirect Async libraries, tests, benchmarks and examples.
If you find this software useful in your work, please cite:
"GPUDirect Async: exploring GPU synchronous communication techniques for InfiniBand clusters", E. Agostini, D. Rossetti, S. Potluri. Journal of Parallel and Distributed Computing, Vol. 114, Pages 28-45, April 2018
"Offloading communication control logic in GPU accelerated applications", E. Agostini, D. Rossetti, S. Potluri. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid ’17), IEEE Conference Publications, Pages 248-257, Nov 2016