This PR adds pinned memory for data transfer during a parallel SpMV. I think this would also be a good time to create a par_csr_matvec_device.c and move the device code there; that would be easier to work with than the current tangle of #ifdefs.
Pinned pointers are added to the par_csr_matrix class. These pointers are allocated on demand in par_csr_matvec.c and sized to the maximum needed so that the same buffers can be reused by both the standard matvec and matvecT. To me, they belong in the matrix class because the data is sized according to a particular matrix's parallel SpMV layout.
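A minimal sketch of that on-demand, max-sized allocation (member and function names here are hypothetical, and plain `double`/`int` stand in for `HYPRE_Complex`/`HYPRE_Int`; the actual allocation in the PR goes through memory.c):

```c
#include <cuda_runtime.h>

/* Hypothetical stand-in for the new pinned-buffer members on the matrix. */
typedef struct
{
   /* ... existing parallel CSR matrix members ... */
   double *send_data_pinned;   /* hypothetical member name */
   double *recv_data_pinned;   /* hypothetical member name */
} par_csr_matrix_sketch;

/* Allocate lazily, sized to the larger of the send/recv requirements so one
 * allocation serves both matvec and matvecT (where the roles are swapped). */
static void ensure_pinned_buffers(par_csr_matrix_sketch *A,
                                  int num_send_elmts, int num_recv_elmts)
{
   int n = (num_send_elmts > num_recv_elmts) ? num_send_elmts : num_recv_elmts;

   if (!A->send_data_pinned)
   {
      cudaHostAlloc((void **)&A->send_data_pinned, n * sizeof(double),
                    cudaHostAllocMapped);
   }
   if (!A->recv_data_pinned)
   {
      cudaHostAlloc((void **)&A->recv_data_pinned, n * sizeof(double),
                    cudaHostAllocMapped);
   }
}
```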
The pinned memory buffers are allocated in memory.c with cudaHostAlloc using the cudaHostAllocMapped flag. This lets the CUDA kernels (the gather kernel for matvec, the SpMV kernel for matvecT) write directly into pinned host memory. The catch is that one has to call cudaHostGetDevicePointer and pass the resulting device alias into the kernel launch (done in par_csr_matvec.c). This is a little wonky and needs to be cleaned up.
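A minimal sketch of that mapped pinned-memory pattern, assuming an illustrative gather kernel (not hypre's actual kernel or buffer names):

```c
#include <cuda_runtime.h>

/* Gather kernel writing straight into mapped pinned host memory. */
__global__ void gather_into_pinned(const double *x, const int *send_map,
                                   double *send_buf, int n)
{
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   if (i < n)
   {
      send_buf[i] = x[send_map[i]];
   }
}

void launch_gather(const double *d_x, const int *d_send_map,
                   double *h_send_buf_pinned, int n)
{
   double *d_send_buf;

   /* Translate the mapped host pointer into the device-side alias the
    * kernel must use (the "wonky" step mentioned above). */
   cudaHostGetDevicePointer((void **)&d_send_buf, (void *)h_send_buf_pinned, 0);

   gather_into_pinned<<<(n + 255) / 256, 256>>>(d_x, d_send_map, d_send_buf, n);
}
```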
par_csr_communication.c has two new methods: hypre_ParCSRCommHandleCreate_v3 and hypre_ParCSRCommHandleDestroy_v3.
In the first method, hypre_ParCSRCommHandleCreate_v3, the pinned buffers are passed as input. Since the kernel has already written into mapped pinned memory, there is no memcpyDtoH; I simply device-synchronize to ensure the pinned data is ready on the host before the MPI communication starts.
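A sketch of that idea under the assumptions above (illustrative function and buffer names; the real handle carries the full send/recv maps):

```c
#include <mpi.h>
#include <cuda_runtime.h>

void comm_handle_create_v3_sketch(double *h_send_buf_pinned, int send_count, int dest,
                                  double *h_recv_buf_pinned, int recv_count, int src,
                                  MPI_Comm comm, MPI_Request reqs[2])
{
   /* No cudaMemcpy(DtoH): the gather kernel already wrote into the pinned
    * host buffer. Just make sure that kernel has finished. */
   cudaDeviceSynchronize();

   MPI_Irecv(h_recv_buf_pinned, recv_count, MPI_DOUBLE, src,  0, comm, &reqs[0]);
   MPI_Isend(h_send_buf_pinned, send_count, MPI_DOUBLE, dest, 0, comm, &reqs[1]);
}
```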
In the second method, hypre_ParCSRCommHandleDestroy_v3, I execute a cudaMemcpyAsync so that the host-to-device transfer of the received data can overlap with kernel execution. cudaMemcpyAsync is called via hypre_TMemcpyAsync, a new routine that is currently only implemented for NVIDIA and AMD architectures. Tested on Summit and Crusher.
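A sketch of the overlap idea; hypre_TMemcpyAsync presumably wraps something like the call below (its exact signature is in the PR), and the function and buffer names here are illustrative:

```c
#include <cuda_runtime.h>

void comm_handle_destroy_v3_sketch(double *d_recv_data,
                                   const double *h_recv_buf_pinned,
                                   int recv_count, cudaStream_t stream)
{
   /* Pinned source memory is what allows this copy to be truly asynchronous
    * and to overlap with independent kernel work on other streams. */
   cudaMemcpyAsync(d_recv_data, h_recv_buf_pinned,
                   recv_count * sizeof(double),
                   cudaMemcpyHostToDevice, stream);

   /* The local part of the SpMV can run concurrently; kernels that consume
    * d_recv_data must synchronize on 'stream' first. */
}
```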