How does the memory copy work between CPU and GPU? #688

Open
ihchoi12 opened this issue May 31, 2022 · 4 comments

Comments

@ihchoi12

Hello, I want to understand how the memory copy works between CPU and GPU.

  1. CPU -> GPU memory copy (e.g., the CPU moves data to the GPU) is triggered by cudaMemcpy() called by the CPU.
    From my Nsight Systems profiling, I found that no GPU kernel is launched for this, meaning it is handled purely by the CPU.
    I wonder how the CPU can write to GPU memory without GPU involvement. Does the CPU perform a PCIe memory write transaction for this?

  2. GPU -> CPU memory copy (e.g., the GPU moves gradients to the CPU to perform an inter-node Allreduce) is triggered by NCCL.
    I saw (in NCCL memcpy time #213) that the NCCL kernels perform store/load operations to host memory. Does that mean the GPU performs those operations directly on host memory? Does the GPU have direct access to host memory? Is it a PCIe memory write transaction as well, or is it a DMA operation?

Any comments much appreciated!
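
For reference, a minimal sketch of the two copy directions being asked about (buffer names are illustrative, error checking omitted):

#include <cuda_runtime.h>
#include <stdlib.h>

int main() {
  const size_t size = 1 << 20;
  float *h_buf = (float *)malloc(size);   // pageable host memory
  float *d_buf;
  cudaMalloc((void **)&d_buf, size);      // device memory

  cudaMemcpy(d_buf, h_buf, size, cudaMemcpyHostToDevice);  // (1) CPU -> GPU
  cudaMemcpy(h_buf, d_buf, size, cudaMemcpyDeviceToHost);  // (2) GPU -> CPU

  cudaFree(d_buf);
  free(h_buf);
  return 0;
}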

@sjeaugey
Member

sjeaugey commented Jun 1, 2022

NCCL does not use cudaMemcpy aside from the initial setup (ncclCommInit*).
Most of the time, the memory shared by the CPU and GPU is in CPU memory; we either register it on the GPU using cudaHostRegister or allocate it with cudaHostAlloc. That way both the CPU and GPU can access that memory directly using loads/stores.
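
A minimal sketch of this pinned, mapped host-memory pattern (not NCCL's actual code; error checking omitted):

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void gpu_store(volatile int *flag) {
  *flag = 42;   // the GPU stores directly into host memory
}

int main() {
  int *host_ptr, *dev_ptr;
  // Pinned, mapped host allocation; cudaHostRegister(ptr, size,
  // cudaHostRegisterMapped) would do the same for memory allocated elsewhere.
  cudaHostAlloc((void **)&host_ptr, sizeof(int), cudaHostAllocMapped);
  cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);

  *host_ptr = 0;                 // CPU store to the shared memory
  gpu_store<<<1, 1>>>(dev_ptr);  // GPU store to the same memory
  cudaDeviceSynchronize();
  printf("flag = %d\n", *host_ptr);  // prints 42

  cudaFreeHost(host_ptr);
  return 0;
}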

If we have the GDRCopy module loaded, we can place some buffers in GPU memory, and the CPU will be able to access them directly using loads/stores.
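
A rough sketch of that path, following the GDRCopy sample API (not NCCL's code; error checking omitted):

#include <cuda_runtime.h>
#include <gdrapi.h>

int main() {
  const size_t size = 1 << 16;              // GPU pages are 64 KB
  char *d_buf;
  cudaMalloc((void **)&d_buf, size);

  gdr_t g = gdr_open();                     // talk to the gdrdrv kernel module
  gdr_mh_t mh;
  gdr_pin_buffer(g, (unsigned long)d_buf, size, 0, 0, &mh);

  void *cpu_map = NULL;
  gdr_map(g, mh, &cpu_map, size);           // CPU-visible mapping of GPU memory

  char pattern = 0x5a;
  gdr_copy_to_mapping(mh, cpu_map, &pattern, sizeof(pattern));  // CPU store into GPU memory

  gdr_unmap(g, mh, cpu_map, size);
  gdr_unpin_buffer(g, mh);
  gdr_close(g);
  cudaFree(d_buf);
  return 0;
}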

Regarding network communication, there is no need to move data from GPU to CPU if we have GPU Direct RDMA. The NIC can directly pull data from GPU memory and write data to the destination GPU memory as well. If GPU Direct RDMA is not present (or we don't want to use it), then the GPU will indeed write its data to CPU memory before it is sent.
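
As an illustration of that last point (a sketch, not NCCL's code): the NIC gets direct access to a buffer by registering it with the verbs layer, and with the nvidia-peermem (formerly nv_peer_mem) kernel module loaded, the same ibv_reg_mr() call also accepts a cudaMalloc'd pointer, which is what GPU Direct RDMA builds on.

#include <infiniband/verbs.h>
#include <stddef.h>

// 'pd' is an already-created protection domain; error handling omitted.
struct ibv_mr *register_buffer(struct ibv_pd *pd, void *buf, size_t size) {
  int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE |
               IBV_ACCESS_REMOTE_READ;
  // Works for host memory, and for device memory when GPU Direct RDMA is
  // available; otherwise the buffer must live in host (CPU) memory.
  return ibv_reg_mr(pd, buf, size, access);
}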

@guanbear

guanbear commented Jul 1, 2022

Hi @sjeaugey, is it possible to use the kernel's PCI Peer-to-Peer DMA support (https://www.kernel.org/doc/html/latest/driver-api/pci/p2pdma.html) if GPU Direct is not present?

Looking forward to your reply, thanks!

@sjeaugey
Member

sjeaugey commented Jul 1, 2022

GPU Direct relies on PCI peer-to-peer operations to work. I don't know whether the NVIDIA driver uses the functions mentioned in the page above, but it relies on the same hardware functionality (except when NVLink is present, in which case NVLink is preferred).

@MonroeD

MonroeD commented Nov 1, 2023

(Quoting @sjeaugey's reply above.)

I see in the code that the memory registered with IB is allocated with cudaHostAlloc (am I right?)

  // For each NCCL protocol (LL, LL128, Simple), register its buffer with the
  // network plugin. The pointer type passed to ncclNetRegMr depends on where
  // the buffer actually lives: NCCL_PTR_CUDA if it was mapped in device
  // memory, NCCL_PTR_HOST otherwise.
  for (int p=0; p<NCCL_NUM_PROTOCOLS; p++) {
    resources->buffers[p] = NCCL_NET_MAP_GET_POINTER(map, cpu, buffs[p]);
    if (resources->buffers[p]) {
      NCCLCHECK(ncclNetRegMr(resources->netSendComm, resources->buffers[p], resources->buffSizes[p],
                             NCCL_NET_MAP_DEV_MEM(map, buffs[p]) ? NCCL_PTR_CUDA : NCCL_PTR_HOST,
                             &resources->mhandles[p]));
    }
  }
