How the memory copy works between CPU and GPU? #688
Comments
If we have the GDRCopy module loaded, we can place some buffers in GPU memory, and the CPU will be able to access them directly using loads/stores. Regarding network communication, there is no need to move data from GPU to CPU if we have GPU Direct RDMA: the NIC can directly pull data from GPU memory and write data to the destination GPU's memory as well. If GPU Direct RDMA is not present (or we don't want to use it), then the GPU will indeed write its data to CPU memory before it is sent.
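The fallback path described above (the GPU writing its data into CPU memory before it is sent) can be illustrated with mapped, zero-copy host memory. This is a minimal hypothetical sketch, not NCCL's actual code; the kernel and buffer names are invented:

```cuda
// Sketch: a GPU kernel storing directly into host memory through a
// mapped (zero-copy) buffer allocated with cudaHostAlloc.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void write_to_host(volatile int *host_buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        host_buf[i] = i;  // each store crosses PCIe (or NVLink) into CPU DRAM
}

int main() {
    const int n = 256;
    int *h_buf = nullptr, *d_ptr = nullptr;

    // Pinned host memory mapped into the GPU's address space.
    cudaHostAlloc(&h_buf, n * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_ptr, h_buf, 0);

    write_to_host<<<1, n>>>(d_ptr, n);
    cudaDeviceSynchronize();  // make the GPU's stores visible to the CPU

    printf("h_buf[42] = %d\n", h_buf[42]);
    cudaFreeHost(h_buf);
    return 0;
}
```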
Hi @sjeaugey, is it possible to use the kernel's PCI Peer-to-Peer DMA support (https://www.kernel.org/doc/html/latest/driver-api/pci/p2pdma.html) if GPU Direct is not present? Looking forward to your reply, thanks!
GPU Direct relies on PCI peer-to-peer operations to work. I don't know whether the NVIDIA driver uses the functions mentioned in the page above, but it relies on the same hardware functionality (except when NVLink is present, in which case NVLink is preferred).
I see in the code that the memory registered with IB is allocated with cudaHostAlloc (am I right?)
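The combination mentioned above can be sketched as follows. This is a simplified illustration, not NCCL's actual registration code; the helper name and access flags are assumptions:

```cuda
// Hedged sketch: registering a pinned, cudaHostAlloc'd buffer with
// InfiniBand verbs so the NIC can DMA into/out of it.
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_host_buffer(struct ibv_pd *pd, size_t size, void **buf) {
    // Pinned (page-locked) host memory: safe as a DMA target for the NIC.
    cudaHostAlloc(buf, size, cudaHostAllocDefault);
    // Register the buffer with the HCA; the access flags here are
    // illustrative only.
    return ibv_reg_mr(pd, *buf, size,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
}
```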
Hello, I want to understand how the memory copy works between CPU and GPU.
A CPU -> GPU memory copy (e.g., the CPU moving data to the GPU) is triggered by cudaMemcpy() called from the CPU.
From my Nsight Systems profiling, I found that no GPU kernel is launched for this, which suggests it is handled entirely on the CPU side.
I wonder how the CPU can write to GPU memory without GPU involvement. Does the CPU perform PCIe memory write transactions for this?
A GPU -> CPU memory copy (e.g., the GPU moving gradients to the CPU to perform inter-node Allreduce) is triggered by NCCL.
I saw (in NCCL memcpy time #213) that the NCCL kernels perform store/load operations on host memory. Does that mean the GPU performs those operations directly on host memory? Does the GPU have direct access to host memory? Is it a PCIe memory write transaction as well, or is it a DMA operation?
Any comments much appreciated!
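For the first question above, a point worth noting is that cudaMemcpy from host to device is normally carried out by the GPU's copy (DMA) engine rather than by a CUDA kernel, which is consistent with nothing appearing in the kernel row of an Nsight Systems trace. A minimal sketch of that path, using a pinned source buffer:

```cuda
// Sketch of a CPU -> GPU copy: the CPU only enqueues the transfer;
// the GPU's copy (DMA) engine moves the bytes, so no CUDA kernel is
// launched for this operation.
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;
    float *h_src, *d_dst;

    cudaHostAlloc(&h_src, n * sizeof(float), cudaHostAllocDefault);  // pinned
    cudaMalloc(&d_dst, n * sizeof(float));
    for (size_t i = 0; i < n; i++) h_src[i] = 1.0f;

    cudaStream_t s;
    cudaStreamCreate(&s);
    // From pinned memory the DMA engine reads host DRAM directly; with
    // pageable memory the driver first stages data into a pinned buffer.
    cudaMemcpyAsync(d_dst, h_src, n * sizeof(float), cudaMemcpyHostToDevice, s);
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    cudaFree(d_dst);
    cudaFreeHost(h_src);
    return 0;
}
```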