I'm studying NCCL, and I'm wondering what the count value means in the allreduce function.
ncclResult_t ncclAllReduce(const void* sendbuff, void* recvbuff, size_t count, ncclDataType_t datatype, ncclRedOp_t op, ncclComm_t comm, cudaStream_t stream)
If I understand correctly, sendbuff and recvbuff are the buffers holding the data transferred by the allreduce, and count is the number of elements (floats, in my case) in each buffer.
In ring allreduce, I think each GPU has to transfer its data n-1 times.
So is the total data leaving each GPU (n-1) * count * sizeof(float), i.e. (n-1) times the size of sendbuff (or recvbuff)?
And in the formula algbw = S/t, is the size S equal to count, or to (number of GPUs) * count * sizeof(float)?
https://github.com/NVIDIA/nccl-tests/blob/master/doc/PERFORMANCE.md#bandwidth
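To make the quantities in the question concrete, here is a minimal sketch of how the definitions in the linked document fit together. It assumes S is the size in bytes of one rank's buffer (count times the element size), and it uses the allreduce bus-bandwidth factor 2(n-1)/n given there. The function names are hypothetical helpers written for illustration; they are not part of NCCL or nccl-tests. The last helper assumes a textbook ring allreduce (2(n-1) steps, each sending a chunk of S/n bytes), which may not match what NCCL actually does for a given size and topology.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not NCCL or nccl-tests APIs. */

/* S: bytes in one rank's buffer, i.e. count elements of type_size bytes each. */
size_t allreduce_size_bytes(size_t count, size_t type_size) {
    return count * type_size;
}

/* Algorithm bandwidth, algbw = S / t, in GB/s (1 GB = 1e9 bytes). */
double algbw_gbps(size_t size_bytes, double seconds) {
    assert(seconds > 0.0);
    return (double)size_bytes / seconds / 1.0e9;
}

/* Bus bandwidth for allreduce: busbw = algbw * 2(n-1)/n. */
double allreduce_busbw_gbps(double algbw, int nranks) {
    assert(nranks > 0);
    return algbw * (2.0 * (nranks - 1)) / nranks;
}

/* Bytes one rank sends in a textbook ring allreduce:
 * 2(n-1) steps, each sending a chunk of S/n bytes.
 * Assumes S is divisible by n; otherwise the integer division truncates. */
size_t ring_bytes_sent_per_rank(size_t size_bytes, int nranks) {
    assert(nranks > 0);
    return 2 * (size_t)(nranks - 1) * (size_bytes / (size_t)nranks);
}
```

For example, with count = 1048576 floats (4 bytes each), n = 4 GPUs and t = 1 ms, these helpers give S = 4194304 bytes, algbw of about 4.19 GB/s, busbw of about 6.29 GB/s, and 6291456 bytes sent per rank under the ring assumption. Under that reading, S does not grow with the number of GPUs, and the per-rank traffic is 2(n-1)/n times S rather than (n-1) times S.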