what is the count value in allreduce function? #1508

BSkim26 · 2024-11-08T03:17:28Z

I'm studying NCCL, and I am wondering what the count value in the allreduce function means.

ncclResult_t ncclAllReduce(const void* sendbuff, void* recvbuff, size_t count, ncclDataType_t datatype, ncclRedOp_t op, ncclComm_t comm, cudaStream_t stream)

If I understand correctly, sendbuff and recvbuff correspond to the size of the data transferred in the allreduce, and count is the number of float elements.
In ring allreduce, I think each GPU has to transfer data n-1 times.
So is the total data leaving each GPU (n-1) * count * float = (n-1) * sendbuff = (n-1) * recvbuff?

And in the formula (algbw = S/t), is the size S the same as count, or is it # of GPUs * count * float?

https://github.com/NVIDIA/nccl-tests/blob/master/doc/PERFORMANCE.md#bandwidth
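To make the quantities in the question concrete, here is a minimal C sketch of how I read the linked PERFORMANCE.md: S is the per-rank buffer size in bytes (count times the element size), algbw = S/t, and for allreduce busbw = algbw * 2(n-1)/n. The last helper assumes a textbook ring allreduce (n-1 reduce-scatter steps plus n-1 allgather steps, each moving S/n bytes); that is an assumption about the algorithm, not a statement about what NCCL picks at runtime. All function names are hypothetical and are not part of the NCCL API.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these exist in NCCL. */

/* S: bytes in one rank's buffer = count elements * bytes per element.
   For ncclFloat the element size is 4 bytes. */
static double allreduce_size_bytes(size_t count, size_t elem_size) {
    return (double)count * (double)elem_size;
}

/* algbw = S / t, in bytes per second (t is the measured time in seconds). */
static double alg_bandwidth(double size_bytes, double seconds) {
    return size_bytes / seconds;
}

/* busbw = algbw * 2*(n-1)/n, the allreduce correction factor
   given in the nccl-tests performance notes. */
static double bus_bandwidth(double algbw, int nranks) {
    return algbw * 2.0 * (double)(nranks - 1) / (double)nranks;
}

/* Assumed textbook ring allreduce: each rank sends 2*(n-1) chunks
   of S/n bytes (reduce-scatter phase, then allgather phase). */
static double ring_bytes_sent_per_rank(double size_bytes, int nranks) {
    return 2.0 * (double)(nranks - 1) * size_bytes / (double)nranks;
}
```

For example, with count = 1048576 floats on 4 GPUs, S is 4194304 bytes regardless of the number of GPUs, and under the ring assumption each GPU sends 6291456 bytes in total, which is 2(n-1)/n * S rather than (n-1) * S.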
