Issues: NVIDIA/nccl
Issues list
local access violation work queue error when upgrade to v2.20.3-1
#1524
opened Nov 26, 2024 by
gangxie112
Why group calls (ncclGroupStart() and ncclGroupEnd()) are invoked in ncclSend() and ncclRecv()
#1521
opened Nov 21, 2024 by
ZhiyiHu1999
Is it safe or recommended to use multiple communicators for real distributed training
#1520
opened Nov 19, 2024 by
ZhiyiHu1999
Nccl socketStartConnect: Connect to x.x.x.x<xxxx> failed : Software caused connection abort
#1515
opened Nov 16, 2024 by
913871734
torch.distributed.DistBackendError: NCCL error in ProcessGroupNCCL.cpp:1275
#1514
opened Nov 14, 2024 by
shenshaowei
Difference between readLL() and readLLFinish() in prims_ll.h
#1503
opened Oct 31, 2024 by
ZhiyiHu1999
Alternating rings cause bad performance (NIC sending PFC) in a cluster with mixed crossNic=0/1 nodes
#1494
opened Oct 25, 2024 by
huzhiwen93