Question: why does GPT2FusedLinearConv1D_Col in shardformer do two allreduces in backward? #4961
lichenlu started this conversation in Community | General
As shown in the figure, the forward function of GPT2FusedLinearConv1D_Col uses two functions, reduce_backward and matmul_with_async_comm, and both of them perform an allreduce during backward. Is there redundancy here?
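(The screenshot from the original post is not available. Below is a rough, simplified paraphrase of the forward path being asked about, based only on the description in this thread; the attribute names such as `process_group`, `bias`, and `async_communication` are assumptions for illustration, and the exact argument lists in ColossalAI may differ.)

```python
# Simplified paraphrase of the forward being asked about -- not the exact
# ColossalAI code; names and argument lists are illustrative assumptions.
def forward(self, input_):
    # reduce_backward: identity in forward, all-reduces the gradient in backward.
    input_parallel = reduce_backward(input_, self.process_group)
    # matmul_with_async_comm: matmul in forward; its backward *can* also
    # all-reduce the input gradient, depending on its flags (see the last
    # reply below).
    output = matmul_with_async_comm(
        input_parallel, self.weight, self.bias,
        self.process_group, self.async_communication,
    )
    return output
```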
-
@lichenlu With tensor parallelism, in a column-parallel layer the input is left untouched in forward and its gradient is all-reduced in backward; conversely, in a row-parallel layer the output is all-reduced in forward and nothing is done in backward. In other words, …
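A minimal sketch of that rule using torch.autograd.Function (illustrative only, not the shardformer implementation; the process-group argument is omitted for brevity):

```python
import torch
import torch.distributed as dist


class CopyToTensorParallelRegion(torch.autograd.Function):
    """Column-parallel input: identity in forward, all-reduce the grad in backward."""

    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Each rank holds only a partial gradient w.r.t. the replicated input,
        # so the gradients must be summed across the tensor-parallel group.
        dist.all_reduce(grad_output)
        return grad_output


class ReduceFromTensorParallelRegion(torch.autograd.Function):
    """Row-parallel output: all-reduce in forward, identity in backward."""

    @staticmethod
    def forward(ctx, x):
        # Each rank holds a partial sum of the output, so sum it here.
        dist.all_reduce(x)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output
```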
-
@FrankLeeeee Could you help explain this detail a bit more?
-
In GPT2FusedLinearConv1D_Col, ctx.async_grad_allreduce defaults to False, so the allreduce in matmul_with_async_comm is not executed; only one allreduce is performed in total.
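For reference, a hedged sketch of how such a flag typically gates the extra all-reduce in the matmul function's backward. This is a paraphrase under the assumption that matmul_with_async_comm follows the usual Megatron-style pattern; the real ColossalAI function takes more parameters (bias, process group, fused Conv1D handling) than shown here.

```python
import torch
import torch.distributed as dist


class MatmulWithGradAllreduce(torch.autograd.Function):
    """Sketch: the input-gradient all-reduce only runs when the flag is True."""

    @staticmethod
    def forward(ctx, x, weight, async_grad_allreduce):
        ctx.save_for_backward(x, weight)
        ctx.async_grad_allreduce = async_grad_allreduce
        # GPT-2 Conv1D layout: weight has shape (in_features, out_features).
        return torch.matmul(x, weight)

    @staticmethod
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        grad_input = grad_output.matmul(weight.t())

        handle = None
        if ctx.async_grad_allreduce:
            # The "second" all-reduce: launched asynchronously so it can
            # overlap with the weight-gradient matmul below.
            handle = dist.all_reduce(grad_input, async_op=True)

        grad_weight = x.reshape(-1, x.shape[-1]).t().matmul(
            grad_output.reshape(-1, grad_output.shape[-1]))

        if handle is not None:
            handle.wait()
        # With async_grad_allreduce=False (the default discussed above),
        # grad_input is returned without any all-reduce here, so the single
        # all-reduce of the backward pass is the one in reduce_backward.
        return grad_input, grad_weight, None
```

With the flag at its default False, the only all-reduce in the whole backward pass is the one issued by reduce_backward, which matches the reply above; enabling it would instead let the all-reduce overlap with the weight-gradient computation.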