
Polish sequence parallel to improve performance #861

Merged 3 commits into PaddlePaddle:develop from polish_sequence_parallel on Nov 1, 2022

Conversation

@sneaxiy (Collaborator) commented on Nov 1, 2022

Modifications include:

  • Replace paddle.split with paddle.slice in the scatter method in sequence_parallel_utils.py; slicing extracts only the local rank's chunk instead of materializing every chunk and discarding all but one (see the first sketch after this list).
  • Remove the transpose operator on the last TransformerBlock output and instead transpose masked_lm_labels and loss_mask; transposing the small label tensors is far cheaper than transposing the large hidden-state tensor, which decreases the computation cost (see the second sketch after this list).
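A minimal sketch of the first change, assuming a scatter that keeps each model-parallel rank's slice along the sequence axis; the function body and variable names here are illustrative, not the exact code in sequence_parallel_utils.py:

```python
import paddle
import paddle.distributed.fleet as fleet

def scatter(input):
    # Keep only this rank's chunk of the sequence dimension (axis 0).
    group = fleet.get_hybrid_communicate_group().get_model_parallel_group()
    parallelism, rank = group.nranks, group.rank
    seq_len = input.shape[0]
    assert seq_len % parallelism == 0
    interval = seq_len // parallelism

    # Before: paddle.split materializes all chunks, then keeps just one.
    # input = paddle.split(input, parallelism, axis=0)[rank]

    # After: paddle.slice extracts only the local chunk.
    return paddle.slice(
        input, axes=[0], starts=[rank * interval], ends=[(rank + 1) * interval]
    )
```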
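A sketch of the second change, with hypothetical shapes and vocabulary size: the last TransformerBlock output stays in [seq_len, batch, hidden] layout, and the much smaller masked_lm_labels and loss_mask tensors are transposed to match it instead:

```python
import paddle

# Hypothetical sizes for illustration.
seq_len, batch, hidden = 1024, 8, 4096

hidden_states = paddle.randn([seq_len, batch, hidden])   # last TransformerBlock output
masked_lm_labels = paddle.randint(0, 32000, [batch, seq_len])
loss_mask = paddle.ones([batch, seq_len])

# Before: transpose the large activation tensor to [batch, seq_len, hidden].
# hidden_states = paddle.transpose(hidden_states, perm=[1, 0, 2])

# After: transpose only the small label tensors to [seq_len, batch],
# so the loss is computed directly in the [s, b] layout.
masked_lm_labels = paddle.transpose(masked_lm_labels, perm=[1, 0])
loss_mask = paddle.transpose(loss_mask, perm=[1, 0])
```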

@sneaxiy changed the title from [WIP] polish sequence parallel to Polish sequence parallel to improve performance on Nov 1, 2022
@haohongxiang (Contributor) left a comment:


LGTM

@haohongxiang merged commit cb2b926 into PaddlePaddle:develop on Nov 1, 2022
@sneaxiy deleted the polish_sequence_parallel branch on Nov 1, 2022 at 07:53