[Distributed] enable tensor_parallel_output for finetuning #8370

SylarTiaNII · 2024-05-07T03:42:25Z

PR types

Bug fixes (Performance optimization)

PR changes

Others

Description

Enable tensor_parallel_output by default for better performance.
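The sketch below gives a rough sense of why keeping the logits sharded helps. It estimates per-rank memory for the final logits tensor. All inputs are hypothetical example values: batch size 1, sequence length 4096, a 32000-token vocabulary, fp32, and a tensor-parallel degree of 4. It also assumes the vocabulary divides evenly across ranks, and it does not model the cost of the all-gather communication itself.

```python
# Back-of-the-envelope per-rank memory for the LM-head logits.
# All sizes are hypothetical examples, not measurements from this PR.

def logits_bytes(batch_size, seq_len, vocab_size, bytes_per_elem,
                 tp_degree=1, tp_output=False):
    """Bytes held on one rank for the logits tensor.

    With tp_output=True each rank keeps only its vocab shard
    (vocab_size // tp_degree columns, assuming an even split).
    Otherwise the logits are gathered, so every rank holds the full vocab.
    """
    vocab_per_rank = vocab_size // tp_degree if tp_output else vocab_size
    return batch_size * seq_len * vocab_per_rank * bytes_per_elem

MiB = 1024 ** 2
full = logits_bytes(1, 4096, 32000, 4, tp_degree=4, tp_output=False)
sharded = logits_bytes(1, 4096, 32000, 4, tp_degree=4, tp_output=True)
print(f"gathered logits: {full / MiB:.0f} MiB per rank")   # gathered logits: 500 MiB per rank
print(f"sharded logits:  {sharded / MiB:.0f} MiB per rank")  # sharded logits:  125 MiB per rank
```

The saving scales linearly with the tensor-parallel degree and with sequence length. This is why the effect is most visible in long-context LLM finetuning.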

paddle-bot · 2024-05-07T03:42:30Z

Thanks for your contribution!

codecov · 2024-05-07T04:11:17Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.36%. Comparing base (9146c1e) to head (88b1da4).
Report is 2 commits behind head on develop.

❗ Current head 88b1da4 differs from pull request most recent head 176891c. Consider uploading reports for the commit 176891c to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8370      +/-   ##
===========================================
- Coverage    55.43%   55.36%   -0.07%     
===========================================
  Files          616      614       -2     
  Lines        96229    96016     -213     
===========================================
- Hits         53346    53164     -182     
+ Misses       42883    42852      -31

wawltor · 2024-05-07T08:17:44Z

llm/finetune_generation.py

@@ -152,7 +152,7 @@ def main():
            # NOTE(gongenlei): new add autotuner_benchmark
            model_config = AutoConfig.from_pretrained(
                model_args.model_name_or_path,
-                tensor_parallel_output=False,
+                tensor_parallel_output=True,


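Is tensor_parallel_output=True mainly meant as a speedup?

Setting tensor_parallel_output=True breaks the ACC metric computation, because the results are not all-gathered.

If it is not set to True, both performance and GPU memory usage suffer, and in LLM scenarios the performance impact is considerable. Could the ACC metric computation be adapted to support the mp (model parallel) scenario?

We could add a switch here. I suggest keeping the default as False: generation has not been adapted yet and will break.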
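The ACC problem raised above can be reproduced with a toy NumPy simulation. It makes two assumptions, neither taken from PaddleNLP. First, the LM-head output is split evenly along the vocabulary axis across ranks. Second, the real all-gather collective is stood in for by `np.concatenate`. On a single shard, `argmax` sees only part of the vocabulary and returns shard-local indices, so token accuracy stays wrong until the shards are gathered.

```python
import numpy as np

# Toy setup: vocab split evenly across tp_degree ranks (an assumption).
rng = np.random.default_rng(0)
batch, seq, vocab, tp_degree = 2, 8, 16, 4
logits = rng.standard_normal((batch, seq, vocab))
labels = logits.argmax(axis=-1)  # every token is "correct" on the full vocab

shards = np.split(logits, tp_degree, axis=-1)  # what each rank would hold

def token_accuracy(logits, labels):
    preds = logits.argmax(axis=-1)
    return float((preds == labels).mean())

# Rank 0 alone: argmax only covers vocab ids 0..3 and returns shard-local
# indices, so most tokens are scored as wrong.
acc_local = token_accuracy(shards[0], labels)

# Simulated all-gather: concatenating shards along the vocab axis restores
# the full logits, and the accuracy becomes correct again.
gathered = np.concatenate(shards, axis=-1)
acc_gathered = token_accuracy(gathered, labels)
print(acc_local, acc_gathered)
```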

ZHUI · 2024-05-10T12:04:34Z

llm/finetune_generation.py

@@ -152,7 +152,7 @@ def main():
            # NOTE(gongenlei): new add autotuner_benchmark
            model_config = AutoConfig.from_pretrained(
                model_args.model_name_or_path,
-                tensor_parallel_output=False,
+                tensor_parallel_output=True,


We could add a switch here. I suggest keeping the default as False: generation has not been adapted yet and will break.
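One way to add the requested switch is a boolean argument that defaults to False and is forwarded to `AutoConfig.from_pretrained`. The sketch below is hypothetical. The `ModelArgument` fields and the `build_config_kwargs` helper are illustrative names written in the dataclass style commonly used for training arguments. They are not the actual PaddleNLP definitions.

```python
from dataclasses import dataclass, field

@dataclass
class ModelArgument:
    """Hypothetical subset of the finetuning model arguments."""

    model_name_or_path: str = "path/to/model"  # placeholder path
    tensor_parallel_output: bool = field(
        default=False,  # safe default: generation does not handle sharded logits yet
        metadata={
            "help": "Keep LM-head logits sharded across tensor-parallel ranks. "
            "Faster and uses less memory, but downstream code must handle "
            "vocab-sharded logits."
        },
    )

def build_config_kwargs(model_args: ModelArgument) -> dict:
    # Keyword arguments meant to be forwarded to AutoConfig.from_pretrained(),
    # replacing the hard-coded True/False in finetune_generation.py.
    return {"tensor_parallel_output": model_args.tensor_parallel_output}
```

In finetune_generation.py the hard-coded keyword would then become something like `AutoConfig.from_pretrained(model_args.model_name_or_path, **build_config_kwargs(model_args))`. Finetuning runs could then opt in, while generation keeps the safe default.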

ZHUI · 2024-05-10T12:06:43Z

paddlenlp/trainer/trainer.py

@@ -2780,6 +2780,12 @@ def evaluation_loop(

        # Metrics!
        if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
+            if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:


Suggested change

if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:

if self.args.tensor_parallel_degree > 1 and isinstance(all_preds, paddle.Tensor) and all_preds.shape != all_labels.shape:

Please also add a comment here: all_gather logits for tp
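With its conditions joined by `and`, the suggested guard relies on short-circuit evaluation. Because `isinstance` is checked before `.shape` is read, predictions that are not a single tensor (for example a tuple of outputs) skip the branch instead of raising `AttributeError`. The minimal sketch below uses `np.ndarray` as a stand-in for `paddle.Tensor` so it runs without Paddle. `should_gather_tp_logits` is a hypothetical helper name.

```python
import numpy as np

# np.ndarray stands in for paddle.Tensor so the sketch runs without Paddle.
TensorType = np.ndarray

def should_gather_tp_logits(all_preds, all_labels, tensor_parallel_degree):
    """Hypothetical helper mirroring the suggested evaluation_loop guard."""
    # all_gather logits for tp
    # `and` short-circuits: if all_preds is not a tensor, `.shape` is never
    # accessed, so tuples or nested outputs fall through safely.
    return (
        tensor_parallel_degree > 1
        and isinstance(all_preds, TensorType)
        and all_preds.shape != all_labels.shape
    )

labels = np.zeros((2, 8))
sharded_preds = np.zeros((2, 8, 4))
print(should_gather_tp_logits(sharded_preds, labels, 2))     # True
print(should_gather_tp_logits((sharded_preds,), labels, 2))  # False, no AttributeError
```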

lugimzzz · 2024-05-10T12:07:30Z

paddlenlp/trainer/trainer.py

@@ -2780,6 +2780,12 @@ def evaluation_loop(

        # Metrics!
        if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
+            if self.args.tensor_parallel_degree > 1 and all_preds.shape != all_labels.shape:


The all-gather on the logits should be added in CausalLMTrainer (https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/utils.py#L208) rather than here.
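lugimzzz's alternative moves the gather into the causal-LM trainer. That way, logits already cover the full vocabulary by the time `evaluation_loop` accumulates them. The toy sketch below illustrates that placement only. `ToyTrainer`, `ToyCausalLMTrainer`, the simplified `prediction_step` signature, and the `gather_fn` callable (a stand-in for an all-gather collective) are all hypothetical. They do not match the real PaddleNLP classes.

```python
import numpy as np

class ToyTrainer:
    """Hypothetical stand-in for a base trainer; not the PaddleNLP API."""

    def __init__(self, tensor_parallel_degree, gather_fn):
        self.tensor_parallel_degree = tensor_parallel_degree
        self.gather_fn = gather_fn  # returns the per-rank shards in rank order

    def prediction_step(self, logits_shard, labels):
        # A real prediction_step runs the model; here the local shard is passed in.
        return None, logits_shard, labels

class ToyCausalLMTrainer(ToyTrainer):
    def prediction_step(self, logits_shard, labels):
        loss, logits, labels = super().prediction_step(logits_shard, labels)
        if self.tensor_parallel_degree > 1:
            # all_gather logits for tp, so everything downstream sees full-vocab logits
            logits = np.concatenate(self.gather_fn(logits), axis=-1)
        return loss, logits, labels

# Usage: simulate two ranks whose shards together form the full logits.
rng = np.random.default_rng(0)
full_logits = rng.standard_normal((2, 8, 16))
shards = np.split(full_logits, 2, axis=-1)
labels = full_logits.argmax(axis=-1)
trainer = ToyCausalLMTrainer(tensor_parallel_degree=2, gather_fn=lambda t: shards)
_, logits, _ = trainer.prediction_step(shards[0], labels)
```

Gathering per step keeps the generic evaluation loop free of tensor-parallel special cases. The cost is one collective per evaluation batch.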

wawltor

LGTM

* [XPU] llama add xpu support (#8282)
* [XPU] llama add xpu support
* fix
* use try import
* fix
* refine
* refine
* refine
* refine
* update (#8399)
* [LLM] Support fuse attention q, k, v weights (#8202)
  1. add use-interface & fuse action
  1.1. modify 1., code order
  2. switch to name_mapping
  3. solve tp branch
  3.2 follow hui, handel qkv separately
  3.3 handle pdparams
  3.4 from torch
  3.5 abandon low_cpu_mem_usage
  3.6 solve shard branch
* 3.6.1 solve shard branch after rebase develop
* code clean
* remove debug comment
* Redefine fuse and split functions
* Redefine fuse and split functions
* comment and fix
* update method
* update QKV fuse and split
* support fuse weights in multi-files
* add precision compare
* simplify function call
* support use_fast_ffn
* clean modeling and configuration
* add test for gpt and opt
* fix tp_actions get
* add fast_ffn test
* add Qwen2Moe
* Revert "add Qwen2Moe" This reverts commit 113b883.
* add test for split
* update doc
* update filter_dict_keys
---------
Co-authored-by: Zii <ziangqin.baidu@gmail.com>
* [LLM] Fix fuse or split with same key (#8378)
* fix fuse or split with same key
* fix
* fix eps
* update format
* [LLM] add decay steps option for finetuning (#8251)
* [LLM] add memory stats to logger of trainer (#8269)
* [Distributed] fix lora (#8325)
* [LLM] fix lora target modules on llama (#8372)
* [Distributed] metric calculation supports tp logits (#8370)
* Update model_utils.py
* Update model_utils.py
* Update model_utils.py
---------
Co-authored-by: Jianbang Yang <yangjianbang112@gmail.com>
Co-authored-by: DrownFish19 <DrownFish19@gmail.com>
Co-authored-by: Zii <ziangqin.baidu@gmail.com>
Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>

wawltor reviewed May 7, 2024

SylarTiaNII force-pushed the enable_tensor_parallel_output branch 5 times, most recently from 8fd9ff9 to d162d0c, May 10, 2024 11:32

ZHUI reviewed May 10, 2024

lugimzzz reviewed May 10, 2024

SylarTiaNII force-pushed the enable_tensor_parallel_output branch 2 times, most recently from 9a8b420 to eaf6453, May 10, 2024 12:36

[Distributed] metric calculation supports tp logits

176891c

SylarTiaNII force-pushed the enable_tensor_parallel_output branch from eaf6453 to 176891c, May 10, 2024 12:38

wawltor approved these changes May 10, 2024

wawltor merged commit c6e5459 into PaddlePaddle:develop May 10, 2024
6 of 9 checks passed
