[MoE] fix expert parallel #9760
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #9760      +/-   ##
===========================================
- Coverage    52.70%   52.38%    -0.33%
===========================================
  Files          731      727        -4
  Lines       117313   115146     -2167
===========================================
- Hits         61827    60316     -1511
+ Misses       55486    54830      -656
```

☔ View full report in Codecov by Sentry.
```diff
@@ -152,6 +152,9 @@ def main():
         quantization_config=quantization_config,
     )
+
+    if "Qwen2Moe" in str(model_config.architectures) and training_args.data_parallel_degree > 1:
+        training_args.use_expert_parallel = True
```
This doesn't seem ideal. What if the user's dp degree doesn't match the expert_parallel degree?
The MoE layer implementation already assumes this by default: expert_parallel_degree = dp_degree. If you think that's not good, should I change the original logic instead?
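For clarity, a minimal sketch of the constraint being assumed in the current MoE layer; the helper name is hypothetical, not PaddleNLP API:

```python
# Hypothetical guard mirroring the assumption discussed above: the MoE
# layer reuses the data parallel group for expert dispatch, so the two
# degrees must be equal.
def validate_moe_degrees(dp_degree: int, expert_parallel_degree: int) -> None:
    if expert_parallel_degree != dp_degree:
        raise ValueError(
            "MoE layers assume expert_parallel_degree == dp_degree, got "
            f"expert_parallel_degree={expert_parallel_degree}, dp_degree={dp_degree}"
        )

validate_moe_degrees(dp_degree=4, expert_parallel_degree=4)  # OK
```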
I think we can only default to aligning it with data_parallel_degree, since the all-to-all should be performed within the data parallel group.
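To illustrate that point, a plain-Python sketch (no distributed backend; all names are invented) of an all-to-all inside the data parallel group: with expert_parallel_degree == dp_degree, each dp rank owns a slice of the experts and exchanges routed tokens with the other ranks of its group.

```python
# Conceptual all-to-all within the dp group: per_rank_sends[i][j] holds
# the tokens rank i routes to the experts hosted on rank j; the exchange
# is a transpose, so rank j ends up with everything routed to its experts.
def all_to_all_within_dp_group(per_rank_sends):
    dp_degree = len(per_rank_sends)
    return [
        [per_rank_sends[src][dst] for src in range(dp_degree)]
        for dst in range(dp_degree)
    ]

# dp_degree == expert_parallel_degree == 2: rank 0 hosts expert 0,
# rank 1 hosts expert 1.
sends = [
    ["r0->e0 tokens", "r0->e1 tokens"],  # rank 0's outgoing buckets
    ["r1->e0 tokens", "r1->e1 tokens"],  # rank 1's outgoing buckets
]
recvs = all_to_all_within_dp_group(sends)
assert recvs[0] == ["r0->e0 tokens", "r1->e0 tokens"]  # all expert-0 traffic
assert recvs[1] == ["r0->e1 tokens", "r1->e1 tokens"]  # all expert-1 traffic
```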
LGTM
LGTM
PR types
Bug fixes
PR changes
Description
Fix MoE models under expert parallelism: automatically enable use_expert_parallel for Qwen2Moe models when the data parallel degree is greater than 1.
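For illustration, a self-contained sketch of the rule this PR adds; the TrainingArgs stand-in is hypothetical, while the condition mirrors the diff above:

```python
# Stand-in for paddlenlp's TrainingArguments (hypothetical, illustration only).
class TrainingArgs:
    def __init__(self, data_parallel_degree: int):
        self.data_parallel_degree = data_parallel_degree
        self.use_expert_parallel = False

def maybe_enable_expert_parallel(architectures, training_args):
    # Mirrors the patched main(): a string match on the model's
    # architecture list, gated on data parallelism being active.
    if "Qwen2Moe" in str(architectures) and training_args.data_parallel_degree > 1:
        training_args.use_expert_parallel = True

args = TrainingArgs(data_parallel_degree=8)
maybe_enable_expert_parallel(["Qwen2MoeForCausalLM"], args)
assert args.use_expert_parallel  # expert parallel auto-enabled under dp > 1
```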