[MoE] fix expert parallel #9760
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #9760      +/-   ##
===========================================
- Coverage    52.70%   52.38%    -0.33%
===========================================
  Files          731      727        -4
  Lines       117313   115146     -2167
===========================================
- Hits         61827    60316     -1511
+ Misses       55486    54830      -656
```

☔ View full report in Codecov by Sentry.
```diff
@@ -152,6 +152,9 @@ def main():
         quantization_config=quantization_config,
     )
+
+    if "Qwen2Moe" in str(model_config.architectures) and training_args.data_parallel_degree > 1:
+        training_args.use_expert_parallel = True
```
This doesn't seem ideal. What if the user's dp degree doesn't match the expert_parallel degree?
The MoE layer implementation already assumes this by default: expert_parallel_degree = dp_degree. If you think that's not good, should I change the original logic instead?
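For clarity, a minimal sketch of the constraint being assumed in the current MoE layer; the helper name is hypothetical, not PaddleNLP API:

```python
# Hypothetical guard mirroring the assumption discussed above: the MoE
# layer reuses the data parallel group for expert dispatch, so the two
# degrees must be equal.
def validate_moe_degrees(dp_degree: int, expert_parallel_degree: int) -> None:
    if expert_parallel_degree != dp_degree:
        raise ValueError(
            "MoE layers assume expert_parallel_degree == dp_degree, got "
            f"expert_parallel_degree={expert_parallel_degree}, dp_degree={dp_degree}"
        )

validate_moe_degrees(dp_degree=4, expert_parallel_degree=4)  # OK
```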
I think we can only default to aligning it with data_parallel_degree, since the all-to-all should be performed within the data parallel group.
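To illustrate that point, a plain-Python sketch (no distributed backend; all names are invented) of an all-to-all inside the data parallel group: with expert_parallel_degree == dp_degree, each dp rank owns a slice of the experts and exchanges routed tokens with the other ranks of its group.

```python
# Conceptual all-to-all within the dp group: per_rank_sends[i][j] holds
# the tokens rank i routes to the experts hosted on rank j; the exchange
# is a transpose, so rank j ends up with everything routed to its experts.
def all_to_all_within_dp_group(per_rank_sends):
    dp_degree = len(per_rank_sends)
    return [
        [per_rank_sends[src][dst] for src in range(dp_degree)]
        for dst in range(dp_degree)
    ]

# dp_degree == expert_parallel_degree == 2: rank 0 hosts expert 0,
# rank 1 hosts expert 1.
sends = [
    ["r0->e0 tokens", "r0->e1 tokens"],  # rank 0's outgoing buckets
    ["r1->e0 tokens", "r1->e1 tokens"],  # rank 1's outgoing buckets
]
recvs = all_to_all_within_dp_group(sends)
assert recvs[0] == ["r0->e0 tokens", "r1->e0 tokens"]  # all expert-0 traffic
assert recvs[1] == ["r0->e1 tokens", "r1->e1 tokens"]  # all expert-1 traffic
```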
LGTM
LGTM
PR types
Bug fixes
PR changes
Description
Fix MoE models under expert parallelism: automatically enable use_expert_parallel for Qwen2Moe models when the data parallel degree is greater than 1.
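For illustration, a self-contained sketch of the rule this PR adds; the TrainingArgs stand-in is hypothetical, while the condition mirrors the diff above:

```python
# Stand-in for paddlenlp's TrainingArguments (hypothetical, illustration only).
class TrainingArgs:
    def __init__(self, data_parallel_degree: int):
        self.data_parallel_degree = data_parallel_degree
        self.use_expert_parallel = False

def maybe_enable_expert_parallel(architectures, training_args):
    # Mirrors the patched main(): a string match on the model's
    # architecture list, gated on data parallelism being active.
    if "Qwen2Moe" in str(architectures) and training_args.data_parallel_degree > 1:
        training_args.use_expert_parallel = True

args = TrainingArgs(data_parallel_degree=8)
maybe_enable_expert_parallel(["Qwen2MoeForCausalLM"], args)
assert args.use_expert_parallel  # expert parallel auto-enabled under dp > 1
```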