[unified checkpoint] Fix last checkpoint save #7810
Thanks for your contribution!
paddlenlp/trainer/trainer.py
Outdated
unified_checkpoint_config_backup = self.args.unified_checkpoint_config
# back up and clear unified_checkpoint_config when not in the training stage
if not self.is_in_train:
    self.args.unified_checkpoint_config = ""
After processing, isn't unified_checkpoint_config in the form shown on the next line? Is it still fine to set it to a str here?
unified_checkpoint_config : ['skip_save_model_weight', 'master_weight_compatible', 'async_save']
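A minimal sketch of the type question raised above, not the actual PaddleNLP code. It assumes the options are consulted with a membership test (`name in config`); the helper `option_enabled`, the function `save_outside_training`, and the `SimpleNamespace` stand-in for `self.args` are hypothetical names used only for illustration. Under that assumption, an empty string and an empty list both report every option as disabled, so the temporary `""` would behave the same as `[]`.

```python
from types import SimpleNamespace


def option_enabled(config, name):
    # Assumption: options are looked up with a membership test.
    # For a list this is element membership; for a str it is a substring test.
    return name in config


def save_outside_training(args, is_in_train):
    # Back up the parsed option list, clear it when not training,
    # and restore it afterwards even if saving raises.
    backup = args.unified_checkpoint_config
    if not is_in_train:
        args.unified_checkpoint_config = ""
    try:
        # Stand-in for the real save logic: report whether async_save is seen.
        return option_enabled(args.unified_checkpoint_config, "async_save")
    finally:
        args.unified_checkpoint_config = backup


args = SimpleNamespace(
    unified_checkpoint_config=[
        "skip_save_model_weight",
        "master_weight_compatible",
        "async_save",
    ]
)

print(save_outside_training(args, is_in_train=False))
print(save_outside_training(args, is_in_train=True))
```

One caveat with the str form: membership on a string is a substring match, so a non-empty string could match partial option names. Assigning `[]` instead of `""` would keep the type consistent with the parsed list.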
@@ -2027,8 +2027,17 @@ def save_model(self, output_dir: Optional[str] = None, merge_tensor_parallel: Op
self.model_wrapped.get_all_parameters(convert2cpu=True)

if self.args.should_save_model_state:
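This filters which workers perform the save; that doesn't affect anything, right?
As I recall, this restriction may have been relaxed on the load side.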
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##        release/2.7    #7810   +/-   ##
========================================
- Coverage     57.30%   57.30%   -0.01%
========================================
  Files           584      584
  Lines         87692    87708     +16
========================================
+ Hits          50254    50260      +6
- Misses        37438    37448     +10
LGTM
PR types
Bug fixes
PR changes
Others
Description
Fix last checkpoint save.