Add Memory Tracer #4181
Thanks for your contribution!
Codecov Report
@@            Coverage Diff             @@
##           develop    #4181     +/-   ##
===========================================
+ Coverage    33.95%   36.35%   +2.40%
===========================================
  Files          405      419      +14
  Lines        56841    59168    +2327
===========================================
+ Hits         19302    21513    +2211
- Misses       37539    37655     +116
if self.paddle is not None:
    # self.torch.cuda.reset_peak_memory_stats()?
    self.paddle.device.cuda.empty_cache()
Is there no Paddle equivalent of this API?
self.torch.cuda.reset_peak_memory_stats()?
Right, I looked for it and could not find one.
There are a few more issues:
LGTM
Remember to revert this modification.
@@ -132,7 +117,6 @@ def compute_metrics(p):
    preds = paddle.to_tensor(preds)
    label = paddle.to_tensor(p.label_ids)

    probs = F.softmax(preds, axis=1)
Why was this line removed?
An automatic check flagged it at commit time; the variable was never used.
LGTM
@@ -493,6 +493,9 @@ class TrainingArguments:
        default=None,
        metadata={"help": "The path to a folder with a valid checkpoint for your model."},
    )
    skip_memory_metrics: bool = field(
        default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
    )
Please add Chinese usage documentation here.
PR types
New features
PR changes
Others
Description
New feature: add trainer memory tracer
A helper class that tracks CPU and GPU memory.
When a stage completes, the caller can pass in a metrics dict, which is then updated with the memory metrics gathered during that stage.
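The start/stop pattern described above can be sketched roughly as follows. This is a minimal, CPU-only illustration and not the implementation in this PR: the class name `MemoryTrackerSketch`, its method names, and the metric key names are all hypothetical, and it uses the standard-library `tracemalloc` module in place of the GPU queries. The only thing taken from the PR is the `skip_memory_metrics` flag and its default of `True`.

```python
import tracemalloc
from typing import Dict, Optional


class MemoryTrackerSketch:
    """Hypothetical CPU-only stand-in for the tracer described in this PR."""

    def __init__(self, skip_memory_metrics: bool = True):
        # Mirrors the default in the diff: tracing is skipped unless disabled.
        self.skip_memory_metrics = skip_memory_metrics
        self.cur_stage: Optional[str] = None
        self._begin = 0
        self._owns_tracing = False

    def start(self, stage: str) -> None:
        """Begin tracking a stage such as "train" or "eval" (names are assumptions)."""
        if self.skip_memory_metrics or self.cur_stage is not None:
            return  # disabled, or a stage is already being tracked
        self.cur_stage = stage
        if not tracemalloc.is_tracing():
            tracemalloc.start()
            self._owns_tracing = True
        if hasattr(tracemalloc, "reset_peak"):  # available in Python 3.9+
            tracemalloc.reset_peak()
        self._begin, _ = tracemalloc.get_traced_memory()

    def stop_and_update_metrics(self, metrics: Dict[str, int]) -> None:
        """End the current stage and write its memory deltas into `metrics`."""
        if self.skip_memory_metrics or self.cur_stage is None:
            return
        current, peak = tracemalloc.get_traced_memory()
        stage = self.cur_stage
        # Bytes still allocated at the end of the stage, relative to its start.
        metrics[f"{stage}_mem_cpu_alloc_delta"] = current - self._begin
        # Transient bytes above the larger of the start and end levels.
        metrics[f"{stage}_mem_cpu_peaked_delta"] = max(0, peak - max(self._begin, current))
        self.cur_stage = None
        if self._owns_tracing:
            tracemalloc.stop()
            self._owns_tracing = False


if __name__ == "__main__":
    tracker = MemoryTrackerSketch(skip_memory_metrics=False)
    tracker.start("train")
    data = [bytes(1024) for _ in range(1000)]  # simulate work that allocates memory
    results: Dict[str, int] = {}
    tracker.stop_and_update_metrics(results)
    print(results)
```

With `skip_memory_metrics=True` both methods return immediately and the metrics dict is left untouched, which is why the flag has to be set to `False` before any memory figures appear.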