Add Memory Tracer #4181

Merged
merged 5 commits into from
Dec 28, 2022

Contributor

@ymyjl ymyjl commented Dec 20, 2022

PR types

New features

PR changes

Others

Description

New feature: add a memory tracer to the trainer.
A helper class that tracks CPU and GPU memory.
When a stage completes, the caller can pass in a metrics dict, which is updated with the memory metrics gathered during that stage.
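
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `StageMemoryTracker`, its method names, and the metric keys are hypothetical, and it measures only Python-level CPU allocations through the standard-library `tracemalloc` module, with GPU tracking left out.

```python
import tracemalloc


class StageMemoryTracker:
    """Hypothetical stand-in for the tracer described above (CPU side only)."""

    def __init__(self, skip_memory_metrics=True):
        # Mirrors the skip_memory_metrics flag: tracking is off by default.
        self.skip_memory_metrics = skip_memory_metrics
        self._stage = None

    def start(self, stage):
        """Begin tracking allocations for the named stage (e.g. "train")."""
        if self.skip_memory_metrics:
            return
        self._stage = stage
        tracemalloc.start()

    def stop_and_update_metrics(self, metrics):
        """Stop tracking and write the gathered numbers into `metrics`."""
        if self.skip_memory_metrics or self._stage is None:
            return
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # Key names are assumptions made for this sketch.
        metrics[f"{self._stage}_mem_cpu_current_bytes"] = current
        metrics[f"{self._stage}_mem_cpu_peaked_bytes"] = peak
        self._stage = None


tracker = StageMemoryTracker(skip_memory_metrics=False)
tracker.start("train")
buffer = [0] * 100_000  # stand-in for the work done during the stage
metrics = {"train_loss": 0.5}
tracker.stop_and_update_metrics(metrics)
print(sorted(metrics))
```

With `skip_memory_metrics=True` (the default in this sketch) both calls return immediately and the metrics dict is left unchanged, so the tracker can stay wired into every stage at no cost.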

paddle-bot bot commented Dec 20, 2022

Thanks for your contribution!

codecov bot commented Dec 20, 2022

Codecov Report

Merging #4181 (00cb04c) into develop (ec30226) will increase coverage by 2.40%.
The diff coverage is 16.36%.

@@             Coverage Diff             @@
##           develop    #4181      +/-   ##
===========================================
+ Coverage    33.95%   36.35%   +2.40%     
===========================================
  Files          405      419      +14     
  Lines        56841    59168    +2327     
===========================================
+ Hits         19302    21513    +2211     
- Misses       37539    37655     +116     
| Impacted Files | Coverage Δ | |
|---|---|---|
| paddlenlp/trainer/trainer.py | 11.24% <0.00%> (-0.24%) | ⬇️ |
| paddlenlp/trainer/trainer_utils.py | 29.58% <15.38%> (-5.24%) | ⬇️ |
| paddlenlp/utils/import_utils.py | 80.82% <33.33%> (+38.63%) | ⬆️ |
| paddlenlp/trainer/training_args.py | 40.30% <100.00%> (+0.22%) | ⬆️ |
| paddlenlp/__init__.py | 19.76% <0.00%> (-10.54%) | ⬇️ |
| paddlenlp/transformers/auto/modeling.py | 71.88% <0.00%> (-4.98%) | ⬇️ |
| paddlenlp/experimental/ernie_model.py | 32.43% <0.00%> (-0.91%) | ⬇️ |
| paddlenlp/transformers/model_utils.py | 73.10% <0.00%> (-0.22%) | ⬇️ |
| paddlenlp/transformers/feature_extraction_utils.py | 27.02% <0.00%> (-0.19%) | ⬇️ |

... and 42 more

@ymyjl ymyjl changed the title from "Yj paddle" to "Add Memory Tracer" Dec 20, 2022

if self.paddle is not None:
    # self.torch.cuda.reset_peak_memory_stats()?
    self.paddle.device.cuda.empty_cache()

Collaborator

Does this API not exist?

self.torch.cuda.reset_peak_memory_stats()?

Contributor Author

Right, I looked for it and could not find one.

Collaborator

ZHUI commented Dec 22, 2022

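A few more questions:

  1. Could you list the APIs that Paddle is missing?
  2. In a multi-GPU setup, is only one card monitored?
  3. Have you tried it in a CPU-only setup?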

ZHUI previously approved these changes Dec 27, 2022
Collaborator

@ZHUI ZHUI left a comment

LGTM

Collaborator

ZHUI commented Dec 27, 2022

Remember to revert the changes to run_seq_cls.py.

@@ -132,7 +117,6 @@ def compute_metrics(p):
preds = paddle.to_tensor(preds)
label = paddle.to_tensor(p.label_ids)

probs = F.softmax(preds, axis=1)

Collaborator

Why was this line deleted?

Contributor Author

It was flagged automatically as an error at commit time; the variable was never used.

Collaborator

@ZHUI ZHUI left a comment

LGTM

@@ -493,6 +493,9 @@ class TrainingArguments:
        default=None,
        metadata={"help": "The path to a folder with a valid checkpoint for your model."},
    )
    skip_memory_metrics: bool = field(
        default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
    )

Collaborator

Please add Chinese-language usage documentation here.

@ZHUI ZHUI merged commit 70ca8f8 into PaddlePaddle:develop Dec 28, 2022