[Inductor] [Doc] Add debugging document for inductor cpu backend #2430
Conversation
Force-pushed from 2625d8b to 62e4c44.
✅ Deploy Preview for pytorch-tutorials-preview ready!
To edit notification comments on pull requests, go to your Netlify site settings.
Force-pushed from 62e4c44 to 87e977e.
@jgong5 @EikanWang Hi, please review the draft document for Inductor debugging. The profiling part will be added soon. cc @zhuhaozhe
Can you write the tutorial as a .py file?
@Valentine233 thanks so much for this document. If you'd like to write this as .py, you can use this template.
with torch.no_grad():
    compiled_model(**input_dict)

NUM_ITERS=100
@Valentine233 can we reduce this to a reasonably small number?
Hi, @svekars. Thanks for the review; I will reduce it.
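The timing pattern under discussion (a warm-up pass, then a fixed `NUM_ITERS` loop under `torch.no_grad()`) can be sketched generically. In this sketch a trivial pure-Python function stands in for `compiled_model(**input_dict)`, and the `NUM_ITERS` value and the `bench` helper are illustrative assumptions, not the tutorial's actual code:

```python
import timeit

NUM_ITERS = 100  # assumed value; the review asks for a reasonably small number


def workload():
    # Stand-in for `compiled_model(**input_dict)` under torch.no_grad()
    return sum(i * i for i in range(1000))


def bench(fn, iters=NUM_ITERS, warmup=5):
    # Warm-up runs keep one-time costs (e.g. compilation, caches) out of the timing
    for _ in range(warmup):
        fn()
    total = timeit.timeit(fn, number=iters)
    return total / iters * 1000  # milliseconds per iteration


ms_per_iter = bench(workload)
print(f"{ms_per_iter:.4f} ms/iter")
```

The same harness can then time the eager and compiled variants back to back so both share identical conditions.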
inductor_func(*input)

import timeit
NUM_ITERS=1000
@Valentine233 Can we reduce this to a reasonably small number?
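One way to keep the iteration count small while still getting stable numbers is `timeit.repeat` with the minimum taken across repeats. This is a sketch of that idea under assumed values, not what the tutorial necessarily does:

```python
import timeit

NUM_ITERS = 50   # hypothetical smaller value, per the review request
REPEATS = 3


def workload():
    # Stand-in for `inductor_func(*input)`
    return sum(range(10_000))


# The minimum over several short runs is less noisy than one long run,
# since transient system load can only inflate a measurement, never shrink it.
times = timeit.repeat(workload, number=NUM_ITERS, repeat=REPEATS)
best_ms = min(times) / NUM_ITERS * 1000
print(f"best: {best_ms:.4f} ms/iter")
```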
# .. code-block:: shell
#
#    eager use: 5.780875144992024 ms/iter
#    inductor use: 0.9588955780491233 ms/iter
#    speed up ratio: 6.0286805751604735
#
This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?
Hi, @svekars. Will this automatically regenerate results from time to time and produce different outputs, or will it be frozen after the PR is merged?
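For reference, the speedup ratio in the block above is just the ratio of the two per-iteration times. This sketch reproduces the arithmetic using the values copied from the output (not re-measured):

```python
# Values copied from the tutorial's example output above
eager_ms = 5.780875144992024      # ms/iter, eager mode
inductor_ms = 0.9588955780491233  # ms/iter, Inductor-compiled

# Speedup is simply eager time divided by compiled time
speedup = eager_ms / inductor_ms
print(f"speed up ratio: {speedup}")
```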
# -------------------------  ------------  ------------  ------------  ------------
# Name                       CPU total %   CPU total     CPU time avg  # of Calls
# -------------------------  ------------  ------------  ------------  ------------
# aten::addmm                45.73%        370.814ms     1.024ms       362
# aten::add                  19.89%        161.276ms     444.287us     363
# aten::copy_                14.97%        121.416ms     248.803us     488
# aten::mul                  9.02%         73.154ms      377.082us     194
# aten::clamp_min            8.81%         71.444ms      744.208us     96
# aten::bmm                  5.46%         44.258ms      922.042us     48
# ProfilerStep*              100.00%       810.920ms     810.920ms     1
# aten::div                  2.89%         23.447ms      976.958us     24
# aten::_softmax             1.00%         8.087ms       336.958us     24
# aten::linear               46.48%        376.888ms     1.041ms       362
# aten::clone                2.77%         22.430ms      228.878us     98
# aten::t                    0.31%         2.502ms       6.912us       362
# aten::view                 0.14%         1.161ms       1.366us       850
# aten::transpose            0.17%         1.377ms      3.567us        386
# aten::index_select         0.12%         952.000us     317.333us     3
# aten::expand               0.12%         986.000us     2.153us       458
# aten::matmul               8.31%         67.420ms      1.405ms       48
# aten::cat                  0.09%         703.000us     703.000us     1
# aten::as_strided           0.08%         656.000us     0.681us       963
# aten::relu                 8.86%         71.864ms      748.583us     96
# -------------------------  ------------  ------------  ------------  ------------
# Self CPU time total: 810.920ms
#
######################################################################
#
# Similarly, we also get the table for the compiled model with Inductor (omitting some columns):
#
This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?
Hi, @svekars. The generated table is not in a good format (it's too long for one line). For all output-related blocks, can we comment out the print and use our own results (with the environment settings and system types)?
Yes, I think it is fine. Or we can keep both and add something like this before the non-generated output: "The following output was generated on .... hardware."
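If the eager and Inductor tables both end up as frozen text, one option for comparing them (for example, finding the top hotspots in each) is a small parser over the printed rows. This is a sketch against a fragment of the eager-mode table above; the `top_ops` helper is illustrative, not a PyTorch API:

```python
# A fragment of the profiler table shown above, name / CPU total % / CPU total / calls
TABLE = """\
aten::addmm      45.73%   370.814ms   1.024ms    362
aten::add        19.89%   161.276ms   444.287us  363
aten::copy_      14.97%   121.416ms   248.803us  488
"""


def top_ops(table_text, n=2):
    """Return the n ops with the highest 'CPU total %' from a printed table."""
    rows = []
    for line in table_text.splitlines():
        parts = line.split()
        # A data row has the op name first and a percentage second
        if len(parts) >= 2 and parts[1].endswith("%"):
            rows.append((parts[0], float(parts[1].rstrip("%"))))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


print(top_ops(TABLE))
```

Running the same parser over both tables makes it easy to see which ops Inductor fused away or sped up.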
Looks good! Any additional changes can be merged separately. Thanks for contributing!
# numactl -C 0-31 -m 0 python bench.py
#

# bench.py
Suggested change:
- # bench.py
+ # Code for ``bench.py``:
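For context, a minimal shape for a `bench.py` consistent with the `numactl` invocation above might look like the following. The workload is a pure-Python placeholder (an assumption), since the actual model and inputs live in the tutorial itself:

```python
# bench.py -- minimal sketch; workload() stands in for the real compiled-model call
import timeit

NUM_ITERS = 100


def workload():
    # Placeholder for: compiled_model(**input_dict) under torch.no_grad()
    return sum(i * i for i in range(5000))


def main():
    workload()  # one warm-up run so compilation cost is excluded from timing
    total = timeit.timeit(workload, number=NUM_ITERS)
    print(f"{total / NUM_ITERS * 1000:.4f} ms/iter")


if __name__ == "__main__":
    main()
```

Pinning the script with `numactl -C 0-31 -m 0` binds it to cores 0-31 and memory node 0, which keeps the measurements from being skewed by cross-socket memory traffic.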
Fixes #2348
Description
The doc is intended to introduce the usage, debugging, and performance profiling for ``torch.compile`` with the Inductor CPU backend.

Checklist
cc @williamwen42 @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe