[Inductor] [Doc] Add debugging document for inductor cpu backend #2430
Conversation
Force-pushed from 2625d8b to 62e4c44.
✅ Deploy Preview for pytorch-tutorials-preview ready!
To edit notification comments on pull requests, go to your Netlify site settings.
Force-pushed from 62e4c44 to 87e977e.
@jgong5 @EikanWang Hi, please review the draft document for Inductor debugging. The profiling part will be added soon. cc @zhuhaozhe
Can you write the tutorial as a .py file?
@Valentine233 thanks so much for this document. If you'd like to write this as .py, you can use this template.
with torch.no_grad():
    compiled_model(**input_dict)

NUM_ITERS=100
@Valentine233 can we reduce this to a reasonably small number?
Hi, @svekars. Thanks for the review; I will reduce it.
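The timing pattern under discussion (a warm-up pass, then a fixed `NUM_ITERS` loop under `torch.no_grad()`) can be sketched generically. In this sketch a trivial pure-Python function stands in for `compiled_model(**input_dict)`, and the `NUM_ITERS` value and the `bench` helper are illustrative assumptions, not the tutorial's actual code:

```python
import timeit

NUM_ITERS = 100  # assumed value; the review asks for a reasonably small number


def workload():
    # Stand-in for `compiled_model(**input_dict)` under torch.no_grad()
    return sum(i * i for i in range(1000))


def bench(fn, iters=NUM_ITERS, warmup=5):
    # Warm-up runs keep one-time costs (e.g. compilation, caches) out of the timing
    for _ in range(warmup):
        fn()
    total = timeit.timeit(fn, number=iters)
    return total / iters * 1000  # milliseconds per iteration


ms_per_iter = bench(workload)
print(f"{ms_per_iter:.4f} ms/iter")
```

The same harness can then time the eager and compiled variants back to back so both share identical conditions.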
inductor_func(*input)

import timeit
NUM_ITERS=1000
@Valentine233 Can we reduce this to a reasonably small number?
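One way to keep the iteration count small while still getting stable numbers is `timeit.repeat` with the minimum taken across repeats. This is a sketch of that idea under assumed values, not what the tutorial necessarily does:

```python
import timeit

NUM_ITERS = 50   # hypothetical smaller value, per the review request
REPEATS = 3


def workload():
    # Stand-in for `inductor_func(*input)`
    return sum(range(10_000))


# The minimum over several short runs is less noisy than one long run,
# since transient system load can only inflate a measurement, never shrink it.
times = timeit.repeat(workload, number=NUM_ITERS, repeat=REPEATS)
best_ms = min(times) / NUM_ITERS * 1000
print(f"best: {best_ms:.4f} ms/iter")
```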
# .. code-block:: shell
#
#    eager use: 5.780875144992024 ms/iter
#    inductor use: 0.9588955780491233 ms/iter
#    speed up ratio: 6.0286805751604735
#
This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?
Hi, @svekars. Will this automatically regenerate results from time to time and produce different outputs, or will it be frozen after the PR is merged?
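For reference, the speedup ratio in the block above is just the ratio of the two per-iteration times. This sketch reproduces the arithmetic using the values copied from the output (not re-measured):

```python
# Values copied from the tutorial's example output above
eager_ms = 5.780875144992024      # ms/iter, eager mode
inductor_ms = 0.9588955780491233  # ms/iter, Inductor-compiled

# Speedup is simply eager time divided by compiled time
speedup = eager_ms / inductor_ms
print(f"speed up ratio: {speedup}")
```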
# -------------------------  ------------  ------------  ------------  ------------
# Name                       CPU total %   CPU total     CPU time avg  # of Calls
# -------------------------  ------------  ------------  ------------  ------------
# aten::addmm                45.73%        370.814ms     1.024ms       362
# aten::add                  19.89%        161.276ms     444.287us     363
# aten::copy_                14.97%        121.416ms     248.803us     488
# aten::mul                  9.02%         73.154ms      377.082us     194
# aten::clamp_min            8.81%         71.444ms      744.208us     96
# aten::bmm                  5.46%         44.258ms      922.042us     48
# ProfilerStep*              100.00%       810.920ms     810.920ms     1
# aten::div                  2.89%         23.447ms      976.958us     24
# aten::_softmax             1.00%         8.087ms       336.958us     24
# aten::linear               46.48%        376.888ms     1.041ms       362
# aten::clone                2.77%         22.430ms      228.878us     98
# aten::t                    0.31%         2.502ms       6.912us       362
# aten::view                 0.14%         1.161ms       1.366us       850
# aten::transpose            0.17%         1.377ms      3.567us        386
# aten::index_select         0.12%         952.000us     317.333us     3
# aten::expand               0.12%         986.000us     2.153us       458
# aten::matmul               8.31%         67.420ms      1.405ms       48
# aten::cat                  0.09%         703.000us     703.000us     1
# aten::as_strided           0.08%         656.000us     0.681us       963
# aten::relu                 8.86%         71.864ms      748.583us     96
# -------------------------  ------------  ------------  ------------  ------------
# Self CPU time total: 810.920ms
#
######################################################################
#
# Similarly, we also get the table for the compiled model with Inductor (omitting some columns):
#
This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?
Hi, @svekars. The generated table is not in a good format (it's too long for one line). For all output-related blocks, can we comment out the print and use our own results (with the environment settings and system types)?
Yes, I think it is fine. Or we can keep both and add something like this before the non-generated output: "The following output was generated on .... hardware."
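If the eager and Inductor tables both end up as frozen text, one option for comparing them (for example, finding the top hotspots in each) is a small parser over the printed rows. This is a sketch against a fragment of the eager-mode table above; the `top_ops` helper is illustrative, not a PyTorch API:

```python
# A fragment of the profiler table shown above, name / CPU total % / CPU total / calls
TABLE = """\
aten::addmm      45.73%   370.814ms   1.024ms    362
aten::add        19.89%   161.276ms   444.287us  363
aten::copy_      14.97%   121.416ms   248.803us  488
"""


def top_ops(table_text, n=2):
    """Return the n ops with the highest 'CPU total %' from a printed table."""
    rows = []
    for line in table_text.splitlines():
        parts = line.split()
        # A data row has the op name first and a percentage second
        if len(parts) >= 2 and parts[1].endswith("%"):
            rows.append((parts[0], float(parts[1].rstrip("%"))))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


print(top_ops(TABLE))
```

Running the same parser over both tables makes it easy to see which ops Inductor fused away or sped up.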
Looks good! Any additional changes can be merged separately. Thanks for contributing!
# numactl -C 0-31 -m 0 python bench.py
#

# bench.py
Suggested change:
- # bench.py
+ # Code for ``bench.py``:
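For context, a minimal shape for a `bench.py` consistent with the `numactl` invocation above might look like the following. The workload is a pure-Python placeholder (an assumption), since the actual model and inputs live in the tutorial itself:

```python
# bench.py -- minimal sketch; workload() stands in for the real compiled-model call
import timeit

NUM_ITERS = 100


def workload():
    # Placeholder for: compiled_model(**input_dict) under torch.no_grad()
    return sum(i * i for i in range(5000))


def main():
    workload()  # one warm-up run so compilation cost is excluded from timing
    total = timeit.timeit(workload, number=NUM_ITERS)
    print(f"{total / NUM_ITERS * 1000:.4f} ms/iter")


if __name__ == "__main__":
    main()
```

Pinning the script with `numactl -C 0-31 -m 0` binds it to cores 0-31 and memory node 0, which keeps the measurements from being skewed by cross-socket memory traffic.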
Fixes #2348
Description
The doc is intended to introduce the usage, debugging, and performance profiling for ``torch.compile`` with the Inductor CPU backend.

Checklist
cc @williamwen42 @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe