
[Inductor] [Doc] Add debugging document for inductor cpu backend #2430

Merged
merged 62 commits into pytorch:main from add_inductor_debug_doc
Jun 30, 2023

Conversation

@Valentine233 (Contributor) commented Jun 6, 2023

Fixes #2348

Description

This doc introduces the usage, debugging, and performance profiling of torch.compile with the Inductor CPU backend.
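As a hedged illustration of the usage the doc covers, here is a minimal torch.compile sketch with the Inductor CPU backend; the model and shapes are stand-ins for illustration, not code from the tutorial:

```python
import torch

# Inductor is the default backend of torch.compile; naming it
# explicitly here just makes the tutorial's topic visible.
model = torch.nn.Linear(8, 8)
compiled_model = torch.compile(model, backend="inductor")

x = torch.randn(4, 8)
with torch.no_grad():
    out = compiled_model(x)  # the first call triggers compilation
print(tuple(out.shape))
```

The first call is where Inductor generates and compiles the kernels, which is why the tutorial later warms up before timing.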

Checklist

  • The issue being fixed is referenced in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request

cc @williamwen42 @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe

@Valentine233 Valentine233 force-pushed the add_inductor_debug_doc branch from 2625d8b to 62e4c44 Compare June 6, 2023 02:40
@github-actions github-actions bot added docathon-h1-2023 (A label for the docathon in H1 2023), medium, and intel labels and removed cla signed label Jun 6, 2023
@netlify netlify bot commented Jun 6, 2023

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: 2625d8b
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/647e9bd49c73390008b2c362
😎 Deploy Preview: https://deploy-preview-2430--pytorch-tutorials-preview.netlify.app

@netlify netlify bot commented Jun 6, 2023

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: 62e4c44
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/647e9cb0d102da0008fc043d
😎 Deploy Preview: https://deploy-preview-2430--pytorch-tutorials-preview.netlify.app

@Valentine233 Valentine233 marked this pull request as draft June 6, 2023 02:45
@Valentine233 (Contributor, Author) commented:

@jgong5 @EikanWang Hi, please review the draft document on Inductor debugging. The profiling part will be added soon. cc @zhuhaozhe

@netlify netlify bot commented Jun 6, 2023

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: d8ab7e5
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/649f32b1698fcd00083debc0
😎 Deploy Preview: https://deploy-preview-2430--pytorch-tutorials-preview.netlify.app

@svekars svekars added the torch.compile Torch compile and other relevant tutorials label Jun 6, 2023
@svekars svekars requested review from williamwen42 and msaroufim June 6, 2023 15:06
@github-actions github-actions bot removed cla signed torch.compile Torch compile and other relevant tutorials labels Jun 6, 2023
@williamwen42 (Member) commented:

Can you write the tutorial as a .py file instead of a .rst file?

@svekars (Contributor) commented Jun 6, 2023:

@Valentine233, thanks so much for this document. If you'd like to write this as a .py file, you can use this template.

@Valentine233 Valentine233 requested a review from malfet June 19, 2023 01:33
intermediate_source/inductor_debug_cpu.py — two outdated review threads (resolved)
with torch.no_grad():
    compiled_model(**input_dict)

NUM_ITERS=100
@svekars (Contributor) commented:

@Valentine233, can we reduce this to a reasonably smaller number?

@Valentine233 (Contributor) replied:

Hi, @svekars. Thanks for the review; I will reduce it.

inductor_func(*input)

import timeit
NUM_ITERS=1000
@svekars (Contributor) commented:

@Valentine233, can we reduce this to a reasonably smaller number?
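The benchmarking pattern under discussion can be sketched with plain timeit. Here `inductor_func` is a hypothetical pure-Python stand-in for the tutorial's compiled function, and NUM_ITERS uses the smaller count suggested in the review:

```python
import timeit

# Hypothetical stand-in for the tutorial's compiled `inductor_func`.
def inductor_func(xs):
    return [v * 2.0 for v in xs]

NUM_ITERS = 100  # reduced from 1000, per the review suggestion
inp = list(range(1024))

# Warm up once so one-time costs (e.g. compilation) are excluded.
inductor_func(inp)

total_s = timeit.timeit(lambda: inductor_func(inp), number=NUM_ITERS)
print(f"{total_s / NUM_ITERS * 1000:.4f} ms/iter")
```

Averaging over many iterations smooths out timer noise, which is why the loop count only needs to be "reasonably" large, not 1000.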

Comment on lines +593 to +598
# .. code-block:: shell
#
# eager use: 5.780875144992024 ms/iter
# inductor use: 0.9588955780491233 ms/iter
# speed up ratio: 6.0286805751604735
#
@svekars (Contributor) commented:

This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?

@Valentine233 (Contributor) replied:

Hi, @svekars. Will this automatically regenerate the results from time to time, producing different outputs? Or will it be frozen after the PR is merged?

Comment on lines 440 to 470
# ------------------------- ------------ ------------ ------------ ------------
# Name CPU total % CPU total CPU time avg # of Calls
# ------------------------- ------------ ------------ ------------ ------------
# aten::addmm 45.73% 370.814ms 1.024ms 362
# aten::add 19.89% 161.276ms 444.287us 363
# aten::copy_ 14.97% 121.416ms 248.803us 488
# aten::mul 9.02% 73.154ms 377.082us 194
# aten::clamp_min 8.81% 71.444ms 744.208us 96
# aten::bmm 5.46% 44.258ms 922.042us 48
# ProfilerStep* 100.00% 810.920ms 810.920ms 1
# aten::div 2.89% 23.447ms 976.958us 24
# aten::_softmax 1.00% 8.087ms 336.958us 24
# aten::linear 46.48% 376.888ms 1.041ms 362
# aten::clone 2.77% 22.430ms 228.878us 98
# aten::t 0.31% 2.502ms 6.912us 362
# aten::view 0.14% 1.161ms 1.366us 850
# aten::transpose 0.17% 1.377ms 3.567us 386
# aten::index_select 0.12% 952.000us 317.333us 3
# aten::expand 0.12% 986.000us 2.153us 458
# aten::matmul 8.31% 67.420ms 1.405ms 48
# aten::cat 0.09% 703.000us 703.000us 1
# aten::as_strided 0.08% 656.000us 0.681us 963
# aten::relu 8.86% 71.864ms 748.583us 96
# ------------------------- ------------ ------------ ------------ ------------
# Self CPU time total: 810.920ms
#

######################################################################
#
# Similarly, we also get the table for the compiled model with Inductor (omitting some columns):
#
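For context, a per-op table like the one quoted above can be produced with torch.profiler; this is a minimal sketch using a stand-in model, not the tutorial's actual benchmark:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model; the tutorial profiles its own compiled model instead.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
x = torch.randn(8, 16)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# Aggregate per-op stats into the kind of table shown above.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```

Sorting by `cpu_time_total` surfaces the dominant ops (e.g. `aten::addmm` in the quoted table), which is the starting point for the tutorial's performance analysis.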
@svekars (Contributor) commented:

This will be generated in the final HTML (I see duplicate outputs in the resulting HTML: https://docs-preview.pytorch.org/pytorch/tutorials/2430/intermediate/inductor_debug_cpu.html#performance-profiling). Can we remove this block, or state in which environment you got these numbers if it's important to show these specific results?

@Valentine233 (Contributor) replied:

Hi, @svekars. The generated table is not well formatted (it's too long for one line). For all output-related blocks, can we comment out the print and use our own results (with the environment settings and system types)?

@svekars (Contributor) replied:

Yes, I think that is fine. Or we can keep both and say something like this before the non-generated output: "The following output was generated on .... hardware."

@svekars (Contributor) left a review comment:

Looks good! Any additional changes can be merged separately. Thanks for contributing!

# numactl -C 0-31 -m 0 python bench.py
#

# bench.py
@svekars (Contributor) suggested a change:

Suggested change
- # bench.py
+ # Code for ``bench.py``:

@svekars svekars merged commit d06d2f1 into pytorch:main Jun 30, 2023
Labels: advanced, cla signed, docathon-h1-2023 (A label for the docathon in H1 2023), intel
Projects: none yet
9 participants