use original table for profile (omitting some columns) (#11)
zhuhaozhe authored Jun 16, 2023
1 parent b9e4e1c commit eb4abc9
Showing 3 changed files with 57 additions and 4 deletions.
Binary file removed _static/img/eager_prof.png
Binary file not shown.
Binary file removed _static/img/inductor_prof.png
Binary file not shown.
61 changes: 57 additions & 4 deletions intermediate_source/inductor_debug_cpu.py
@@ -432,13 +432,66 @@ def trace_handler(p):
p.step()

######################################################################
# We get the following performance profiling table for the eager-mode model:
# We get the following performance profiling table for the eager-mode model (omitting some columns):
#
# .. image:: ../_static/img/eager_prof.png
# .. code-block:: shell
#
# -------------------------  ------------  ------------  ------------  ------------
# Name                        CPU total %     CPU total  CPU time avg    # of Calls
# -------------------------  ------------  ------------  ------------  ------------
# aten::addmm                      45.73%     370.814ms       1.024ms           362
# aten::add                        19.89%     161.276ms     444.287us           363
# aten::copy_                      14.97%     121.416ms     248.803us           488
# aten::mul                         9.02%      73.154ms     377.082us           194
# aten::clamp_min                   8.81%      71.444ms     744.208us            96
# aten::bmm                         5.46%      44.258ms     922.042us            48
# ProfilerStep*                   100.00%     810.920ms     810.920ms             1
# aten::div                         2.89%      23.447ms     976.958us            24
# aten::_softmax                    1.00%       8.087ms     336.958us            24
# aten::linear                     46.48%     376.888ms       1.041ms           362
# aten::clone                       2.77%      22.430ms     228.878us            98
# aten::t                           0.31%       2.502ms       6.912us           362
# aten::view                        0.14%       1.161ms       1.366us           850
# aten::transpose                   0.17%       1.377ms       3.567us           386
# aten::index_select                0.12%     952.000us     317.333us             3
# aten::expand                      0.12%     986.000us       2.153us           458
# aten::matmul                      8.31%      67.420ms       1.405ms            48
# aten::cat                         0.09%     703.000us     703.000us             1
# aten::as_strided                  0.08%     656.000us       0.681us           963
# aten::relu                        8.86%      71.864ms     748.583us            96
# -------------------------  ------------  ------------  ------------  ------------
# Self CPU time total: 810.920ms

#
# Similarly, we also get the table for the compiled model with Inductor (omitting some columns):
#
# Similarly, we also get the table for the compiled model with Inductor:
# .. code-block:: shell
#
# .. image:: ../_static/img/inductor_prof.png
# -----------------------------------------------  ------------  ------------  ------------
# Name                                              CPU total %     CPU total    # of Calls
# -----------------------------------------------  ------------  ------------  ------------
# mkl::_mkl_linear                                       68.79%     231.573ms           362
# aten::bmm                                               8.02%      26.992ms            48
# ProfilerStep*                                         100.00%     336.642ms             1
# graph_0_cpp_fused_constant_pad_nd_embedding_0           0.27%     915.000us             1
# aten::empty                                             0.27%     911.000us           362
# graph_0_cpp_fused__mkl_linear_add_mul_relu_151          0.27%     901.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_226          0.27%     899.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_361          0.27%     898.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_121          0.27%     895.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_31           0.27%     893.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_76           0.26%     892.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_256          0.26%     892.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_346          0.26%     892.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_241          0.26%     891.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_316          0.26%     891.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_91           0.26%     890.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_106          0.26%     890.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_211          0.26%     890.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_61           0.26%     889.000us             1
# graph_0_cpp_fused__mkl_linear_add_mul_relu_286          0.26%     889.000us             1
# -----------------------------------------------  ------------  ------------  ------------
# Self CPU time total: 336.642ms
#
# From the profiling table of the eager model, we can see that the most time-consuming ops are [``aten::addmm``, ``aten::add``, ``aten::copy_``, ``aten::mul``, ``aten::clamp_min``, ``aten::bmm``].
# Comparing with the inductor model profiling table, we notice an ``mkl::_mkl_linear`` entry and multiple fused kernels in the form ``graph_0_cpp_fused_*``. They are the major
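For a rough reading of the two tables in this diff, the overall speedup can be estimated from their `Self CPU time total` lines. The sketch below is illustrative only and is not part of the commit: `to_ms` and `speedup` are hypothetical helper names, and the figures are copied by hand from the tables. Treating `aten::linear` and `mkl::_mkl_linear` as counterparts is an assumption, based only on both rows reporting 362 calls.

```python
# Illustrative sketch, not part of the commit. Figures are copied from the
# two profiling tables in this diff; helper names are hypothetical.

EAGER_TOTAL = "810.920ms"      # "Self CPU time total" of the eager table
INDUCTOR_TOTAL = "336.642ms"   # "Self CPU time total" of the Inductor table

# Multipliers that convert each unit suffix to milliseconds.
_UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(text: str) -> float:
    """Convert a profiler time string such as '915.000us' to milliseconds."""
    # Check the two-letter suffixes first so they are not mistaken for 's'.
    for unit in ("us", "ms", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * _UNIT_TO_MS[unit]
    raise ValueError(f"unrecognized time string: {text!r}")


def speedup(before: str, after: str) -> float:
    """Return how many times faster `after` is compared with `before`."""
    return to_ms(before) / to_ms(after)


if __name__ == "__main__":
    print(f"end-to-end: {speedup(EAGER_TOTAL, INDUCTOR_TOTAL):.2f}x")
    # Assumed pairing: aten::linear (eager) vs mkl::_mkl_linear (Inductor).
    print(f"linear ops: {speedup('376.888ms', '231.573ms'):.2f}x")
```

With these numbers the end-to-end ratio comes out at roughly 2.4x. The `CPU total` column includes time spent in child ops, so per-op ratios like the second one are only indicative.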
