[TIR] Utility function to decide loop mapping for auto tensorization #11050

masahi · 2022-04-18T22:02:48Z

Add the TensorizeInfo structure and the GetTensorizeLoopMapping function, which are used to determine the correspondence of loops between a target block and an intrinsic description.

Matching is based on a heuristic: it works in all cases I tested (CPU dot product for dense / conv2d, CPU / GPU matmul), but there is no guarantee that it always finds the "right" mapping. If the mapping is not correct, tensorize will fail.

The original code is at https://github.com/spectrometerHBH/tvm/blob/auto-tensorization/src/tir/schedule/analysis/analysis.cc#L1175. I modified it to support more cases and added tests. I'm sending this PR on behalf of the team, but most of the work was done by others earlier.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

@vinx13 @junrushao1994 @spectrometerHBH @Hzfengsy @MasterJH5574 @jinhongyii

masahi · 2022-04-19T00:27:46Z

src/tir/schedule/analysis/analysis.cc

+      next_block_ind = i_block - 1;
+      break;
+    }
+  }


The logic here is very different from the original code at https://github.com/spectrometerHBH/tvm/blob/auto-tensorization/src/tir/schedule/analysis/analysis.cc#L1246. I could not understand why the original code was written that way, and it did not work when the matching loops in the target block are not in the innermost positions (conv2d NCHWc on CPU; see the test test_get_tensorize_loop_mapping_conv2d_nchwc_vnni at line 199 of tvm/tests/python/unittest/test_tir_schedule_analysis.py in d6ae848).

I think my change is simple and obvious. The condition for a match is (1) divisibility of loop extent and (2) matching iterator types (reduction vs spatial). Mapping is determined starting from the innermost axis.
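The two conditions above, applied from the innermost loop outward, can be sketched in plain Python. This is only an illustration of the heuristic as described in this comment, not the C++ code in the diff: the `Loop` class, the function name, and the conv2d loop extents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Loop:
    name: str
    extent: int
    kind: str  # "S" = spatial, "R" = reduction


def match_innermost_first(
    block_loops: List[Loop], desc_loops: List[Loop]
) -> Optional[Dict[str, str]]:
    """Map block loop names to description loop names, or return None.

    Both lists are ordered outermost first and are scanned from the end.
    """
    mapping: Dict[str, str] = {}
    next_block_ind = len(block_loops) - 1
    for desc in reversed(desc_loops):
        for i_block in range(next_block_ind, -1, -1):
            cand = block_loops[i_block]
            # (1) the extent is divisible and (2) the iterator types agree
            if cand.kind == desc.kind and cand.extent % desc.extent == 0:
                mapping[cand.name] = desc.name
                next_block_ind = i_block - 1
                break
        else:
            return None  # no block loop is left that can match this desc loop
    return mapping


# Hypothetical conv2d NCHWc loop nest (outermost first); extents are made up.
conv_loops = [
    Loop("n", 1, "S"), Loop("oc_chunk", 16, "S"), Loop("oh", 56, "S"),
    Loop("ow", 56, "S"), Loop("oc_block", 16, "S"), Loop("kh", 1, "R"),
    Loop("kw", 1, "R"), Loop("ic_outer", 4, "R"), Loop("ic_f_inner", 4, "R"),
    Loop("ic_s_inner", 4, "R"),
]
# Hypothetical 16x4 dot-product intrinsic: one spatial and one reduction loop.
dot_16x4 = [Loop("i", 16, "S"), Loop("k", 4, "R")]
conv_mapping = match_innermost_first(conv_loops, dot_16x4)
```

In this sketch the spatial loop of the intrinsic lands on `oc_block`, which is not one of the two innermost loops, because the scan skips the reduction loops in between. A GEMM nest written in k, i, j order returns None under this loop-nest-only scheme, since the reduction loop is consumed first and nothing is left outside it.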

Please review this change carefully and let me know if I need to bring back some logic from the original code @spectrometerHBH @vinx13

I would love to have @spectrometerHBH review this change before merging

The goal of the original mapping is to support

for k:
  for i:
    for j:
      C[i, j] += A[i, k] * B[k, j]

where loops are not in the same order as the tensor intrinsic description function.

But it also makes sense if we don't support such cases for this PR. So I approve it.

Thanks @spectrometerHBH, I now understand the original code and was able to integrate the original logic to support loop permutations. Please have a look at the current diff, also cc @vinx13 @Hzfengsy @MasterJH5574

The key difference between the original code and the code I submitted yesterday is that my code looked only at the loop nest (ForNode) to determine the mapping, while @spectrometerHBH's mapping logic is based on the iter_var/value of the block (and so is invariant to the order of the loop nest).
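A small Python sketch of why matching through the block iterators is insensitive to loop order. It is a simplified model under stated assumptions, not the merged C++ logic: each block iterator is taken to be bound directly to a single loop variable, and the names `BlockIter`, `map_by_iter_vars`, and the 16x16x16 intrinsic shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One block iterator: (kind, extent, name of the loop variable bound to it).
# kind is "S" (spatial) or "R" (reduction).
BlockIter = Tuple[str, int, str]


def map_by_iter_vars(
    block_iters: List[BlockIter],
    desc_iters: List[BlockIter],
    block_loop_order: List[str],
) -> Optional[Dict[str, str]]:
    """Map block loops to description loops via the block iterators."""
    # The loop nest is only consulted to check that a binding names a real
    # loop; the position of that loop in the nest is never used.
    known_loops = set(block_loop_order)
    mapping: Dict[str, str] = {}
    next_block_ind = len(block_iters) - 1
    for d_kind, d_extent, d_loop in reversed(desc_iters):
        for i_block in range(next_block_ind, -1, -1):
            b_kind, b_extent, b_loop = block_iters[i_block]
            if b_kind == d_kind and b_extent % d_extent == 0:
                if b_loop not in known_loops:
                    return None
                mapping[b_loop] = d_loop
                next_block_ind = i_block - 1
                break
        else:
            return None
    return mapping


# GEMM block iterators in i, j, k order ("SSR"), each bound to one loop.
gemm_iters = [("S", 128, "i"), ("S", 128, "j"), ("R", 128, "k")]
# Hypothetical 16x16x16 intrinsic description.
intrin_iters = [("S", 16, "i_d"), ("S", 16, "j_d"), ("R", 16, "k_d")]

ijk = map_by_iter_vars(gemm_iters, intrin_iters, ["i", "j", "k"])
kij = map_by_iter_vars(gemm_iters, intrin_iters, ["k", "i", "j"])
```

Both calls produce the same mapping, because the scan walks the iterator list rather than the loop nest. This is what lets the k, i, j example from the discussion above be handled.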

MasterJH5574

Thanks Masa! I just caught a minor point.

python/tvm/tir/schedule/analysis.py

masahi · 2022-04-19T21:45:39Z

src/tir/schedule/analysis/analysis.cc

+  ICHECK(desc_loops.size() == static_cast<size_t>(n_desc_vars));
+  ICHECK(block_loops.size() == iter_types_block.size());
+
+  // We assume that the orders of iter_vars in the target and the desc block are consistent.


That is, no matter how the loops are permuted, we should always have

i, j, k = T.axis.remap("SSR", [i0, i1, i2])

for GEMM.

I think this is a reasonable assumption. Correct me if I'm wrong @spectrometerHBH @junrushao1994 @vinx13

I agree this is a reasonable assumption. Though there might be corner cases, it covers all of the current use cases
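One way to read the assumption, ignoring extents: the iterator kinds of the description ("SSR" for GEMM) must appear, in the same relative order, among the iterator kinds of the target block. The following Python sketch checks that property on kind strings; the function name is hypothetical and this is not a check that exists in the PR.

```python
def kinds_are_order_consistent(block_kinds: str, desc_kinds: str) -> bool:
    """Return True if desc_kinds is an in-order subsequence of block_kinds.

    Both strings list iterator kinds in iter_var order, e.g. "SSR".
    The scan runs from the last iterator backwards, like the matching loop.
    """
    remaining = iter(reversed(block_kinds))
    # `kind in remaining` advances the iterator until it finds a match,
    # so each description kind must be found after the previous one.
    return all(kind in remaining for kind in reversed(desc_kinds))


gemm_ok = kinds_are_order_consistent("SSR", "SSR")
gemm_permuted_iters = kinds_are_order_consistent("RSS", "SSR")
conv_vs_dot = kinds_are_order_consistent("SSSSSRRRRR", "SR")
```

Here `gemm_permuted_iters` is False: a block whose iter_vars were declared as k, i, j ("RSS") would break the assumption, which is different from merely permuting the loops around an "SSR" block.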

masahi force-pushed the tir-tensorize-loop-mapping branch from d6ae848 to 1ff1df9 Compare April 19, 2022 00:37

Hzfengsy approved these changes Apr 19, 2022

MasterJH5574 reviewed Apr 19, 2022

spectrometerHBH approved these changes Apr 19, 2022

masahi and others added 15 commits April 20, 2022 04:45

[TIR] Add TensorizeInfo and GetTensorizeLoopMapping

062d8c2

expose PreOrderVisit to python

e0c3337

add test case

51df94d

add conv2d nchwc test

84801b6

add mma test

fcd7917

add arm nhwc conv2d test

65682c2

Revert "add arm nhwc conv2d test"

fcca9fb

This reverts commit eb147f3.

refine

4eb5845

add doc

f759f43

update

f0caa77

fixd condition

0df73cb

black

46eed2a

pylint

ecb3ebc

Update python/tvm/tir/schedule/analysis.py

0860abc

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

run black

ec39b62

masahi force-pushed the tir-tensorize-loop-mapping branch from e504536 to 94391b1 Compare April 19, 2022 20:59

bring back logic in original code to support loop permutation

9ec0974

masahi force-pushed the tir-tensorize-loop-mapping branch from 94391b1 to 9ec0974 Compare April 19, 2022 21:04

add comment

2909a06

masahi force-pushed the tir-tensorize-loop-mapping branch from 212d5dc to 2909a06 Compare April 19, 2022 21:39

masahi added 2 commits April 20, 2022 07:16

simplify

f474003

minor fix to test

8750b4d

vinx13 approved these changes Apr 20, 2022

vinx13 merged commit 3823b39 into apache:main Apr 20, 2022

masahi mentioned this pull request Apr 20, 2022

[TIR] Add function to tile a block according to a given tensor intrinsic #11075

Merged

vinx13 mentioned this pull request Apr 20, 2022

[RFC][Tracking Issue] Meta Schedule (AutoTIR) #8473

Closed

62 tasks

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed
