
[ONNX] Add MatMulInteger importer #10450

Merged 10 commits into apache:main on Apr 13, 2022

Conversation

anwang2009 (Contributor)

As title. Re-uses QLinearMatMul's implementation. The only difference is that MatMulInteger accumulates into int32, not int8 or uint8.

cc @cconvey @AndrewZhaoLuo
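For context, here is a minimal NumPy sketch of the MatMulInteger semantics described in the ONNX operator docs (function and variable names are illustrative, not the importer's code): the zero points are subtracted and the product is accumulated in int32, with no requantize back to int8/uint8.

import numpy as np

def matmul_integer_ref(a, b, a_zero_point=0, b_zero_point=0):
    # Subtract zero points, then matmul with an int32 accumulator;
    # the output stays int32 (no requantize step).
    a32 = a.astype(np.int32) - np.int32(a_zero_point)
    b32 = b.astype(np.int32) - np.int32(b_zero_point)
    return a32 @ b32

a = np.random.randint(0, 255, size=(2, 3)).astype(np.uint8)
b = np.random.randint(0, 255, size=(3, 4)).astype(np.uint8)
out = matmul_integer_ref(a, b, a_zero_point=12)
assert out.dtype == np.int32 and out.shape == (2, 4)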


@classmethod
def _impl_v10(cls, inputs, attr, params):
    a = inputs[0]
Contributor

Hmm so pretty sure you can just do relay.nn.matmul(..., out_dtype='int32'). You don't need all of this QLinearMatMul stuff to handle accumulation without overflow imo.
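A minimal sketch of that suggestion, assuming the zero points really are zero (shapes here are made up):

from tvm import relay

# int8 inputs with an int32 accumulator; out_dtype="int32" avoids overflow,
# but this ignores zero points entirely.
a = relay.var("a", shape=(2, 3), dtype="int8")
b = relay.var("b", shape=(3, 4), dtype="int8")
out = relay.nn.matmul(a, b, out_dtype="int32")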

Contributor

If scale is always 1 and zero point is always 0, I think that's true.

Member

My impression was that MatMulInteger exactly corresponds to our qnn.dense. We can use the QLinearMatMul converter, but we want to skip requantize.
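A rough sketch of that correspondence, with scales fixed at 1.0 and placeholder zero points (shapes and names are illustrative; the weight is the transposed B matrix since qnn.dense expects (units, K)):

from tvm import relay

a = relay.var("a", shape=(2, 3), dtype="uint8")
bt = relay.var("bt", shape=(4, 3), dtype="uint8")  # B transposed to (units, K)
out = relay.qnn.op.dense(
    a,
    bt,
    input_zero_point=relay.const(0, "int32"),
    kernel_zero_point=relay.const(0, "int32"),
    input_scale=relay.const(1.0, "float32"),
    kernel_scale=relay.const(1.0, "float32"),
    units=4,
    out_dtype="int32",
)
# Stopping here, without a qnn.requantize, is the "skip requantize" part.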

Contributor

https://github.com/onnx/onnx/blob/main/docs/Operators.md#matmulinteger <-- I was fixated on the line where it says the behavior is identical to numpy's matmul. It appears the zero points are not always fixed, so using QLinearMatMul is probably the best choice.

Contributor Author

Modified the QLinearMatMul converter to skip requantize in the MatMulInteger case.

b_scale = _op.const(1.0, dtype="float32")
out_scale = _op.const(1.0, dtype="float32")

a_zero_point = _op.const(0.0, dtype=a_dtype)
Contributor

The zero point does not seem to be fixed to 0? The scales appear to be absent, though, which is weird. You might have to read the codebase and see what the intention really is, since the documentation is confusing (and my guess has a mistake in it): https://github.com/onnx/onnx/blob/main/docs/Operators.md#matmulinteger

Contributor Author

Yeah, the zero point isn't necessarily fixed to 0, but the docs say 0 should be the default. I handle this a few lines down:

        if len(inputs) == 4:
            a_zero_point = inputs[2]
            b_zero_point = inputs[3]

From the example it looks like the scale factors are 1 by default.

anwang2009 (Contributor Author)

It looks like the onnx importer test is failing on test_matmulinteger for cuda, since the cuda schedule generation for dense_int8 relies on the inner dimension of the multiply being a multiple of 4. The default onnx test's inner dimension is 3, and there is no generic schedule for non-4-multiples to fall back to. Should I close this PR until this issue can be addressed?

masahi (Member) commented Mar 3, 2022

I think there should be a dense legalize pass or something that adds padding and slicing to handle such cases.
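For illustration, a pure-NumPy sketch of that pad-then-slice idea (not the actual legalize pass):

import numpy as np

def padded_dense(a, b, multiple=4):
    # Pad (M, K, N) up to multiples of `multiple`, multiply, then slice the
    # result back to (M, N); zero padding leaves the valid region unchanged.
    M, K = a.shape
    _, N = b.shape
    pm, pk, pn = (-M) % multiple, (-K) % multiple, (-N) % multiple
    a_p = np.pad(a.astype(np.int32), ((0, pm), (0, pk)))
    b_p = np.pad(b.astype(np.int32), ((0, pk), (0, pn)))
    return (a_p @ b_p)[:M, :N]

a = np.random.randint(-5, 5, size=(2, 3)).astype(np.int8)
b = np.random.randint(-5, 5, size=(3, 4)).astype(np.int8)
assert np.array_equal(padded_dense(a, b), a.astype(np.int32) @ b.astype(np.int32))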

    ):
        # no need to pad
        return None

-   candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32)]
+   candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32), (4, 4, 4)]
Contributor Author

PTAL @AndrewZhaoLuo @masahi

In particular, this allows tighter padding boxes in order to enable the ONNX CUDA tests, where the shapes are on the order of (2, 3) x (3, 4); padding fails if the padding boxes are not densely populated enough with real data. I imagine one potential downside is that larger tensors might be computed faster in broad sweeps with 16x16x16 padding vs. the finer-grained 4x4x4, but I have no concrete evidence either way.

wdyt, are we ok with adding the 4x4x4 padding target?
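To make the trade-off concrete, a tiny illustration (the helper name is made up) of how much padding each candidate implies for the (2, 3) x (3, 4) case:

def pad_amounts(M, K, N, candidate):
    # Extra rows/cols needed to round (M, K, N) up to multiples of the candidate tile.
    cm, ck, cn = candidate
    return ((-M) % cm, (-K) % ck, (-N) % cn)

print(pad_amounts(2, 3, 4, (4, 4, 4)))     # (2, 1, 0)
print(pad_amounts(2, 3, 4, (16, 16, 16)))  # (14, 13, 12)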

Member

Can you try decoupling (4, 4, 4) padding from tensorcore stuff in this file? The shape (4, 4, 4) is invalid for tensorcore.

For int8, rather than hard-coding (4, 4, 4) (which doesn't work for other shapes like 13), I think the right solution is to "pad to the nearest multiple of 4 greater than the given dim".
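A one-line sketch of that "nearest multiple of 4" idea (the name is illustrative):

def round_up_to_multiple(dim, multiple=4):
    # 3 -> 4, 13 -> 16, 8 -> 8
    return ((dim + multiple - 1) // multiple) * multiple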

extra_flops_ratio = _extra_flops(M, K, N, dm, dk, dn) / (M * K * N)

if extra_flops_ratio > 2:
    return None
Contributor Author

@masahi I moved the 4x4x4 treatment to this block. Is this what you meant or did you want the 4x4x4 logic outside of tensorcore_alter_op.py entirely?

Member

Yes, I think this is good. I don't like this file being named tensorcore_alter_op.py, but that's a different story.

But I think this code block is only useful for int8, so we probably need an if dtype in ["int8", "uint8"] check? Also, I'm not clear on what happens when padding is necessary but the extra_flops_ratio > 2 check returns True.

Member

And please do not nest the if extra_flops_ratio > 2: check, since it's a bit weird.

Contributor Author

Sounds good.

Member

So I suggest something like this:

do_pad_for_int8 = ...

if extra_flops_ratio > 2 and not do_pad_for_int8:
    logger.info(...)
    return None
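One hedged way the elided parts might be filled in, written as a self-contained helper; the gating condition, names, and log message are assumptions, not the final code:

import logging

logger = logging.getLogger(__name__)

def should_skip_padding(extra_flops_ratio, dtype):
    # Hypothetical helper: always pad int8/uint8 workloads (the schedule
    # requires multiples of 4), otherwise bail out when padding costs too much.
    do_pad_for_int8 = dtype in ["int8", "uint8"]
    if extra_flops_ratio > 2 and not do_pad_for_int8:
        logger.info("extra_flops_ratio %s too large, not padding", extra_flops_ratio)
        return True
    return False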

Contributor Author

Refactored so that the return None comes after all of the dense logic, ordered slightly differently. PTAL.

# The shape of (M, K, N) must be multiple of (16, 16, 16) or (32, 16, 8) or (8, 16, 32)
# The shape of (M, K, N) must be multiple of
# (16, 16, 16) or (32, 16, 8) or (8, 16, 32)
# from https://arxiv.org/pdf/1811.09736.pdf
Member

Can you remove this link? This is a well-known HW fact.

@masahi masahi merged commit 597000c into apache:main Apr 13, 2022
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Apr 15, 2022
* main: (527 commits)
  [hexagon] 'add_hvx' test to explore HVX usage. (apache#10604)
  [COMMUNITY] @yzh119 -> Reviewer (apache#10993)
  [Metaschedule] Make custom schedule_rule registration optional (apache#10975)
  [ONNX] Add imports for BERT contrib operators (apache#10949)
  sort axes (apache#10985)
  [Hexagon] Remove HexagonBuffer external constructor and support (apache#10978)
  [CI] Update GPU image (apache#10992)
  [Runtime][Vulkan] Add RGP support to TVM for vulkan device (apache#10953)
  [FIX] resolve int64/32 for AttrStmtNode (apache#10983)
  [TVMC] Allow output module name to be passed as a command line argument (apache#10962)
  [ONNX] Add MatMulInteger importer (apache#10450)
  [COMMUNITY] @guberti -> Reviewer (apache#10976)
  Support `qnn.conv2d` in FoldExplicitPading (apache#10982)
  change Hexagon docker version (apache#10981)
  remove exception handling of autotvm xgboost extract functions (apache#10948)
  [CUDNN] Add partitioning support for conv2d and log_softmax (apache#10961)
  [Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout (apache#10905)
  [Hexagon] Move aot/graph_executor interactions into launcher (apache#10907)
  [HEXAGON] Split huge 1D DMA Transfers into smaller transfers with legal sizes. (apache#10971)
  [CI][DOCKER] Add pytest-lazy-fixture to images (apache#10970)
  ...
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
* implement matmulinteger

* rm test

* rm outdated comments

* fix lint and review

* wip

* fixes

* fix

* alter tests

* extra 4x4x4 step

* comments
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022