[Matmul] Add matmul op #8234
Conversation
To clarify: you are replacing the topi dense implementations with the new matmul one. But you are leaving …
In general, I would suggest the following:
- In TOPI generic, we have a generic matmul compute/schedule for all platforms.
- In TOPI x86/ARM/CUDA with cblas/cublas enabled, we use the libraries for all matmuls.
- In TOPI x86/ARM/CUDA without cblas/cublas, we use the current dense schedule for matmul(False, True) and throw a warning for the other cases (a rough sketch of this dispatch follows the list).
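A rough sketch of that dispatch policy (illustrative pseudo-logic only, not the actual TVM op-strategy API; `has_blas` is a hypothetical flag standing in for a cblas/cublas-enabled target):

```python
import logging

def select_matmul_impl(has_blas, transpose_a, transpose_b):
    if has_blas:
        # cblas/cublas handle every transpose combination directly.
        return "matmul_blas"
    if (transpose_a, transpose_b) == (False, True):
        # Reuse the existing, well-tuned dense schedule for the NT layout.
        return "dense_schedule"
    logging.warning(
        "matmul(transpose_a=%s, transpose_b=%s) falls back to a generic "
        "schedule and may be slow", transpose_a, transpose_b)
    return "matmul_generic"
```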
For aliases, to maintain compatibility, I agree that we should keep aliases for both Relay and TE, but it would be cleaner if we simply keep the `nn.matmul` op and make `nn.dense` syntactic sugar. In this way, users can still use `nn.dense` in Relay (equivalent to `nn.matmul(False, True)`) and `topi.nn.dense` in TE (an alias of `topi.nn.matmul`).
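A minimal sketch of the "`nn.dense` as syntactic sugar" idea. It assumes the `transpose_a`/`transpose_b` keyword names that `relay.nn.matmul` eventually ended up with; the draft reviewed in this PR used `data_transposed`/`weight_transposed` attrs instead, and `dense_as_matmul` is just an illustrative wrapper name:

```python
from tvm import relay

def dense_as_matmul(data, weight, units=None, out_dtype=""):
    # dense(data, weight) is matmul with non-transposed data and transposed
    # weight, i.e. the "NT" layout.
    return relay.nn.matmul(data, weight, units=units, out_dtype=out_dtype,
                           transpose_a=False, transpose_b=True)

x = relay.var("x", shape=(4, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
y = dense_as_matmul(x, w)  # same semantics as relay.nn.dense(x, w)
```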
Thanks for the PR! This is a tough one due to backwards compatibility with `nn.dense`; I had some nits and questions here and there. After seeing the changes needed, I wonder if it would be better for now to keep matmul and dense separated (but call into the dense compute implementations for NT ops), as changing the DenseAttrs might cause some compatibility problems. But if we want to ultimately eliminate dense, maybe this is the right path. Curious to see others' thoughts on this.
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
    params : dict of str to tvm.nd.NDArray
        Dict of converted parameters stored in tvm.nd.NDArray format
    """
    global _USE_DENSE_INSTEAD_OF_MATMUL
Is it possible to avoid using this global variable? I'm not familiar with the importer, but it would be nice if we could use an importer config dict or something.
Yeah, I've also tried several ways, but it seems there is no better solution from my point of view. A Python module can be seen as a const singleton, so this should be safe as long as the `from_tensorflow` function is the only entry point.
I feel this is confusing too. If `_USE_DENSE_INSTEAD_OF_MATMUL` is not supposed to be changed by users directly, we should improve the comments of this global variable. Please see my comment at the global variable.
BTW, in this case we can simply do `_USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op` without checking whether they are the same or not.
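A simplified sketch of that pattern (illustrative only, not the exact frontend code): the public entry point copies its argument into the module-level flag unconditionally, so no equality check is needed.

```python
_USE_DENSE_INSTEAD_OF_MATMUL = True  # module-level default, not a user-facing knob

def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
    global _USE_DENSE_INSTEAD_OF_MATMUL
    _USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op
    # ... the rest of the importer reads the flag when converting MatMul nodes
    ...
```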
python/tvm/topi/nn/dense.py (outdated)
        compute_lambda = lambda i, j: te.sum(
            data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
        )
        compute_name = "T_dense"
Do we need to keep this as `dense`, or can we unify it to be `T_matmul_NT`?
@jcf94 please discuss as I have the same issue above.
I think it's fine since it's just an op name. 😄 But the tag `dense` has been used in some schedule checks, so I think we'd better keep that.
There are some options I can come up with:
- A: Use `T_dense` as the name and `dense` as the tag for the NT format; use `T_matmul` as the name and `matmul` as the tag for the other three formats.
- B: Use `T_matmul_NN`, `T_matmul_NT`, `T_matmul_TN`, `T_matmul_TT` as the name for each format; use `dense` as the tag for the NT format and `matmul` as the tag for the others (see the sketch below).
What do you think?
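For reference, a sketch of what option B could look like (names are illustrative, not the code in this PR): derive the op name from the transpose flags and keep the `dense` tag only for the NT layout so existing schedule checks still match.

```python
def matmul_name_and_tag(transpose_a, transpose_b):
    name = "T_matmul_" + ("T" if transpose_a else "N") + ("T" if transpose_b else "N")
    tag = "dense" if (not transpose_a and transpose_b) else "matmul"
    return name, tag

assert matmul_name_and_tag(False, True) == ("T_matmul_NT", "dense")
assert matmul_name_and_tag(False, False) == ("T_matmul_NN", "matmul")
```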
I personally vote for B.
@tkonolige @comaniac @altanh Thanks! Except for these nit comments about naming and coding style, I think the main problem is still how to keep the existing `nn.dense` schedules. @comaniac's solution is exactly what I desired, but in the current op strategy all of these existing schedules are registered with `nn.dense`.
Hi @junrushao1994 @tkonolige @comaniac @altanh, I've tried several ways to remove the flag, but currently I think it's still better to keep the current approach. Also cc @tqchen @FrozenGene
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
        return func, self._params


-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
I don't think we should have a flag here. We should just commit to one codepath.
The problem is that we're not able to remove all the `nn.dense` usages at this moment, and there aren't enough AutoTVM templates for `nn.matmul`. So the use of `nn.matmul` can only be seen as an experimental feature, and we should not change the default behavior in case this affects those who are using `nn.dense`.
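A usage sketch of the experimental flag, assuming the signature shown in the diff above (`graph_def` and `shape_dict` are placeholders for a loaded TensorFlow GraphDef and its input shapes; the parameter may have changed before merging):

```python
from tvm import relay

mod, params = relay.frontend.from_tensorflow(
    graph_def, layout="NHWC", shape=shape_dict,
    use_dense_op=False,  # lower tf.MatMul to nn.matmul instead of nn.dense
)
```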
Can't we use the dense schedules when `A_transpose=false` and `B_transpose=true`? Then we can convert all `nn.dense` to `nn.matmul`.
This PR already uses the dense schedule for matmul_nt when lowering to TOPI. On the other hand, as @jcf94 mentioned in the PR comment, doing so would affect many more places in the codebase, and we had better convert them gradually instead of in a single PR. That sounds reasonable to me.
python/tvm/relay/op/_tensor_grad.py (outdated)
def matmul_grad(orig, grad):
    """Returns [grad' @ weight, data @ grad']"""
    data, weight = orig.args
    if (orig.attrs["data_transposed"], orig.attrs["weight_transposed"]) == (True, True):
Please refactor this so that it doesn't if/else on every possible combination of transposes.
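One way to avoid the case enumeration, shown here as a NumPy sketch rather than the actual Relay gradient code: compute the gradients with respect to the operated (possibly transposed) operands, then undo the transpose.

```python
import numpy as np

def matmul_grad_np(a, b, grad_y, transpose_a=False, transpose_b=False):
    op_a = a.T if transpose_a else a
    op_b = b.T if transpose_b else b
    # For Y = op(A) @ op(B):  d op(A) = dY @ op(B)^T,  d op(B) = op(A)^T @ dY
    grad_op_a = grad_y @ op_b.T
    grad_op_b = op_a.T @ grad_y
    # Map the gradients back to the stored (untransposed) tensors.
    grad_a = grad_op_a.T if transpose_a else grad_op_a
    grad_b = grad_op_b.T if transpose_b else grad_op_b
    return grad_a, grad_b
```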
python/tvm/relay/op/nn/nn.py (outdated)
    units : int, optional
        Number of hidden units of the matmul transformation.
What is a unit?
I think the doc has explained it enough: "The hidden units." This is copied from the original `nn.dense`.
I don't think this is clear at all. Are the hidden units the inner dimension of the matmul?
    ----------
    data : tvm.relay.Expr
        The input data to the operator,
        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
Shouldn't both input shapes be of dimension 2?
No, the input of matmul is supposed to be a multi-dimensional tensor (not limited to 2 dimensions). This is copied from the original `nn.dense`. Other frameworks like PyTorch also have such a definition.
Can you update the definition of the computation above to reflect these shapes then?
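For concreteness, a NumPy sketch of the shape convention being discussed: the data tensor may have more than two dimensions, and only its last axis is contracted with the weight's last axis in the NT (dense) layout.

```python
import numpy as np

data = np.random.rand(2, 3, 16).astype("float32")   # (d_1, d_2, units_in)
weight = np.random.rand(8, 16).astype("float32")    # (units, units_in)
out = np.matmul(data, weight.T)                      # contract over units_in
assert out.shape == (2, 3, 8)                        # (d_1, d_2, units)
```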
python/tvm/relay/op/nn/_nn.py (outdated)
    if data_transposed:
        out[out.shape[0] - 2] = out[out.shape[0] - 1]
    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]
This seems really complicated. Shouldn't it just be some part of data_shape and weight_shape depending on the transposes?
Since the data tensor can have more than 2 dimensions, this is the simplest way to implement it.
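A plain-Python restatement of the shape logic in the diff above (the helper name and the rank-2 weight assumption are mine, for illustration only):

```python
def matmul_out_shape(data_shape, weight_shape, data_transposed, weight_transposed):
    out = list(data_shape)  # data may have rank > 2; weight is rank 2
    if data_transposed:
        # The reduction axis of data is its second-to-last dim, so the kept
        # dim is the last one; move it into the second-to-last position.
        out[-2] = out[-1]
    out[-1] = weight_shape[0] if weight_transposed else weight_shape[1]
    return out

assert matmul_out_shape([4, 16], [8, 16], False, True) == [4, 8]
assert matmul_out_shape([16, 4], [16, 8], True, False) == [4, 8]
```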
Thanks! @comaniac @tkonolige Most comments have been addressed.
LGTM. I don't have other comments, just a nit.
Thanks! @comaniac @tkonolige
* Add Matmul Op
* Recover DenseAttrs
* Add grad for matmul & some update
* Update matmul cuda default schedule
* Add blas support for matmul
* Lint fix, add and update doc strings
@jcf94 There are a few regressions in the PyTorch frontend and other places that might be related, can you take a look? dense should map to matmul(A, B.T).
The `nn.dense` op and `nn.batch_matmul` op may have bad performance in some models whose weights are not constant parameters. We have some discussion in https://discuss.tvm.apache.org/t/discussion-about-the-weight-layout-of-dense-batchmatmul/10171

This PR:
- Adds a `nn.matmul` op that supports the data tensor and weight tensor in transposed or non-transposed format
- Keeps `nn.dense` as an alias of `nn.matmul` when the data tensor is non-transposed and the weight tensor is transposed
- Allows frontends to choose to use `nn.dense` or `nn.matmul`

Since currently we only have complete schedule support for `nn.dense`, the `nn.dense` approach is still used by default in the different frontends.

Will later add full transpose support for `nn.batch_matmul` in another PR.

cc @comaniac @tqchen @FrozenGene @junrushao1994 @altanh
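For readers new to the op, a self-contained TE sketch of the transpose-aware compute that `nn.matmul` generalizes (simplified to 2-D inputs; the helper and naming scheme are illustrative, not the exact TOPI code):

```python
from tvm import te

def matmul_te(M, N, K, transpose_a=False, transpose_b=False, dtype="float32"):
    a_shape = (K, M) if transpose_a else (M, K)
    b_shape = (N, K) if transpose_b else (K, N)
    A = te.placeholder(a_shape, name="A", dtype=dtype)
    B = te.placeholder(b_shape, name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    out = te.compute(
        (M, N),
        lambda i, j: te.sum(
            (A[k, i] if transpose_a else A[i, k])
            * (B[j, k] if transpose_b else B[k, j]),
            axis=k,
        ),
        name="T_matmul_" + ("T" if transpose_a else "N") + ("T" if transpose_b else "N"),
    )
    return A, B, out

# The NT case matches the existing dense compute: data (M, K) x weight (N, K).
A, B, C = matmul_te(4, 8, 16, transpose_a=False, transpose_b=True)
```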