[Matmul] Add matmul op #8234
Conversation
To clarify: you are replacing the topi dense implementations with the new matmul one. But you are leaving …
In general, I would suggest the following:
- In TOPI generic, we have a generic matmul compute/schedule for all platforms.
- In TOPI x86/ARM/CUDA with cblas/cublas enabled, we use the libraries for all matmuls.
- In TOPI x86/ARM/CUDA without cblas/cublas, we use the current dense schedule for matmul(False, True) and throw a warning for the other cases (a rough sketch of this dispatch follows the list).
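A rough sketch of that dispatch policy (illustrative pseudo-logic only, not the actual TVM op-strategy API; `has_blas` is a hypothetical flag standing in for a cblas/cublas-enabled target):

```python
import logging

def select_matmul_impl(has_blas, transpose_a, transpose_b):
    if has_blas:
        # cblas/cublas handle every transpose combination directly.
        return "matmul_blas"
    if (transpose_a, transpose_b) == (False, True):
        # Reuse the existing, well-tuned dense schedule for the NT layout.
        return "dense_schedule"
    logging.warning(
        "matmul(transpose_a=%s, transpose_b=%s) falls back to a generic "
        "schedule and may be slow", transpose_a, transpose_b)
    return "matmul_generic"
```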
For aliases, to maintain compatibility, I agree that we should keep aliases for both Relay and TE, but it would be cleaner if we simply keep the `nn.matmul` op and make `nn.dense` syntactic sugar. In this way, users can still use `nn.dense` in Relay (equivalent to `nn.matmul(False, True)`) and `topi.nn.dense` in TE (an alias of `topi.nn.matmul`).
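A minimal sketch of the "`nn.dense` as syntactic sugar" idea. It assumes the `transpose_a`/`transpose_b` keyword names that `relay.nn.matmul` eventually ended up with; the draft reviewed in this PR used `data_transposed`/`weight_transposed` attrs instead, and `dense_as_matmul` is just an illustrative wrapper name:

```python
from tvm import relay

def dense_as_matmul(data, weight, units=None, out_dtype=""):
    # dense(data, weight) is matmul with non-transposed data and transposed
    # weight, i.e. the "NT" layout.
    return relay.nn.matmul(data, weight, units=units, out_dtype=out_dtype,
                           transpose_a=False, transpose_b=True)

x = relay.var("x", shape=(4, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
y = dense_as_matmul(x, w)  # same semantics as relay.nn.dense(x, w)
```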
Thanks for the PR! This is a tough one due to backwards compatibility with `nn.dense`; I had some nits and questions here and there. After seeing the changes needed, I wonder if it would be better for now to keep matmul and dense separated (but call into the dense compute implementations for NT ops), as changing the DenseAttrs might cause some compatibility problems. But if we want to ultimately eliminate dense, maybe this is the right path. Curious to see others' thoughts on this.
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
    params : dict of str to tvm.nd.NDArray
        Dict of converted parameters stored in tvm.nd.NDArray format
    """
    global _USE_DENSE_INSTEAD_OF_MATMUL
Is it possible to avoid using this global variable? I'm not familiar with the importer, but it would be nice if we could use an importer config dict or something.
Yeah, I've also tried several ways, but it seems there is no better solution from my point of view. A Python module can be seen as a const singleton, so this should be safe as long as the `from_tensorflow` function is the only entry point.
I feel this is confusing too. If `_USE_DENSE_INSTEAD_OF_MATMUL` is not supposed to be changed by users directly, we should improve the comments of this global variable. Please see my comment at the global variable.
BTW, in this case we can simply do `_USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op` without checking whether they are the same or not.
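A simplified sketch of that pattern (illustrative only, not the exact frontend code): the public entry point copies its argument into the module-level flag unconditionally, so no equality check is needed.

```python
_USE_DENSE_INSTEAD_OF_MATMUL = True  # module-level default, not a user-facing knob

def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
    global _USE_DENSE_INSTEAD_OF_MATMUL
    _USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op
    # ... the rest of the importer reads the flag when converting MatMul nodes
    ...
```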
python/tvm/topi/nn/dense.py (outdated)
        compute_lambda = lambda i, j: te.sum(
            data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
        )
        compute_name = "T_dense"
Do we need to keep this as `dense`, or can we unify it to be `T_matmul_NT`?
@jcf94 please discuss as I have the same issue above.
I think it's fine since it's just an op name. 😄 But the tag `dense` has been used in some schedule checks, so I think we'd better keep that.
There are some options I can come up with:
- A: Use `T_dense` as the name and `dense` as the tag for the NT format; use `T_matmul` as the name and `matmul` as the tag for the other three formats.
- B: Use `T_matmul_NN`, `T_matmul_NT`, `T_matmul_TN`, `T_matmul_TT` as the name for each format; use `dense` as the tag for the NT format and `matmul` as the tag for the others (see the sketch below).
What do you think?
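For reference, a sketch of what option B could look like (names are illustrative, not the code in this PR): derive the op name from the transpose flags and keep the `dense` tag only for the NT layout so existing schedule checks still match.

```python
def matmul_name_and_tag(transpose_a, transpose_b):
    name = "T_matmul_" + ("T" if transpose_a else "N") + ("T" if transpose_b else "N")
    tag = "dense" if (not transpose_a and transpose_b) else "matmul"
    return name, tag

assert matmul_name_and_tag(False, True) == ("T_matmul_NT", "dense")
assert matmul_name_and_tag(False, False) == ("T_matmul_NN", "matmul")
```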
I personally vote for B.
@tkonolige @comaniac @altanh Thanks! Except for these nit comments about naming and coding style, I think the main problem is still how to keep the existing `nn.dense` schedules. @comaniac's solution is exactly what I desired, but in the current op strategy all of these existing schedules are registered with `nn.dense`.
Hi @junrushao1994 @tkonolige @comaniac @altanh, I've tried several ways to remove the flag, but currently I think it's still better to keep the current approach. Also cc @tqchen @FrozenGene
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
        return func, self._params


-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
I don't think we should have a flag here. We should just commit to one codepath.
The problem is that we're not able to remove all the `nn.dense` usages at this moment, and there aren't enough AutoTVM templates for `nn.matmul`. So the use of `nn.matmul` can only be seen as an experimental feature, and we should not change the default behavior in case this affects those who are using `nn.dense`.
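A usage sketch of the experimental flag, assuming the signature shown in the diff above (`graph_def` and `shape_dict` are placeholders for a loaded TensorFlow GraphDef and its input shapes; the parameter may have changed before merging):

```python
from tvm import relay

mod, params = relay.frontend.from_tensorflow(
    graph_def, layout="NHWC", shape=shape_dict,
    use_dense_op=False,  # lower tf.MatMul to nn.matmul instead of nn.dense
)
```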
Can't we use the dense schedules when `A_transpose=false` and `B_transpose=true`? Then we can convert all `nn.dense` to `nn.matmul`.
This PR already uses the dense schedule for matmul_nt when lowering to TOPI. On the other hand, as @jcf94 mentioned in the PR comment, doing so would affect many more places in the codebase, and we had better convert them gradually instead of in a single PR. That sounds reasonable to me.
python/tvm/relay/op/_tensor_grad.py (outdated)
def matmul_grad(orig, grad):
    """Returns [grad' @ weight, data @ grad']"""
    data, weight = orig.args
    if (orig.attrs["data_transposed"], orig.attrs["weight_transposed"]) == (True, True):
Please refactor this so that it doesn't if/else on every possible combination of transposes.
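One way to avoid the case enumeration, shown here as a NumPy sketch rather than the actual Relay gradient code: compute the gradients with respect to the operated (possibly transposed) operands, then undo the transpose.

```python
import numpy as np

def matmul_grad_np(a, b, grad_y, transpose_a=False, transpose_b=False):
    op_a = a.T if transpose_a else a
    op_b = b.T if transpose_b else b
    # For Y = op(A) @ op(B):  d op(A) = dY @ op(B)^T,  d op(B) = op(A)^T @ dY
    grad_op_a = grad_y @ op_b.T
    grad_op_b = op_a.T @ grad_y
    # Map the gradients back to the stored (untransposed) tensors.
    grad_a = grad_op_a.T if transpose_a else grad_op_a
    grad_b = grad_op_b.T if transpose_b else grad_op_b
    return grad_a, grad_b
```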
python/tvm/relay/op/nn/nn.py (outdated)
    units : int, optional
        Number of hidden units of the matmul transformation.
What is a unit?
I think the doc has explained it enough: "The hidden units." This is copied from the original `nn.dense`.
I don't think this is clear at all. Are the hidden units the inner dimension of the matmul?
    ----------
    data : tvm.relay.Expr
        The input data to the operator,
        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
Shouldn't both input shapes be of dimension 2?
No, the input of matmul is supposed to be a multi-dimensional tensor (not limited to 2 dimensions). This is copied from the original `nn.dense`. Other frameworks like PyTorch also have such a definition.
Can you update the definition of the computation above to reflect these shapes then?
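For concreteness, a NumPy sketch of the shape convention being discussed: the data tensor may have more than two dimensions, and only its last axis is contracted with the weight's last axis in the NT (dense) layout.

```python
import numpy as np

data = np.random.rand(2, 3, 16).astype("float32")   # (d_1, d_2, units_in)
weight = np.random.rand(8, 16).astype("float32")    # (units, units_in)
out = np.matmul(data, weight.T)                      # contract over units_in
assert out.shape == (2, 3, 8)                        # (d_1, d_2, units)
```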
python/tvm/relay/op/nn/_nn.py (outdated)
    if data_transposed:
        out[out.shape[0] - 2] = out[out.shape[0] - 1]
    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]
This seems really complicated. Shouldn't it just be some part of data_shape and weight_shape depending on the transposes?
Since the data tensor can have more than 2 dimensions, this is the simplest way to implement it.
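A plain-Python restatement of the shape logic in the diff above (the helper name and the rank-2 weight assumption are mine, for illustration only):

```python
def matmul_out_shape(data_shape, weight_shape, data_transposed, weight_transposed):
    out = list(data_shape)  # data may have rank > 2; weight is rank 2
    if data_transposed:
        # The reduction axis of data is its second-to-last dim, so the kept
        # dim is the last one; move it into the second-to-last position.
        out[-2] = out[-1]
    out[-1] = weight_shape[0] if weight_transposed else weight_shape[1]
    return out

assert matmul_out_shape([4, 16], [8, 16], False, True) == [4, 8]
assert matmul_out_shape([16, 4], [16, 8], True, False) == [4, 8]
```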
Thanks! @comaniac @tkonolige Most comments have been addressed.
LGTM. I don't have other comments, just a nit.
Thanks! @comaniac @tkonolige
* Add Matmul Op
* Recover DenseAttrs
* Add grad for matmul & some update
* Update matmul cuda default schedule
* Add blas support for matmul
* Lint fix, add and update doc strings
@jcf94 There are a few regressions in the PyTorch frontend and other places that might be related, can you take a look? dense should map to matmul(A, B.T).
The `nn.dense` op and `nn.batch_matmul` op may have bad performance in some models whose weights are not constant parameters. We have some discussion in https://discuss.tvm.apache.org/t/discussion-about-the-weight-layout-of-dense-batchmatmul/10171

This PR:
- Adds a `nn.matmul` op that supports the data tensor and weight tensor in transposed or non-transposed format
- Keeps `nn.dense` as an alias of `nn.matmul` when the data tensor is non-transposed and the weight tensor is transposed
- Allows frontends to choose to use `nn.dense` or `nn.matmul`

Since currently we only have complete schedule support for `nn.dense`, the `nn.dense` approach is still used by default in the different frontends.

Will later add full transpose support for `nn.batch_matmul` in another PR.

cc @comaniac @tqchen @FrozenGene @junrushao1994 @altanh
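For readers new to the op, a self-contained TE sketch of the transpose-aware compute that `nn.matmul` generalizes (simplified to 2-D inputs; the helper and naming scheme are illustrative, not the exact TOPI code):

```python
from tvm import te

def matmul_te(M, N, K, transpose_a=False, transpose_b=False, dtype="float32"):
    a_shape = (K, M) if transpose_a else (M, K)
    b_shape = (N, K) if transpose_b else (K, N)
    A = te.placeholder(a_shape, name="A", dtype=dtype)
    B = te.placeholder(b_shape, name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    out = te.compute(
        (M, N),
        lambda i, j: te.sum(
            (A[k, i] if transpose_a else A[i, k])
            * (B[j, k] if transpose_b else B[k, j]),
            axis=k,
        ),
        name="T_matmul_" + ("T" if transpose_a else "N") + ("T" if transpose_b else "N"),
    )
    return A, B, out

# The NT case matches the existing dense compute: data (M, K) x weight (N, K).
A, B, C = matmul_te(4, 8, 16, transpose_a=False, transpose_b=True)
```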