[TOPI] C++ topi operators #312
Conversation
Summary:
This diff implements C++ topi contributions for:
- relu with parametric threshold
- pad with generic padBefore / padAfter specification
- matmult with transposes
- conv2d_nchw, conv2d_hwcn with runtime constant padding and strides
- depthwise_conv2d_nchw with runtime constant padding and strides
- group_conv2d_ngchw with runtime constant padding and strides
- broadcast_to a broadcastable shape
- broadcast_bop where bop is a usual binary op (+ - * / %)

Convolution padding is implemented using the pad operation. To avoid extra memory consumption, it is generally recommended to inline the padding with the autoinliner. Unfortunately, in its current form the elemwise checks are too restrictive to allow inlining, so this diff also proposes an extension to LHS-injective operations (i.e. no reduction axis in the current IR design).

Test Plan:
Tested in a C++ test suite in a separate repository; I am looking for suggestions to quickly spin up some tests for tvm.

Reviewers: tqchen
Subscribers:
Tasks:
Tags:
Blame Revision:
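For orientation, a minimal usage sketch of the new ops from C++. This is a sketch only: it assumes the TVM 0.x C++ API (tvm::placeholder, tvm::Var) and the argument order shown in the review snippets below (broadcast_to(shape, tensor), relu(x, threshold)), which may change during review.

```cpp
#include <tvm/tvm.h>
#include <topi/broadcast.h>
#include <topi/nn.h>

// Broadcast a 1-D tensor to (n, m), then apply relu with a zero threshold.
// Signatures follow the diff in this PR and may differ after review.
tvm::Tensor Example() {
  tvm::Var n("n"), m("m");
  tvm::Tensor A = tvm::placeholder({m}, tvm::Float(32), "A");
  tvm::Tensor B = topi::broadcast_to({n, m}, A);
  return topi::relu(B, 0.0f);
}
```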
It is indeed helpful to introduce an AutoInlineInjective pass. Now, there are three types of "injective-like relations".
It is helpful to still keep these three categories, since they might lead to different ways of scheduling. For example, sometimes it is helpful to introduce a caching stage for a general injective operator to make the memory access layout more friendly.
In light of that, maybe it is helpful to introduce an AutoInlineInjective pass; this makes the intention explicit through the function name, as opposed to an additional boolean argument, which users would need to look up in the documentation.
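For illustration, a hypothetical usage sketch of such a pass. The namespace and signature below mirror the existing AutoInlineElemWise pass and are assumptions only, since the pass is merely being proposed here.

```cpp
#include <tvm/schedule_pass.h>

// Hypothetical: inline every injective stage (elemwise, broadcast, pad, ...)
// into its consumers so that, e.g., the padded tensor is never materialized.
// The pass name and namespace are assumptions based on the proposal above.
void InlinePadding(tvm::Schedule s) {
  tvm::schedule::AutoInlineInjective(s);
}
```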
topi/include/topi/detail/broadcast.h (Outdated)
  return ivars;
}

typedef std::function<tvm::Expr(tvm::Expr, tvm::Expr)> BinaryExpr;
C++11 style: `using BinaryExpr = ...`
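For concreteness, the C++11 spelling of the alias shown in the snippet above would be:

```cpp
#include <functional>
#include <tvm/expr.h>

// Same type as the typedef above, written with the C++11 alias syntax.
using BinaryExpr = std::function<tvm::Expr(tvm::Expr, tvm::Expr)>;
```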
See also the next comment; this can be safely deleted.
topi/include/topi/nn.h (Outdated)
namespace topi {

template <typename T>
tvm::Expr map(const tvm::Array<tvm::Expr>& exprs, T op) {
For user-facing functions, use doxygen-style comments.
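For example, a doxygen-style header for the map helper above could look like the following; the parameter descriptions are illustrative, not taken from the PR.

```cpp
/*!
 * \brief Apply op to the given expressions.
 *
 * \param exprs The input expressions.
 * \param op    The operator to apply.
 *
 * \return The resulting expression.
 */
template <typename T>
tvm::Expr map(const tvm::Array<tvm::Expr>& exprs, T op);
```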
topi/include/topi/broadcast.h (Outdated)

namespace topi {

inline tvm::Tensor broadcast_to(const tvm::Array<tvm::Expr>& outputShape,
To be consistent with numpy, use broadcast_to(array, shape).
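That is, roughly the following signature (a sketch; only the argument order changes relative to the snippet above):

```cpp
// numpy-style order: the input tensor first, then the target shape.
inline tvm::Tensor broadcast_to(const tvm::Tensor& I,
                                const tvm::Array<tvm::Expr>& outputShape);
```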
topi/include/topi/detail/broadcast.h (Outdated)
for (int i = 0; i < ovars.size(); ++i) {
  bool found = false;
  for (int j = 0; j < myVars.size(); ++j) {
    if (tvm::ir::Equal(allVars[i], myVars[j])) {
Use allVars[i].same_as; it will be faster than a deep compare.
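Concretely, the comparison in the snippet above would become something like this (the handling after a match is not shown in the diff, so the found/break body here is an assumption):

```cpp
for (int i = 0; i < ovars.size(); ++i) {
  bool found = false;
  for (int j = 0; j < myVars.size(); ++j) {
    // same_as compares node pointers, which is cheaper than the structural
    // comparison performed by tvm::ir::Equal.
    if (allVars[i].same_as(myVars[j])) {
      found = true;  // assumed handling of a match; not shown in the snippet
      break;
    }
  }
}
```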
topi/include/topi/detail/broadcast.h (Outdated)
  return bh;
}

inline tvm::Array<tvm::Expr> inputShapeFromBroadcast(
Would InputIndexFromBroadcast be a more appropriate name?
topi/include/topi/detail/broadcast.h (Outdated)

typedef std::function<tvm::Expr(tvm::Expr, tvm::Expr)> BinaryExpr;

inline tvm::Tensor withBroadcast(BinaryExpr op,
Templatizing op will make it much easier for the compiler to inline it, if we want good performance without the explicit overhead of creating a std::function:
template <typename FBinaryExpr>
function(FBinaryExpr op, ...)
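As a standalone illustration of the pattern (not the PR's actual withBroadcast helper; the TVM 0.x API such as tvm::compute is assumed), the functor type becomes a template parameter, so a plain lambda can be inlined directly instead of going through a type-erased std::function:

```cpp
#include <tvm/tvm.h>
#include <tvm/operation.h>

// Illustrative elemwise helper, not the PR's withBroadcast: the op is a
// template parameter, so the compiler can inline the lambda call.
template <typename FBinaryExpr>
inline tvm::Tensor ElemwiseBinary(FBinaryExpr op,
                                  const tvm::Tensor& A,
                                  const tvm::Tensor& B) {
  return tvm::compute(
      A->shape,
      [&](const tvm::Array<tvm::Var>& i) { return op(A(i), B(i)); },
      "elemwise_binary");
}

// Usage: no std::function is ever constructed.
// tvm::Tensor C = ElemwiseBinary(
//     [](tvm::Expr a, tvm::Expr b) { return a + b; }, A, B);
```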
topi/include/topi/detail/broadcast.h (Outdated)
// Only inject 0 here if we have not yet reached the dimension of I
// (i.e. this must be a 1)
if (!found && (ovars.size() - i) <= expectedDims) {
  ivars.push_back(tvm::Expr(0));
Safer as make_zero(myVars[0].type()), in case the index type differs from int32.
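That is, in context (a sketch of the suggested change):

```cpp
// Only inject 0 here if we have not yet reached the dimension of I
// (i.e. this must be a 1). make_zero produces a zero of the index's own
// type, so this also works when the index type is not int32.
if (!found && (ovars.size() - i) <= expectedDims) {
  ivars.push_back(make_zero(myVars[0].type()));
}
```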
topi/include/topi/nn.h (Outdated)
}

template <typename T>
inline tvm::Tensor relu(const tvm::Tensor& x, T threshold) {
Default threshold to static_cast<T>(0).
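That is, a sketch of the signature with the suggested default; callers then write relu<float>(x) for the standard max(x, 0) behaviour:

```cpp
// Threshold defaults to zero in the element type T.
template <typename T>
inline tvm::Tensor relu(const tvm::Tensor& x, T threshold = static_cast<T>(0));
```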
Thanks for the review! This should address the requested changes as well as style and lint issues.
Thanks! This is merged. Please have follow-up PRs to add doxygen comments to the user-facing functions.
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Tianqi Chen <tqchen@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>