Enable QNN primitives for DNNL runtime #11642
Conversation
Hi apeskov, I'm working on zero-copy in DNNL which may relate to the in-place primitives you mentioned:
I tried to enable zero-copy when tensors are read or written by DNNL primitives, by assigning the handle of the DNNL memory to the TVM buffer before the execution of the primitives. It works for most of the CV models I have tested, but produces wrong results in cases when:
In my understanding, the
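A minimal sketch of the zero-copy idea described above, assuming the oneDNN C++ API (the helper name and buffer pointer are hypothetical, not code from this PR): the DNNL memory object is created directly on top of the TVM buffer's data pointer, so the primitive reads and writes the TVM buffer without an extra copy.

```cpp
#include <dnnl.hpp>

// Sketch only: wrap an externally owned TVM buffer in a dnnl::memory object so a
// primitive can read/write it directly (zero-copy). The memory object does not own
// the buffer; it just aliases tvm_data_ptr.
dnnl::memory WrapExternalBuffer(void* tvm_data_ptr, const dnnl::memory::desc& md,
                                const dnnl::engine& eng) {
  return dnnl::memory(md, eng, tvm_data_ptr);
}
```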
Hi @yangulei, as I remember, zero-copy input/output of tensors should already work in TVM main. If you define relay layouts that match DNNL expectations, it will use the external buffer as is, without any copying or processing. That was one of the goals of PR #11345. Could you please be more specific about the scenarios you would like to optimise? Regarding
In case of missing layouts, the DNNL BYOC runtime automatically injects the required reorder primitives, and it will look like the following:
The problem is in tensor tmp_3: there are 2 primitives which produce data for tmp_3. That breaks the concept of a data-flow graph.
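For illustration, an injected reorder in oneDNN terms looks roughly like the sketch below (assuming the oneDNN C++ API; the helper and variable names are made up, this is not the actual code of this PR):

```cpp
#include <dnnl.hpp>

// Sketch: copy data from the layout the user provides (e.g. plain NCHW) into the
// layout preferred by a primitive (e.g. a blocked format) via an injected
// dnnl::reorder primitive.
void InjectReorder(dnnl::stream& strm, dnnl::memory& user_src, dnnl::memory& prim_src) {
  dnnl::reorder(user_src, prim_src)
      .execute(strm, {{DNNL_ARG_FROM, user_src}, {DNNL_ARG_TO, prim_src}});
}
```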
I tried zero-copy before the merging of #11345; it's not necessary now since it's already covered in your work. Thanks for your explanation about
Regarding the automatic injection of required reorder primitives, I think it's useful for ensuring the optimal DNNL layout at runtime. But I think it's better to do the layout transform in
Absolutely agree with you! Automatic reordering is just a fallback mechanism which prevents DNNL from using "ref:any" implementations. The "TensorRequisite" mechanism was designed specifically to do nothing in case of matched layouts, and to smooth out performance degradation if the layout was not specified properly. The current mechanism with "get_optimal_layout_for_XXX" has a lot of limitations and may work incorrectly in some cases, so auto reordering is still needed. In the context of enabling
Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
@comaniac @crazydemo you may also be interested in the topic of this change. Feel free to comment.
} else {
  LOG(FATAL) << "Unrecognized DNNL pattern: " << name;
}

if (args.empty() && args.empty()) {
typo?
yeah let's quickly fix this
definitely a typo. Fixed.
Very nice! We have quantized bert-base working in TVM using VNNI tensorization. I wanted to compare against DNNL performance on the same model; I can try that now.
Also, the use of pattern matching in place of the GetRootCall thing is interesting.
Support of QNN primitives and their fused versions for DNNL
DNNL supports int8 convolution/dense/matmul primitives with a list of post operations, which usually looks like the following:
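As a sketch of such a fused int8 primitive configuration in oneDNN terms (assuming the oneDNN 2.x C++ API; the concrete scales, mask, and post-op chain below are illustrative, not the exact code of this PR):

```cpp
#include <dnnl.hpp>
#include <vector>

// Sketch: primitive attributes for an int8 convolution/dense with a typical
// fused post-op chain: output scaling, accumulation into dst ("sum"), and ReLU.
dnnl::primitive_attr MakeInt8Attr(const std::vector<float>& out_scales) {
  dnnl::primitive_attr attr;
  // mask 0 = single common scale, mask 1 << 1 = per-output-channel scales.
  attr.set_output_scales(out_scales.size() == 1 ? 0 : (1 << 1), out_scales);

  dnnl::post_ops ops;
  ops.append_sum(1.0f);                                               // dst += op(src, wgh)
  ops.append_eltwise(1.0f, dnnl::algorithm::eltwise_relu, 0.f, 0.f);  // ReLU post-op
  attr.set_post_ops(ops);
  return attr;  // later passed to the primitive descriptor of conv/dense/matmul
}
```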
The equivalent graph at the relay level looks like:
To reach this mapping I have to introduce one more legalization pass, legalize_qnn_for_dnnl, which reformulates the QNN relay patterns into a DNNL-compatible form. Subsequent DNNL primitive handling is regular (pattern matching, transform to JSON, DNNL runtime support of the new QNN nodes).

Additional things:
- Previously operators were located via GetRootCall(fn->body.as<CallNode>(), 2, {"conv", "add"}), which is not fully correct and inconvenient. In this PR I use DFPatternMatcher for this purpose (see the sketch after this list).
- JSONRuntimeBase supposes that you visit the original graph as is, without dropping some paths or recalculating constants. To avoid this issue, this PR introduces a DNNLConstantUpdater utility which guarantees correctness of constant handling.
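A rough sketch of the pattern-matching approach mentioned in the first item above, using TVM's C++ dataflow pattern API (the concrete pattern is only illustrative; the patterns used by the DNNL codegen are more elaborate, covering QNN ops and optional post-ops):

```cpp
#include <tvm/relay/dataflow_matcher.h>
#include <tvm/relay/dataflow_pattern.h>
#include <tvm/relay/expr.h>

// Sketch: detect a conv2d -> bias_add chain in a composite function body with a
// dataflow pattern instead of GetRootCall.
bool MatchesConvBias(const tvm::relay::Expr& body) {
  using namespace tvm::relay;
  DFPattern conv = IsOp("nn.conv2d")({IsWildcard(), IsWildcard()});
  DFPattern bias = IsOp("nn.bias_add")({conv, IsWildcard()});
  return MatchPattern(bias, body);
}
```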