[Hexagon] Support template-free meta schedule tuning #12854
    "relay.FuseOps.link_params": link_params,
    "relay.backend.use_meta_schedule": True,
    "relay.backend.tir_converter": "default",
}
See tvm/python/tvm/meta_schedule/relay_integration.py, lines 87 to 91 in 370abe6:
if pass_config is None:
    pass_config = {
        "relay.backend.use_meta_schedule": True,
        "relay.backend.tir_converter": tir_converter,
    }
relay.FuseOps.link_params config, others are for compatibility with the existing code.
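As a rough illustration of the point discussed here, the sketch below builds a pass config that keeps the two compatibility defaults quoted from relay_integration.py and adds the link_params key on top. The helper name build_pass_config is hypothetical and not part of TVM; only the three config keys come from the snippets in this thread.

```python
def build_pass_config(link_params, tir_converter="default", pass_config=None):
    """Hypothetical helper (not a TVM API): assemble the pass config dict.

    The two default keys mirror the snippet quoted from relay_integration.py;
    the link_params key is the Hexagon-specific addition from this PR.
    """
    if pass_config is None:
        pass_config = {
            "relay.backend.use_meta_schedule": True,
            "relay.backend.tir_converter": tir_converter,
        }
    # Copy so the caller's dict is not mutated.
    merged = dict(pass_config)
    merged["relay.FuseOps.link_params"] = link_params
    return merged


config = build_pass_config(link_params=True)
# With TVM installed, a dict like this would typically be handed to
# tvm.transform.PassContext(opt_level=3, config=config).
```

Keeping the defaults in one place means a caller who only cares about link_params does not have to repeat the other two keys.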
@tvm-bot rerun
    postproc.RewriteTensorize(vectorize_init_loop=True),
]

if True:
Is this a leftover from something?
I intentionally left it so that people can experiment with both the then and else paths. The else path just compiles and runs the best schedule found in my experiment, which reproduces 440 GOPs performance.
Thanks @masahi @kparzysz-quic, the PR has been merged!
* [Metaschedule] Support template-free tuning on Hexagon
* enable multi threading
* update tests
* black
Building on #12845, this PR adds initial support for template-free auto tuning on Hexagon.

Test cases demonstrate:
* vrmpy auto tensorization for TE int8 dense (weight pre-packed), achieving 440 GOPs on SD888.

Known issues:
* link-params = True, required by Hexagon, causes identical workloads to be tuned as distinct tasks. So e2e tuning is very slow without the changes from 12706.
* nn.dense essentially requires the metaschedule RewriteLayout postproc: I found that the memory access pattern of nn.dense, C[i, j] += A[i, k] * B[j, k], where the j axis is vectorized, performs terribly on Hexagon. But the implementation of RewriteLayout is completely incompatible with link-params = True. Until we fix this, we cannot enable RewriteLayout for Hexagon, and hence tuning nn.dense (and nn.batch_matmul) is not supported for now.

cc @kparzysz-quic @junrushao @tmoreau89