[CUTLASS][Ansor] Combine CUTLASS and Ansor #13879
This is interesting and extremely helpful! Just curious: have you tried CUTLASS + MetaSchedule?
Thank you for the suggestion! I tried CUTLASS + MetaSchedule and ran the tests. The performance difference between CUTLASS + Ansor and CUTLASS + MetaSchedule is negligible. My guess is that non-MatMul operations account for only a small share of the BERT model's runtime, and most tasks are handled by CUTLASS. Test script for CUTLASS + MetaSchedule: https://github.com/qingchanghan/tvm-cutlass-eval/blob/combine-cutlass-ansor/bert/cutlass_ms.py
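That reasoning can be made concrete with a small back-of-the-envelope sketch. It assumes the CUTLASS kernels take the same time under both tuners, so only the remaining (tuner-generated) kernels differ; the end-to-end gap is then the tuners' gap on those kernels scaled by their share of total runtime. The function name `end_to_end_gap` and the numbers below are hypothetical illustrations, not measurements from this PR.

```python
def end_to_end_gap(offloaded_fraction: float, tuner_gap: float) -> float:
    """Fraction of total latency saved end to end by the faster tuner.

    Assumes CUTLASS kernels are identical under both tuners (hypothetical model).

    offloaded_fraction: share of the slower configuration's total latency
        spent in CUTLASS kernels, in [0, 1].
    tuner_gap: fraction of the remaining (non-CUTLASS) kernels' latency
        saved by the faster tuner.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    # Only the non-offloaded share of runtime is affected by the tuner choice.
    return (1.0 - offloaded_fraction) * tuner_gap
```

For example, `end_to_end_gap(0.9, 0.2)` is about 0.02: if 90% of runtime were spent in CUTLASS kernels and one tuner were 20% faster on the rest, the end-to-end difference would be only about 2%.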
Please merge your test into test_cutlass.py (with the tuning test disabled by default) and remove the duplicated code.
LGTM!
Thanks for your review! The merge is done. BTW, I found some errors when running the conv2d test in
You could open a follow-up PR to fix that issue, and we can get it in quickly. BTW, we plan to make further TensorCore improvements to MetaSchedule soon, beginning with #13891.
Is that an accuracy error from hardswish? I'm aware that
OK, residual block fusion seems to be broken. The "residual" tensor is supposed to be passed as a tensor separate from the conv2d input tensor, so there should be 4 inputs: data, weight, bias, and residual_tensor (which is the same as data). But now we are getting only 3 inputs.
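To illustrate why the fused op should keep four inputs even when the residual aliases `data`, here is a simplified plain-Python reference of a conv2d + bias + residual-add computation. It is a hypothetical sketch (1x1 kernel, stride 1, NHWC flattened to `[pixels][channels]`, no activation) with an illustrative function name; it is not TVM's CUTLASS pattern or kernel, and it does not claim to show the cause of the bug.

```python
from typing import List

# Hypothetical layout: NHWC flattened to [pixels][channels].
Tensor = List[List[float]]


def conv2d_1x1_bias_residual(
    data: Tensor, weight: List[List[float]], bias: List[float], residual: Tensor
) -> Tensor:
    """Reference for out = conv2d_1x1(data, weight) + bias + residual.

    `residual` is its own parameter even when the caller passes `data` again,
    so the fused op keeps four inputs: data, weight, bias, residual.
    weight is [out_channels][in_channels]; residual must have out_channels.
    """
    out: Tensor = []
    for pix, res in zip(data, residual):
        row = []
        for k, w_k in enumerate(weight):
            acc = sum(x * w for x, w in zip(pix, w_k))  # 1x1 conv = per-pixel dot product
            row.append(acc + bias[k] + res[k])          # bias add, then residual add
        out.append(row)
    return out
```

Calling it as `conv2d_1x1_bias_residual(data, weight, bias, data)` passes the same tensor twice, but the op's signature still has four inputs.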
@qingchanghan Here is my fix for the broken residual block fusion test, submitted to a different repo: tlc-pack/relax@13a73e5. It would be great if you could improve on it (e.g., by removing the TODO I included) and send it to
This is cool! I will also study MetaSchedule and keep an eye on it. Looking forward to further improvements!
OK. I'll try.
Description
This PR adds a test script that combines CUTLASS and Ansor: it uses CUTLASS TensorCore kernels while keeping Ansor's operator fusion and automatic tuning.
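The division of labor described above can be sketched as a toy partition over a linear sequence of ops. This is a hypothetical illustration, not TVM's `partition_for_cutlass` or Ansor's task extraction: the op names and the `CUTLASS_OPS` set are assumptions, and real pattern matching also fuses epilogues (e.g. bias or activation) into the CUTLASS kernel. Matched ops become CUTLASS kernels, and each maximal run of remaining ops is fused into one task left for Ansor to tune.

```python
from typing import List, Tuple

# Hypothetical set of op names that CUTLASS patterns would offload.
CUTLASS_OPS = {"dense", "batch_matmul", "conv2d"}


def toy_partition(ops: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Split an op sequence into CUTLASS kernels and fused Ansor tasks."""
    offloaded: List[str] = []
    tuning_tasks: List[List[str]] = []
    current: List[str] = []  # fused group of ops being collected for Ansor
    for op in ops:
        if op in CUTLASS_OPS:
            if current:  # close the fused group before the offloaded op
                tuning_tasks.append(current)
                current = []
            offloaded.append(op)
        else:
            current.append(op)
    if current:
        tuning_tasks.append(current)
    return offloaded, tuning_tasks


if __name__ == "__main__":
    cutlass_kernels, ansor_tasks = toy_partition(
        ["dense", "add", "gelu", "dense", "add", "layer_norm"]
    )
    print(cutlass_kernels)  # ['dense', 'dense']
    print(ansor_tasks)      # [['add', 'gelu'], ['add', 'layer_norm']]
```

In this toy model, only the fused groups in `ansor_tasks` would be tuned, which is why the tuner's share of end-to-end latency shrinks as more ops are offloaded.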
Modifications
`other_targets` parameter for Ansor's `extract_tasks` function, to pass the CUTLASS target to `call_all_topi_funcs`.

Performance
Latency (ms)
Test scripts
https://github.com/qingchanghan/tvm-cutlass-eval/tree/combine-cutlass-ansor/bert