[CUTLASS][Ansor] Combine CUTLASS and Ansor #13879

Merged (7 commits) on Feb 1, 2023


qingchanghan
Contributor

@qingchanghan qingchanghan commented Jan 31, 2023

Description

This PR adds a test script that combines CUTLASS and Ansor: CUTLASS provides the TensorCore kernels, while Ansor's op fusion and automatic tuning are kept.

Modifications

  1. Add a test script to show how to combine CUTLASS and Ansor.
  2. Add the other_targets parameter to Ansor's extract_tasks function, so that the cutlass target can be passed to call_all_topi_funcs.
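
A minimal sketch of how the new parameter might be used. The helper name `extract_tasks_with_cutlass` and its arguments are hypothetical; only `extract_tasks` and `other_targets` come from this PR, and the two target objects are assumed to be built by the caller. TVM is imported lazily so the snippet can be loaded without it.

```python
def extract_tasks_with_cutlass(mod, params, cuda_target, cutlass_target):
    """Extract Ansor tuning tasks for a module already partitioned for CUTLASS.

    Hypothetical helper: `mod` and `params` are the Relay module and its
    parameters; `cuda_target` and `cutlass_target` are tvm.target.Target
    objects created by the caller (their construction is not shown here).
    """
    # Imported here so that defining the helper does not require TVM.
    from tvm import auto_scheduler

    # `other_targets` is the parameter added by this PR; it forwards the
    # cutlass target in addition to the main CUDA target.
    return auto_scheduler.extract_tasks(
        mod, params, cuda_target, other_targets=[cutlass_target]
    )
```

The returned tasks and task weights can then be passed to the usual Ansor tuning flow.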

Performance

  • Bert-large
  • A10, CUDA 11.8
  • FP16
  • Input shape: (8, 128)

Latency (ms)

| Ansor (n=3000) | CUTLASS+TOPI | CUTLASS+Ansor (n=3000) | MetaSchedule (n=3000) | CUTLASS+MetaSchedule (n=3000) |
| --- | --- | --- | --- | --- |
| 55.8870 | 20.2297 | 17.2543 | 19.2774 | 17.5876 |
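
To put these numbers side by side, the latencies can be expressed relative to CUTLASS+Ansor. This is a small arithmetic sketch that uses only the figures reported above; the variable names are illustrative.

```python
# Latencies in ms, copied from the table above
# (BERT-large, A10, CUDA 11.8, FP16, input shape (8, 128)).
latency_ms = {
    "Ansor (n=3000)": 55.8870,
    "CUTLASS+TOPI": 20.2297,
    "CUTLASS+Ansor (n=3000)": 17.2543,
    "MetaSchedule (n=3000)": 19.2774,
    "CUTLASS+MetaSchedule (n=3000)": 17.5876,
}

baseline = latency_ms["CUTLASS+Ansor (n=3000)"]

# Ratio of each configuration's latency to the CUTLASS+Ansor latency.
relative = {name: ms / baseline for name, ms in latency_ms.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the CUTLASS+Ansor latency")
```

By this measure, Ansor alone takes roughly 3.2x as long as CUTLASS+Ansor, while CUTLASS+MetaSchedule is within about 2% of it.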

Test scripts

https://github.com/qingchanghan/tvm-cutlass-eval/tree/combine-cutlass-ansor/bert

@tvm-bot
Collaborator

tvm-bot commented Jan 31, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: cutlass, ansor. See #10317 for details.

Generated by tvm-bot

@junrushao
Member

This is interesting and extremely helpful! Just curious - have you tried out CUTLASS + MetaSchedule?

@qingchanghan
Contributor Author

qingchanghan commented Jan 31, 2023

> This is interesting and extremely helpful! Just curious - have you tried out CUTLASS + MetaSchedule?

Thank you for your advice! I tried CUTLASS + MetaSchedule and added its results to the performance table. The performance difference between CUTLASS+Ansor and CUTLASS+MetaSchedule is negligible. My guess is that non-MatMul operations account for only a small share of the runtime in the BERT model, and most tasks are already optimized by CUTLASS.

Test script of CUTLASS + MetaSchedule: https://github.com/qingchanghan/tvm-cutlass-eval/blob/combine-cutlass-ansor/bert/cutlass_ms.py

@masahi masahi left a comment

Please merge your test into test_cutlass.py (while disabling the tuning test by default), and remove code duplications.

Member

@junrushao junrushao left a comment

LGTM!

@qingchanghan
Contributor Author

> Please merge your test into test_cutlass.py (while disabling the tuning test by default), and remove code duplications.

Thanks for your review! The merge is done. BTW, I found that there are some errors when calling conv2d test in test_cutlass.py. I guess it is a compatibility problem introduced by a TVM update. I will try to fix them later.

@Hzfengsy
Member

Hzfengsy commented Feb 1, 2023

You could set a follow-up PR to fix that issue, and we can get it in quickly.

BTW, we are going to make further improvements to Meta-schedule with TensorCore recently, beginning with #13891

@masahi masahi merged commit ba936e9 into apache:main Feb 1, 2023
@masahi
Member

masahi commented Feb 1, 2023

> BTW, I found that there are some errors when calling conv2d test in test_cutlass.py

Is that an accuracy error from hardswish? I'm aware that assert_allclose in test_cutlass.py can fail depending on CUDA version / card etc.

@masahi
Member

masahi commented Feb 2, 2023

OK, residual block fusion seems to be broken. The "residual" tensor is supposed to be passed as a separate tensor from the input tensor to conv2d, so there should be 4 inputs: data, weight, bias, and residual_tensor (which is the same as data).

But now we are getting only 3 inputs.

@masahi
Member

masahi commented Feb 2, 2023

@qingchanghan Here is my fix for the broken residual block fusion test, submitted to a different repo: tlc-pack/relax@13a73e5. It would be great if you can improve on it (e.g., removing TODO I included) and send it to main.

@qingchanghan
Contributor Author

> BTW, we are going to make further improvements to Meta-schedule with TensorCore recently, beginning with #13891

This is cool! I will also learn Meta-schedule and keep an eye on it. Looking forward to further improvements!

@qingchanghan
Contributor Author

> It would be great if you can improve on it (e.g., removing TODO I included) and send it to main.

OK. I'll try.

5 participants