[Topi & Relay] Add quantization support for the vision transform model in GPU #7814

Merged on Apr 14, 2021 (5 commits)

huochaitiantang
Contributor

This PR adds quantization support for the vision transformer (ViT) model on GPU. The main changes are as follows:

1. In the ViT model, the most time-consuming operator is batch_matmul, so we first add the compute and schedule of batch_matmul_int8.cuda in tvm.topi.cuda.

2. To support quantization of batch_matmul, we then add batch_matmul_rewrite and BatchMatmulRealize in tvm.relay.quantize.

3. KL-divergence calibration does not preserve the accuracy of the ViT model well, so we add the _percentile_scale function.
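To make the three changes above concrete, here is a minimal NumPy sketch of the underlying idea. It is not the code added by this PR: the helper names `percentile_scale`, `quantize_int8`, and `batch_matmul_int8` are hypothetical, the symmetric [-127, 127] scheme is an assumption, and the (B, M, K) x (B, N, K) -> (B, M, N) layout is assumed to match the batch_matmul convention of multiplying by the transposed second operand.

```python
import numpy as np


def percentile_scale(samples, percentile=99.9):
    """Clipping threshold: the given percentile of |value| over calibration data.

    Hypothetical stand-in for the idea behind _percentile_scale; the
    function in the PR may differ in detail.
    """
    flat = np.abs(np.concatenate([np.asarray(s, dtype=np.float32).ravel() for s in samples]))
    return float(np.percentile(flat, percentile))


def quantize_int8(x, threshold):
    """Symmetric quantization: map [-threshold, threshold] onto [-127, 127]."""
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def batch_matmul_int8(qx, qy):
    """Reference int8 batch matmul that accumulates in int32.

    qx: (B, M, K), qy: (B, N, K) -> (B, M, N); qy is used transposed (assumed layout).
    """
    return np.einsum("bmk,bnk->bmn", qx.astype(np.int32), qy.astype(np.int32))


rng = np.random.default_rng(0)

# Calibration data with one large outlier: a max-based threshold is
# dominated by the outlier, a percentile-based one is not.
calib = rng.standard_normal(10000).astype(np.float32)
calib[0] = 100.0
thr_max = percentile_scale([calib], 100.0)  # the 100th percentile is the maximum
thr_pct = percentile_scale([calib], 99.9)   # the outlier is excluded

# Quantize two operands, multiply in integers, then rescale to float.
x = rng.standard_normal((2, 4, 8)).astype(np.float32)
y = rng.standard_normal((2, 3, 8)).astype(np.float32)
qx, sx = quantize_int8(x, percentile_scale([x], 100.0))
qy, sy = quantize_int8(y, percentile_scale([y], 100.0))
acc = batch_matmul_int8(qx, qy)
approx = acc.astype(np.float32) * np.float32(sx * sy)
exact = np.einsum("bmk,bnk->bmn", x, y)
max_err = float(np.max(np.abs(approx - exact)))
```

The int32 accumulator matters because a sum of K products of int8 values does not fit in 8 bits. The percentile threshold trades a small amount of clipping for finer resolution on the bulk of the values.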

For the vit-B32-224 model, the results are as follows:

  • Top-1 accuracy (%) on the ImageNet validation set

    • paper: 73.38
    • nonofficial-model-fp32: 73.27
    • nonofficial-model-int8: 72.78
  • Latency on a GTX 1660 GPU

    • nonofficial-model-fp32: 10.32 ms
    • nonofficial-model-int8: 4.93 ms
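As a quick reading aid, the speedup and accuracy cost implied by the numbers above can be worked out directly. This is only arithmetic on the reported figures; the variable names are illustrative.

```python
# Figures copied from the results listed above.
fp32_ms, int8_ms = 10.32, 4.93
fp32_top1, int8_top1 = 73.27, 72.78

speedup = fp32_ms / int8_ms          # how many times faster int8 is than fp32
top1_drop = fp32_top1 - int8_top1    # accuracy lost, in percentage points

print("speedup: {:.2f}x".format(speedup))              # speedup: 2.09x
print("top-1 drop: {:.2f} points".format(top1_drop))   # top-1 drop: 0.49 points
```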

Thanks for your review! @jcf94 @tqchen

@XHPlus

XHPlus commented Apr 9, 2021

Thanks to the reviewers. We will keep adding results for other ViT models and contributing more quantization calibration algorithms.

Contributor

@jcf94 jcf94 left a comment

Thanks! @huochaitiantang
This PR looks great! I have only a few comments.

python/tvm/relay/op/strategy/cuda.py (resolved)
Comment on lines 46 to 49
if not os.path.exists(logfile):
    os.system("wget https://github.com/TheGreatCold/tvm-vit/raw/master/{}".format(logfile))
if not os.path.exists(onnx_path):
    os.system("wget https://github.com/TheGreatCold/tvm-vit/raw/master/{}".format(onnx_path))
Contributor

For a unit test, I don't think it is a good idea to depend on an external resource: a network problem or a change to the tvm-vit repo may break the UT. At the very least, pin the URL to a git commit instead of a branch, e.g. https://github.com/TheGreatCold/tvm-vit/raw/d2aa1e60eef42e2fdedbd1e13aa85ac5faf0a7fc/vit_B32_224.onnx
I'm not sure whether there is a better solution for this. @tqchen Do you have any suggestions?

Contributor Author

Thanks for your review. We updated the download code based on your suggestion. In addition, wget is not available on every platform, so we now use the urllib library instead.
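For readers following the thread, a download helper along these lines might look like the sketch below. It is not the code that was committed: `pinned_url` and `fetch_if_missing` are hypothetical names, and the injectable `retrieve` argument is an assumption added so the helper can be exercised without network access. The commit hash and base URL are the ones from the review comment.

```python
import os
import urllib.request

# Commit hash from the review comment; pinning it keeps the test inputs
# fixed even if the branch in the external repo moves.
PINNED_COMMIT = "d2aa1e60eef42e2fdedbd1e13aa85ac5faf0a7fc"
BASE_URL = "https://github.com/TheGreatCold/tvm-vit/raw"


def pinned_url(filename, commit=PINNED_COMMIT):
    """Build the raw-file URL for `filename` at a fixed commit."""
    return "{}/{}/{}".format(BASE_URL, commit, filename)


def fetch_if_missing(filename, dest_dir=".", retrieve=urllib.request.urlretrieve):
    """Download `filename` into `dest_dir` unless it is already there.

    `retrieve` defaults to urllib's urlretrieve, which works wherever
    Python runs, unlike shelling out to wget. Returns the local path.
    """
    path = os.path.join(dest_dir, filename)
    if not os.path.exists(path):
        retrieve(pinned_url(filename), path)
    return path
```

A call such as `fetch_if_missing("vit_B32_224.onnx")` would then fetch the model only on the first run. The test still depends on the network for that first download, which is the concern raised in the review.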

Contributor

@jcf94 jcf94 left a comment

Thanks! @huochaitiantang @XHPlus. Have fun with TVM. Looking forward to seeing more contributions from you!

@jcf94 jcf94 merged commit 90dce48 into apache:main Apr 14, 2021
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
…l in GPU (apache#7814)

* Add cuda batch matmul int8 support for quantized vit model

* Fix for combine parallel pass with dense and batch_matmul

* Reformat based on lint

* Add plevel & update the file download method
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
…l in GPU (apache#7814)

* Add cuda batch matmul int8 support for quantized vit model

* Fix for combine parallel pass with dense and batch_matmul

* Reformat based on lint

* Add plevel & update the file download method
3 participants