[Numpy Refactor] [Model Deployment] Use TVM to accelerate model inference + deployment #1244
Comments
@yzhliu The numpy version has been merged, and I've attached the script for generating the missing ops in the Relay converter.
Is there any tutorial for deploying a Gluon GPT-2 model with TVM?
@carter54 Thanks for your interest. We will add the support and a tutorial later; it's on the roadmap.
Thanks @sxjscience. Looking forward to trying it!
@sandyhu533 For testing, you may refer to this blog post: https://medium.com/apache-mxnet/speed-up-your-bert-inference-by-3x-on-cpus-using-apache-tvm-9cf7776cd7f8
@carter54 In case you'd like to try out TVM now, you can use Docker (or compile TVM with cuBLAS + BLAS enabled, as in https://github.com/dmlc/gluon-nlp/blob/master/tools/docker/install/install_tvm_cpu.sh) and refer to our test cases in gluon-nlp/tests/test_models.py (lines 71 to 170 at commit 2032159).
We are currently writing a tutorial on how to convert GluonNLP backbones to TVM, so you can also wait for the official version; a minimal sketch of what the test cases do is below.
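For reference, here is a minimal sketch of what those test cases do, not an official recipe. The `get_backbone` call, the three int32 BERT inputs (`data0` token ids, `data1` token types, `data2` valid length), and the input names and shapes are all assumptions based on the numpy-refactor master; `graph_executor` is named `graph_runtime` on older TVM releases.

```python
import mxnet as mx
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor  # named graph_runtime on older TVM

mx.npx.set_np()  # numpy-version GluonNLP models run in numpy mode

# Assumption: gluonnlp.models.get_backbone as in the numpy-refactor master.
from gluonnlp.models import get_backbone
model_cls, cfg, tokenizer, param_path, _ = get_backbone('google_en_uncased_bert_base')
net = model_cls.from_cfg(cfg)
net.load_parameters(param_path)
net.hybridize()

batch, seq = 1, 32
# Input names data0/data1/data2 follow the shape-dict keys passed to
# from_mxnet: token ids, token types, valid length (illustrative).
shape_dict = {'data0': (batch, seq), 'data1': (batch, seq), 'data2': (batch,)}
dtype_dict = {'data0': 'int32', 'data1': 'int32', 'data2': 'int32'}
mod, params = relay.frontend.from_mxnet(net, shape_dict, dtype=dtype_dict)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)

rt = graph_executor.GraphModule(lib['default'](tvm.cpu()))
rt.set_input(data0=np.random.randint(0, 1000, (batch, seq)).astype('int32'),
             data1=np.zeros((batch, seq), dtype='int32'),
             data2=np.full((batch,), seq, dtype='int32'))
rt.run()
contextual_emb = rt.get_output(0).asnumpy()
```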
@sxjscience Thanks a lot! I'll give it a try.
Currently, we do have Relay VM support for the NDArray version of MXNet: https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/frontend/mxnet.py
However, the Relay frontend converter still lacks support for numpy arrays, so we should add numpy support first.
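To illustrate the NDArray path that already works, here is a minimal sketch using a toy Gluon block in place of a real model; the block, names, and shapes are illustrative, and the exact VM API varies slightly across TVM releases.

```python
import mxnet as mx
import numpy as np
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

# Toy stand-in for a real model: a single Dense layer in NDArray mode.
net = mx.gluon.nn.Dense(16, activation='relu')
net.initialize()
net(mx.nd.ones((1, 8)))  # one forward pass so parameter shapes are known

mod, params = relay.frontend.from_mxnet(net, shape={'data': (1, 8)})

# Compile for the Relay VM and run on CPU.
with tvm.transform.PassContext(opt_level=3):
    vm_exec = relay.vm.compile(mod, target='llvm', params=params)
vm = VirtualMachine(vm_exec, tvm.cpu())
out = vm.invoke('main', tvm.nd.array(np.ones((1, 8), dtype='float32')))
```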
I checked the workloads of BERT + ALBERT + ELECTRA + MobileBERT + RoBERTa (backbone only), and these are the ops used:
After some investigation, the following are the ops that need to be converted:
We will revise the Relay runtime in TVM accordingly.
Code for getting the missing ops for Relay:
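A minimal reconstruction of the idea behind such a script, with everything labeled as an assumption: `find_missing_ops` is a hypothetical helper, and `_convert_map` is the private converter table inside `tvm/relay/frontend/mxnet.py`, whose name may change between releases. It traces a Gluon block, exports the symbol graph, and diffs the op names against the converter table:

```python
import json
import os
import tempfile

import mxnet as mx
from tvm.relay.frontend.mxnet import _convert_map  # private converter table (assumption)

def find_missing_ops(block, *sample_inputs):
    """Hypothetical helper: list the ops used by `block` that the Relay
    MXNet frontend has no converter for. `sample_inputs` is any batch
    that lets the graph be traced."""
    block.hybridize()
    block(*sample_inputs)                 # run once so the graph is traced
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, 'model')
        block.export(prefix)              # writes <prefix>-symbol.json
        with open(prefix + '-symbol.json') as f:
            graph = json.load(f)
    used = {node['op'] for node in graph['nodes'] if node['op'] != 'null'}
    return sorted(op for op in used if op not in _convert_map)
```

Running this over a numpy-mode backbone would surface any unconverted `_npi_*`/`_npx_*` ops, which is exactly the gap described above.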