⚡ Boost inference speed of T5 models by 5x and reduce the model size by 3x.
python nlp fast translation deep-learning inference pytorch transformer question-answering quantization onnx t5 onnxruntime fastt5 quantized-onnx-models inference-speed
Updated Apr 24, 2023 - Python