⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Inference speed / accuracy tradeoff on text classification with transformer models such as BERT, RoBERTa, DeBERTa, SqueezeBERT, MobileBERT, Funnel Transformer, etc.