Releases: ELS-RD/transformer-deploy
Add GPT-2 acceleration support
- add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
- refactor Triton configuration generation (simplification)
- add GPT-2 model documentation (notebook)
- fix the CPU quantization benchmark (it was not using the quantized model)
- fix a sentence-transformers bug
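The benchmark fix above comes down to making sure the quantized model, not the original one, is the model being timed. As a minimal sketch of what CPU dynamic quantization looks like in plain PyTorch (a toy network standing in for a transformer; this is not the library's actual benchmark code):

```python
import torch
import torch.nn as nn

# Tiny stand-in network; the actual benchmark targets transformer models.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()

# Replace Linear layers with dynamically quantized INT8 versions (CPU-only path).
quant_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The benchmark must run quant_model, not the original model --
# timing `model` here would measure the FP32 network instead.
x = torch.randn(1, 16)
with torch.no_grad():
    out = quant_model(x)
```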
Add CPU support and generic GPU quantization support
What's Changed
- Update requirements_gpu.txt by @sam-writer in #22
- refactoring by @pommedeterresautee in #27
- add CPU inference support by @pommedeterresautee in #28
- Add QAT support to more models by @pommedeterresautee in #29
Full Changelog: v0.2.0...v0.3.0
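The QAT (quantization-aware training) support mentioned in #29 builds on PyTorch's fake-quantization machinery. A minimal, generic PyTorch QAT sketch (the toy model and qconfig are illustrative, not the library's actual implementation):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model with quant/dequant stubs marking the quantized region."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 2)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")

# Insert fake-quantization observers that simulate INT8 during training.
qat_model = torch.quantization.prepare_qat(model)

# Stand-in for fine-tuning: run data through so observers record ranges.
qat_model(torch.randn(32, 8))

# Convert fake-quant modules to real INT8 kernels for inference.
int8_model = torch.quantization.convert(qat_model.eval())
```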
Add GPU quantization support
- support INT-8 GPU quantization
- add an end-to-end quantization tutorial
- add the QDQRoberta model
- switch to ONNX opset 13
- refactor TensorRT engine creation
- fix bugs
- add auth token support (for private Hugging Face repositories)
What's Changed
- Update triton by @pommedeterresautee in #11
- fix README.md by @pommedeterresautee in #13
- Fix install errors by @sam-writer in #20
- Add auth token by @sam-writer in #19
- Support GPU INT-8 quantization by @pommedeterresautee in #15
New Contributors
- @sam-writer made their first contribution in #20
Full Changelog: v0.1.1...v0.2.0
Update Triton image to 21.11-py3
- update Docker image
- update documentation
From PoC to library
- switch from a proof of concept to a library
- add support for the TensorRT Python API (for best performance)
- improve documentation (separate the Hugging Face Infinity comparison from the main doc, add benchmarks, etc.)
- fix issues with mixed precision
- add license
- add tests, GitHub Actions, Makefile
- change the way the Docker image is built
First release
All the scripts needed to reproduce https://medium.com/p/e1be0057a51c