pommedeterresautee released this 08 Feb 23:07
- add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
- refactor Triton configuration generation (simplification)
- add GPT-2 model documentation (notebook)
- fix CPU quantization benchmark (it was not using the quantized model)
- fix Sentence Transformers bug