Add GPT-2 acceleration support

Latest

pommedeterresautee released this 08 Feb 23:07

add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
refactor and simplify Triton configuration generation
add GPT-2 model documentation (notebook)
fix CPU quantization benchmark (it was not using the quantized model)
fix sentence transformers bug
