v2.1.0

Latest

laiwenzh released this 11 Feb 07:28

· 2 commits to main since this release

069c74e

What's Changed

[JSON mode]: FormatEnforcer use cudaMallocHost for scores buffer by @WangNorthSea in #56
[A16W8 & A8W8]: 1. further optimization for Ampere A16W8 fused GEMM kernel; 2. fix LoRA doc by @wyajieha in #58
[Multimodal]: Support LLM quantization with GPTQ and AXWY by @x574chen in #60
[PKG]: Reduce package size by only compiling flash-attn src with hdim128 by @laiwenzh in #62
[MOE]: add high performance moe kernel; fix a16w8 compile bug for sm<80 by @laiwenzh in #67

New Contributors

@wyajieha made their first contribution in #58

Full Changelog: v2.0.0...v2.1.0

Contributors

x574chen, WangNorthSea, and 2 other contributors