v2.1.0

@laiwenzh laiwenzh released this 11 Feb 07:28
069c74e

What's Changed

  • [JSON mode]: FormatEnforcer use cudaMallocHost for scores buffer by @WangNorthSea in #56
  • [A16W8 & A8W8]: further optimization for Ampere A16W8 fused gemm kernel; fix lora doc by @wyajieha in #58
  • [Multimodal]: Support LLM quantization with GPTQ and AXWY by @x574chen in #60
  • [PKG]: Reduce package size by only compiling flash-attn src with hdim128 by @laiwenzh in #62
  • [MOE]: add high-performance moe kernel; fix a16w8 compile bug for sm<80 by @laiwenzh in #67

Full Changelog: v2.0.0...v2.1.0