Releases: MDK8888/GPTFast

GPTFast 0.3.1

22 Aug 04:11

GPTFast 0.3.1 is here 🚀🚀🚀!

  • Stabilized GPTQ for all models, both with and without bias.
  • Custom tiled W4A16 matmul kernels that outperform nn.Linear by 30% on an RTX 3050.
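
For readers unfamiliar with the term, W4A16 means weights are stored as 4-bit integers while activations stay in 16-bit floating point. The sketch below illustrates only the storage and arithmetic idea in pure Python: it uses plain round-to-nearest quantization rather than GPTQ, no tiling, and Python floats in place of 16-bit activations. All function names are hypothetical and are not part of GPTFast's API.

```python
# Illustrative sketch only: NOT GPTFast's kernel and not GPTQ.
# All names here are hypothetical.

def quantize_row_int4(weights):
    """Round-to-nearest asymmetric quantization of one row to 4-bit codes (0..15)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant row
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo  # lo acts as the zero offset


def pack_int4(codes):
    """Pack two 4-bit codes into each byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count (creates a new list)
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed, n):
    """Recover the first n 4-bit codes from packed bytes."""
    codes = []
    for b in packed:
        codes.append(b & 0x0F)
        codes.append(b >> 4)
    return codes[:n]


def w4a16_matvec(packed_rows, scales, zeros, x):
    """Compute y = W x, dequantizing each packed weight row on the fly."""
    y = []
    for packed, scale, zero in zip(packed_rows, scales, zeros):
        codes = unpack_int4(packed, len(x))
        y.append(sum((c * scale + zero) * xi for c, xi in zip(codes, x)))
    return y


if __name__ == "__main__":
    W = [[0.0, 1.5, -1.5, 0.75], [2.0, 2.0, 2.0, 2.0]]
    quantized = [quantize_row_int4(row) for row in W]
    packed_rows = [pack_int4(codes) for codes, _, _ in quantized]
    scales = [scale for _, scale, _ in quantized]
    zeros = [zero for _, _, zero in quantized]
    print(w4a16_matvec(packed_rows, scales, zeros, [1.0, 1.0, 1.0, 1.0]))
```

Packing two codes per byte is what cuts weight memory traffic to roughly a quarter of FP16. A real kernel would fuse the unpack and dequantize steps into the tiled matmul loop instead of materializing the codes.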

GPTFast 0.3.0

21 Jun 01:51
60ab0b8
  • GPTQ INT4 quantization available for all HF models
  • Accelerates inference by 7.6x-9x
  • Integrates optimized INT4 matrix multiplication kernels from the PyTorch team for all HF models

GPTFast 0.2.1

02 Apr 12:50
e606978
  • Minor fixes for PyYAML

GPTFast 0.2.0

02 Apr 04:16
7653cca
  • Inference is now accelerated by 6x-8.5x
  • Static key-value caching is now enabled for all Hugging Face models
  • Support for generic sampling functions in addition to argmax
  • Debugged speculative decoding
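
To illustrate what "generic sampling functions in addition to argmax" can look like, here is a minimal pure-Python sketch of a decoding loop that accepts any callable mapping logits to a token id. The names `argmax_sample`, `make_temperature_top_k_sampler`, and `generate` are hypothetical and chosen for this example; they are not GPTFast's actual interface.

```python
# Illustrative sketch only; names are hypothetical, not GPTFast's API.
import math
import random


def argmax_sample(logits):
    """Greedy decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def make_temperature_top_k_sampler(temperature=1.0, top_k=None, rng=None):
    """Build a sampler that draws from the softmax over the top_k logits."""
    rng = rng or random.Random()

    def sample(logits):
        # Token ids ordered from most to least likely.
        indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if top_k is not None:
            indices = indices[:top_k]
        # Subtract the max logit before exponentiating for numerical stability.
        m = logits[indices[0]]
        weights = [math.exp((logits[i] - m) / temperature) for i in indices]
        return rng.choices(indices, weights=weights, k=1)[0]

    return sample


def generate(next_logits, prompt, max_new_tokens, sample_fn=argmax_sample):
    """Append max_new_tokens tokens, choosing each one with sample_fn."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sample_fn(next_logits(tokens)))
    return tokens


if __name__ == "__main__":
    # Toy "model" over a 4-token vocabulary that favors (last token + 1) mod 4.
    def toy_model(tokens):
        return [3.0 if i == (tokens[-1] + 1) % 4 else 0.0 for i in range(4)]

    print(generate(toy_model, [0], 5))
    sampler = make_temperature_top_k_sampler(temperature=0.8, top_k=2)
    print(generate(toy_model, [0], 5, sample_fn=sampler))
```

Passing the sampler as a plain callable keeps the generation loop unchanged whether decoding is greedy or stochastic.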