Releases: li-plus/chatglm.cpp

v0.4.2

31 Jul 06:12
60c89b7
  • Apply flash attention to the vision encoder for lower first-token latency.
  • Fix Metal compilation error on Apple Silicon chips.

v0.4.1

25 Jul 07:04
0f7a8a9
  • Support GLM4V, the first vision-language model in the GLM series.
  • Fix nan/inf logits by rescheduling the attention scaling.
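The nan/inf fix reorders where the 1/√d attention scale is applied. A minimal NumPy sketch (illustrative only, not the actual kernel; the head dimension and values below are made up) of why scaling the query *before* the dot product avoids half-precision overflow:

```python
import numpy as np

d = 128  # hypothetical head dimension
q = np.full(d, 24.0, dtype=np.float16)
k = np.full(d, 24.0, dtype=np.float16)
scale = np.float16(1.0 / np.sqrt(d))

# Scale after the dot product: the raw score 128 * 24 * 24 = 73728
# exceeds fp16 max (~65504), so it is already inf before scaling.
late = (q @ k) * scale

# Scale q before the dot product: the score stays in range (~6500).
early = (q * scale) @ k

print(np.isinf(late), np.isfinite(early))
```

The same rescheduling idea applies per attention head; scaling early keeps intermediate scores inside the fp16 representable range.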

v0.4.0

21 Jun 03:09
e9989b5
  • Dynamic on-demand memory allocation to fully utilize device memory; no more preset scratch or memory sizes.
  • Drop Baichuan/InternLM support, since both are now integrated into llama.cpp.
  • API changes:
    • CMake CUDA option: -DGGML_CUBLAS renamed to -DGGML_CUDA.
    • CMake CUDA architecture: -DCUDA_ARCHITECTURES renamed to -DCMAKE_CUDA_ARCHITECTURES.
    • num_threads was removed from GenerationConfig; optimal thread settings are now selected automatically.
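For users upgrading a CUDA build from v0.3.x, the renamed flags above translate as follows (the architecture value 80 is only an example; pick the one matching your GPU):

```shell
# v0.3.x (old flags)
cmake -B build -DGGML_CUBLAS=ON -DCUDA_ARCHITECTURES="80"

# v0.4.0 and later (new flags)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="80"
cmake --build build -j
```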

v0.3.4

14 Jun 12:52
c9a4a70
  • Fix regex negative lookahead for code-input tokenization.
  • Fix the OpenAI API server by using apply_chat_template to count tokens.

v0.3.3

13 Jun 06:36
6e8bf84

Support ChatGLM4 conversation mode.

v0.3.2

24 Apr 08:20
a46f474
  • Support P-Tuning v2 fine-tuned models for the ChatGLM family.
  • Fix convert.py for LoRA models & chatglm3-6b-128k.
  • Fix RoPE theta config for 32k/128k sequence lengths.
  • Better CUDA CMake script that respects the nvcc version.

v0.3.1

20 Jan 16:14
eff7f44
  • Support function calling in the OpenAI API server.
  • Faster repetition penalty sampling.
  • Support the max_new_tokens generation option.
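Repetition penalty sampling follows the standard CTRL-style scheme: before sampling, the logits of tokens that have already been generated are pushed down. A generic sketch of that scheme (not this repo's optimized implementation; function and variable names are illustrative):

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style repetition penalty: discourage tokens that
    already appeared in the generated sequence."""
    out = logits.copy()
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits
        else:
            out[tok] *= penalty   # push negative logits further down
    return out

logits = np.array([2.0, -1.0, 0.5, 3.0])
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0)
print(penalized)  # token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, rest unchanged
```

The per-token loop above is the obvious implementation; a faster version would typically apply the penalty with vectorized indexing instead.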

v0.3.0

22 Nov 03:08
b071907
  • Full ChatGLM3 functionality, including system prompts, function calling, and the code interpreter.
  • Brand-new OpenAI-style chat API.
  • Add token usage information to the OpenAI API server for compatibility with the LangChain frontend.
  • Fix conversion error for chatglm3-6b-32k.
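Because the server speaks the OpenAI chat protocol, any OpenAI-compatible client can talk to it. A hedged example request (host, port, and launch details are assumptions; see the repo's README for the actual server command):

```shell
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
# The JSON response carries a "usage" object (prompt_tokens,
# completion_tokens, total_tokens), which frontends like LangChain read.
```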

v0.2.10

30 Oct 06:35
972b0de
  • Support ChatGLM3 in conversation mode.
  • Coming soon: a new prompt format for system messages and function calls.

v0.2.9

22 Oct 03:03
02a6963
  • Support InternLM 7B & 20B model architectures.