
LLaMa.cpp support #49

Open
Alex20129 opened this issue Dec 11, 2024 · 5 comments
Labels: enhancement (New feature or request)

Comments

@Alex20129

I'm currently playing with LLaMa.cpp (a qwen-instruct GGUF model) in console chat mode, and I wondered if it would be possible to seamlessly integrate LLaMa.cpp with Qt Creator. That's how I got here.

I would like to try QodeAssist with LLaMa.cpp as a backend.
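To use it as a backend I assume the llama.cpp HTTP server would need to be running rather than the console chat; roughly something like this (binary name and flags as in recent llama.cpp builds, the model path is just a placeholder from my setup):

# start the llama.cpp server with a GGUF model on port 8080, 8192-token context
./llama-server -m ./qwen2.5-coder-model.gguf --port 8080 -c 8192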

@Alex20129
Author

According to the README, the LLaMa.cpp server has a special /infill endpoint for code infilling.

@Palm1r
Owner

Palm1r commented Dec 11, 2024

It is a good idea, and thank you for the information. Yes, it is possible to use llama.cpp as a backend, but there is one thing: for now QodeAssist only works with FIM models for code completion. I am working on extending that to instruct models; maybe today or tomorrow I will finish. I am also waiting for Qt Creator 15.0.1, because Qt has only shared 15.0.1 and I can't build 15.0.0 via GitHub Actions for all platforms. If you have time and patience, wait a bit and I will add everything and release it in the next version.
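Just for reference, a minimal sketch of what a raw FIM request can look like against the llama.cpp server's /completion endpoint, assuming a model that uses the Qwen2.5-Coder FIM special tokens (the port, token names, and parameter values here are only illustrative, not necessarily what QodeAssist sends):

# ask the model to fill in the middle between the prefix and the suffix
curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "<|fim_prefix|>def hello_<|fim_suffix|>():\n    return \"Hello World\"<|fim_middle|>",
    "n_predict": 32,
    "temperature": 0.2
  }'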

@Palm1r Palm1r added the enhancement New feature or request label Dec 11, 2024
@Palm1r
Owner

Palm1r commented Dec 19, 2024

I am back on this. @Alex20129, do you have a model (or a link to one) that supports FIM? I need it for testing.
For now, when I test from the terminal, I get this:
curl --request POST \
  --url http://localhost:8080/infill \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "bla",
    "input_prefix": "def hello_",
    "input_suffix": "():\n return \"Hello World\"",
    "temperature": 0.8
  }'
{"error":{"code":501,"message":"Infill is not supported by this model: prefix token is missing. suffix token is missing. middle token is missing. ","type":"not_supported_error"}}

@Alex20129
Author

I don't know how to test the FIM function properly, because I've only used it in chat mode. Anyway, here are the models:
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF/tree/main
I'm using the qwen-2.5-coder-32b-instruct-q8_0.gguf model.
And when I run LLaMa.cpp I can see this:

...
llm_load_print_meta: model type       = 32B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 32,76 B
llm_load_print_meta: model size       = 32,42 GiB (8,50 BPW) 
llm_load_print_meta: general.name     = Qwen2.5 Coder 32B Instruct AWQ
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
...

which makes me think that qwen-coder itself and all its derivatives were specifically trained with the FIM objective in mind.

@Alex20129
Copy link
Author

I also tried running rombos-coder-2.5-qwen-32b-q5_k_s.gguf, which is compact and can be fully loaded onto the GPU, but I'm not sure about its quality since it's a derivative model mixed by an enthusiast.
