Releases · NeoZhangJianyu/llama.cpp
update_oneapi-b3789-3ae8374
Use oneAPI 2024.1
update_oneapi-b3788-f557ccf
Update oneAPI to 2024.2
b3787
server : clean-up completed tasks from waiting list (#9531) ggml-ci
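For context on the clean-up above, here is a minimal C++ sketch of pruning completed task ids from a waiting list. The container layout and names (`waiting_task_ids`, `completed`) are hypothetical, chosen for illustration only; they do not mirror the actual server.cpp data structures.

```cpp
#include <cstdio>
#include <unordered_set>

int main() {
    // hypothetical state: ids of tasks still waiting, and ids already done
    std::unordered_set<int> waiting_task_ids = {1, 2, 3, 4};
    std::unordered_set<int> completed       = {2, 4};

    // erase every waiting id whose task has already completed, so the
    // waiting list cannot accumulate stale entries over time
    for (auto it = waiting_task_ids.begin(); it != waiting_task_ids.end(); ) {
        if (completed.count(*it)) {
            it = waiting_task_ids.erase(it);
        } else {
            ++it;
        }
    }

    printf("waiting ids left: %zu\n", waiting_task_ids.size());
    return 0;
}
```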
b3735
cann: Fix error when running a non-existent op (#9424)
b3678
server : simplify state machine for slot (#9283)

* server : simplify state machine for slot
* add SLOT_STATE_DONE_PROMPT
* pop_deferred_task
* add missing notify_one
* fix passkey test
* metrics : add n_busy_slots_per_decode
* fix test step
* add test
* maybe fix AddressSanitizer?
* fix deque ?
* missing lock
* pop_deferred_task: also notify
* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
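Two of the bullet points above ("missing lock" and "pop_deferred_task: also notify") describe classic producer/consumer fixes. Below is a minimal C++ sketch of a deferred-task queue with both fixes applied; the struct and method names are hypothetical and do not reproduce the real server.cpp code.

```cpp
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>

struct task { int id; };

struct task_queue {
    std::deque<task>        deferred;
    std::mutex              mtx;
    std::condition_variable cv;

    void push_deferred(task t) {
        std::lock_guard<std::mutex> lock(mtx); // the "missing lock" fix:
        deferred.push_back(t);                 // never touch the deque unlocked
        cv.notify_one();                       // wake any thread waiting on cv
    }

    bool pop_deferred(task & out) {
        std::lock_guard<std::mutex> lock(mtx);
        if (deferred.empty()) {
            return false;
        }
        out = deferred.front();
        deferred.pop_front();
        cv.notify_one(); // the "pop_deferred_task: also notify" fix: a pop
        return true;     // can also unblock a waiter, so signal here too
    }
};

int main() {
    task_queue q;
    q.push_deferred({42});
    task t{};
    if (q.pop_deferred(t)) {
        printf("popped deferred task %d\n", t.id);
    }
    return 0;
}
```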
b3449
examples : Fix `llama-export-lora` example (#8607)

* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
b3291
[SYCL] Remove unneeded semicolons (#8280)
b3145
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
b2716
[SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767)

* Fix FP32/FP16 build instructions
* Fix typo
* Recommended build instruction
* Recommended build instruction
* Recommended build instruction
* Add comments in Intel GPU linux

Co-authored-by: Anas Ahouzi <112881240+aahouzi-intel@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
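The entry above distinguishes the default FP32 build from an FP16 build. Below is a minimal, self-contained C++ sketch of how a CMake option like -DLLAMA_SYCL_F16 typically gates precision at compile time. It assumes the option maps to a `GGML_SYCL_F16` preprocessor define (that mapping is an assumption here, not taken from the release notes) and uses a 16-bit integer stand-in for a real half type so the sketch compiles without SYCL headers.

```cpp
#include <cstdio>

#if defined(GGML_SYCL_F16)
// stand-in for a 16-bit half type; a real SYCL build would use sycl::half
typedef unsigned short compute_t;
static const char * k_precision = "FP16";
#else
typedef float compute_t;
static const char * k_precision = "FP32";
#endif

int main() {
    printf("compute precision: %s (element size %zu bytes)\n",
           k_precision, sizeof(compute_t));
    return 0;
}
```

Compiling with `g++ -DGGML_SYCL_F16 demo.cpp` takes the FP16 branch; omitting the define leaves the FP32 default, mirroring the flag-on/flag-off build instructions.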
b2688
convert : fix autoawq gemma (#6704)

* fix autoawq quantized gemma model convert error

  Using autoawq to quantize a gemma model adds an lm_head.weight tensor to model-00001-of-00002.safetensors, which convert-hf-to-gguf.py cannot map. Skipping this tensor prevents the error.

* change code to full string match and print necessary message

  Change the code to a full string match and print a short message informing users that lm_head.weight has been skipped.

Co-authored-by: Zheng.Deng <32841220+CUGfred@users.noreply.github.com>
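The actual fix above lives in the Python converter (convert-hf-to-gguf.py); the C++ sketch below only illustrates the behavioral change it describes: skip the tensor on a full-string match of `lm_head.weight` and tell the user, rather than failing on an unmappable tensor. The tensor name list is made up for illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // hypothetical tensor names as they might appear in a safetensors shard
    std::vector<std::string> tensor_names = {
        "model.embed_tokens.weight",
        "lm_head.weight",
        "model.norm.weight",
    };

    for (const std::string & name : tensor_names) {
        // full-string match, not a substring test, so names that merely
        // contain "lm_head" are not skipped by accident
        if (name == "lm_head.weight") {
            printf("skipping tensor '%s' (not mapped for gemma)\n", name.c_str());
            continue;
        }
        printf("converting tensor '%s'\n", name.c_str());
    }
    return 0;
}
```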