Releases · NeoZhangJianyu/llama.cpp
update_oneapi-b3789-3ae8374
Use oneAPI 2024.1
update_oneapi-b3788-f557ccf
Update oneAPI to 2024.2
b3787
server : clean-up completed tasks from waiting list (#9531) ggml-ci
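For context on the clean-up above, here is a minimal C++ sketch of pruning completed task ids from a waiting list. The container layout and names (`waiting_task_ids`, `completed`) are hypothetical, chosen for illustration only; they do not mirror the actual server.cpp data structures.

```cpp
#include <cstdio>
#include <unordered_set>

int main() {
    // hypothetical state: ids of tasks still waiting, and ids already done
    std::unordered_set<int> waiting_task_ids = {1, 2, 3, 4};
    std::unordered_set<int> completed       = {2, 4};

    // erase every waiting id whose task has already completed, so the
    // waiting list cannot accumulate stale entries over time
    for (auto it = waiting_task_ids.begin(); it != waiting_task_ids.end(); ) {
        if (completed.count(*it)) {
            it = waiting_task_ids.erase(it);
        } else {
            ++it;
        }
    }

    printf("waiting ids left: %zu\n", waiting_task_ids.size());
    return 0;
}
```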
b3735
cann: Fix error when running a non-existent op (#9424)
b3678
server : simplify state machine for slot (#9283)

* server : simplify state machine for slot
* add SLOT_STATE_DONE_PROMPT
* pop_deferred_task
* add missing notify_one
* fix passkey test
* metrics : add n_busy_slots_per_decode
* fix test step
* add test
* maybe fix AddressSanitizer?
* fix deque ?
* missing lock
* pop_deferred_task: also notify
* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
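Two of the bullet points above ("missing lock" and "pop_deferred_task: also notify") describe classic producer/consumer fixes. Below is a minimal C++ sketch of a deferred-task queue with both fixes applied; the struct and method names are hypothetical and do not reproduce the real server.cpp code.

```cpp
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>

struct task { int id; };

struct task_queue {
    std::deque<task>        deferred;
    std::mutex              mtx;
    std::condition_variable cv;

    void push_deferred(task t) {
        std::lock_guard<std::mutex> lock(mtx); // the "missing lock" fix:
        deferred.push_back(t);                 // never touch the deque unlocked
        cv.notify_one();                       // wake any thread waiting on cv
    }

    bool pop_deferred(task & out) {
        std::lock_guard<std::mutex> lock(mtx);
        if (deferred.empty()) {
            return false;
        }
        out = deferred.front();
        deferred.pop_front();
        cv.notify_one(); // the "pop_deferred_task: also notify" fix: a pop
        return true;     // can also unblock a waiter, so signal here too
    }
};

int main() {
    task_queue q;
    q.push_deferred({42});
    task t{};
    if (q.pop_deferred(t)) {
        printf("popped deferred task %d\n", t.id);
    }
    return 0;
}
```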
b3449
examples : Fix `llama-export-lora` example (#8607)

* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
b3291
[SYCL] Remove unneeded semicolons (#8280)
b3145
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
b2716
[SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767)

* Fix FP32/FP16 build instructions
* Fix typo
* Recommended build instruction
* Recommended build instruction
* Recommended build instruction
* Add comments in Intel GPU linux

Co-authored-by: Anas Ahouzi <112881240+aahouzi-intel@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
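The entry above distinguishes the default FP32 build from an FP16 build. Below is a minimal, self-contained C++ sketch of how a CMake option like -DLLAMA_SYCL_F16 typically gates precision at compile time. It assumes the option maps to a `GGML_SYCL_F16` preprocessor define (that mapping is an assumption here, not taken from the release notes) and uses a 16-bit integer stand-in for a real half type so the sketch compiles without SYCL headers.

```cpp
#include <cstdio>

#if defined(GGML_SYCL_F16)
// stand-in for a 16-bit half type; a real SYCL build would use sycl::half
typedef unsigned short compute_t;
static const char * k_precision = "FP16";
#else
typedef float compute_t;
static const char * k_precision = "FP32";
#endif

int main() {
    printf("compute precision: %s (element size %zu bytes)\n",
           k_precision, sizeof(compute_t));
    return 0;
}
```

Compiling with `g++ -DGGML_SYCL_F16 demo.cpp` takes the FP16 branch; omitting the define leaves the FP32 default, mirroring the flag-on/flag-off build instructions.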
b2688
convert : fix autoawq gemma (#6704)

* fix autoawq quantized gemma model convert error

  Using autoawq to quantize a gemma model adds an lm_head.weight tensor to model-00001-of-00002.safetensors, which convert-hf-to-gguf.py cannot map. Skipping this tensor prevents the error.

* change code to full string match and print necessary message

  Change the code to a full string match and print a short message informing users that lm_head.weight has been skipped.

Co-authored-by: Zheng.Deng <32841220+CUGfred@users.noreply.github.com>
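The actual fix above lives in the Python converter (convert-hf-to-gguf.py); the C++ sketch below only illustrates the behavioral change it describes: skip the tensor on a full-string match of `lm_head.weight` and tell the user, rather than failing on an unmappable tensor. The tensor name list is made up for illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // hypothetical tensor names as they might appear in a safetensors shard
    std::vector<std::string> tensor_names = {
        "model.embed_tokens.weight",
        "lm_head.weight",
        "model.norm.weight",
    };

    for (const std::string & name : tensor_names) {
        // full-string match, not a substring test, so names that merely
        // contain "lm_head" are not skipped by accident
        if (name == "lm_head.weight") {
            printf("skipping tensor '%s' (not mapped for gemma)\n", name.c_str());
            continue;
        }
        printf("converting tensor '%s'\n", name.c_str());
    }
    return 0;
}
```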