merge upstream #37

l3utterfly · 2024-09-15T12:48:53Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…by submitting smaller cmdbuffers early. (ggerganov#9118)

* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
* fix compile issues
* Fix issues where the last submit wasn't executed or handled properly.
* remove trailing whitespace
* Repair GGML_VULKAN_CHECK_RESULTS
* Increase submit counter only if actual work has been submitted and increase submit count to 100.
* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.

* Arm AArch64: Documentation updates
* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
* Add newline to the end of docs/build.md

* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Removed WhiteSpaces
* ggml : style changes + fix 512-bit nb loop check
  - fix local scope in switch cases
  - consistent predicate names
  - empty lines when necessary
  - opening braces, spaces
  - const-correctness
  - add asserts
* Update ggml/src/ggml-quants.c
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* common : move arg parser to arg.cpp
* better categorize args
* add cmake
* missing climits
* missing cstdarg
* common : more explicit includes
* fix build
* refactor gpt_params_parse
* update server readme
* fix test

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

) This commit updates the comment in the copy_mask_state function, which seems to contain a typo or to be outdated, changing the variable n_rs to n_kv. I believe this change is correct: what the comment wants to convey is to copy the states that are not going to be used in the upcoming processing, which are the token states from n_seqs up to the number of possible token states, n_kv.

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
  → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/a5d394176e64ab29c852d03346c1fc9b0b7d33eb.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
  → 'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
  → 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…url flag (ggerganov#9255)

* feat: Implements retrying logic for downloading models using --model-url flag
* Update common/common.cpp
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Update common/common.cpp
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* apply comments
* implements a retry function to avoid duplication
* fix editorconfig
* change function name

---------

Co-authored-by: farbod <farbod.bjary82@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
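For readers unfamiliar with the pattern, the "retry function to avoid duplication" idea mentioned in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `retry_with_backoff`, its parameters, and the doubling delay are hypothetical and are not taken from the actual change in common/common.cpp.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not the project's actual function): call `attempt`
// up to `max_attempts` times, sleeping between failures with a delay that
// doubles each time. Returns true as soon as one attempt succeeds.
static bool retry_with_backoff(const std::function<bool()> & attempt,
                               int max_attempts,
                               std::chrono::milliseconds initial_delay) {
    auto delay = initial_delay;
    for (int i = 0; i < max_attempts; ++i) {
        if (attempt()) {
            return true;
        }
        if (i + 1 < max_attempts) {
            // Only wait when another attempt will follow.
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return false;
}
```

Wrapping each download step in one helper like this keeps the attempt count and wait policy in a single place, which is presumably the duplication the commit message refers to.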

…rganov#9108)

* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fix windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

…ov#9355)

* llama : llama_perf + option to disable timings during decode
  ggml-ci
* common : add llama_arg
* Update src/llama.cpp
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* perf : separate functions in the API
  ggml-ci
* perf : safer pointer handling + naming update
  ggml-ci
* minor : better local var name
* perf : abort on invalid sampler pointer
  ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Adding loading page for '/' server requests
* set content when model is loading
* removed loading html file
* updated cmakelist
* updated makefile
* cleaned up whitespace
* cleanup for PR removed error
* updated server test to handle 503 HTML
* updated server test to handle 503 HTML
* catch 503 before parsing json
* revert test
* account for both api and web browser requests
* precommit corrections
* eol fix
* revert changes to pre-commit
* removed print statement
* made loading message more descriptive
* also support .html files

---------

Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>

…ov#9463)

* cmake : use list(APPEND ...) instead of set() + dedup linker
  ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (ggerganov#9469)
* try fix sycl build
* use CMAKE_CXX_FLAGS as a string variable

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* one more CMAKE_CXX_FLAGS fix (ggerganov#9471)

---------

Co-authored-by: Michael Podvitskiy <podvitskiymichael@gmail.com>

* Added link to proprietary wrapper for Unity3d into README.md. The wrapper has a prebuilt library, was tested on iOS, Android, WebGL, PC, and Mac platforms, and has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).
* Update README.md: fixes upon review

ngxson and others added 30 commits September 8, 2024 12:12

imatrix : fix arg parser for imatrix (ggerganov#9366)

00b02bb

* imatrix : fix arg parser
* beautify printing first arg

llama : sanitize tokens in the upper bound (ggerganov#9359)

eae5971

[SYCL] add check malloc result on device (ggerganov#9346)

2a358fb

* add check malloc result on device
* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

llama : refactor samplers internal implementation (ggerganov#9370)

19f4a7b

common : restore --n-gpu-layers (ggerganov#9371)

a249843

common : bring back missing args, add env var duplication check (ggerganov#9375)

3f7ccfd

* common : bring back missing args
* move duplication check to test-arg-parser
* add check for duplicated env var
* correct default values

cuda : fix FA Q src index (1 -> 0) (ggerganov#9374)

e079bff

rpc : update README [no ci] (ggerganov#9320)

54f376d

Update README with instructions how to offload model layers to both local and remote devices

readme : add LLMUnity to UI projects (ggerganov#9381)

5ed0875

* add LLMUnity to UI projects
* add newline to examples/rpc/README.md to fix editorconfig-checker unit test

CUDA: fix variable name conflict for Windows build (ggerganov#9382)

8e6e2fb

readme : update hot topics

38ca6f6

llama : minor sampling refactor (2) (ggerganov#9386)

5fb5e24

rpc : fix segfault with nkvo (ggerganov#9389)

293bebe

* rpc : fix nkvo
* rpc : buf_size must not be static
  ref: ggerganov#9337

---------

Co-authored-by: slaren <slarengh@gmail.com>

make : do not run llama-gen-docs when building (ggerganov#9399)

fb3f249

RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (ggerganov#9387)

0b4ac75

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

metal : fix compile warning with GGML_METAL_NDEBUG (#0)

00ba2ff

llama : move random seed generation to the samplers (ggerganov#9398)

49006c6

* llama_sampler_penalties : clamp penalty_last_n to zero

enable --special arg for llama-server (ggerganov#9419)

8d300bd

Co-authored-by: matteo serva <matteo.serva@gmail.com>

arg : bring back missing ifdef (ggerganov#9411)

6cd4e03

* arg : bring back missing ifdef
* replace with llama_supports_gpu_offload

sycl : update support conditions (ggerganov#9394)

51b6038

* sycl : update support condition to im2col
  Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

musa: remove Clang builtins mapping (ggerganov#9421)

b34e023

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

batched-bench : remove unused code (ggerganov#9305)

d2b496b

CUDA: fix --split-mode row race condition (ggerganov#9413)

5af118e

no1wudi and others added 16 commits September 12, 2024 14:28

ci : update HIP SDK to 24.Q3 (ROCm 6.1) (ggerganov#9329)

4dc4f5f

cmake : fix for builds without GGML_CDEF_PUBLIC (ggerganov#9338)

2a82511

* `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`
* Update CMakeLists.txt, spaces fix

lora : raise error if lm_head is ignored (ggerganov#9103)

d4c3c10

* lora : raise error if lm_head is ignored
* fix style
* clarify comment

llava : fix the script error in MobileVLM README (ggerganov#9054)

e665744

Signed-off-by: Erhu Feng <2748250768@qq.com>

cann: Add host buffer type for Ascend NPU (ggerganov#9406)

e6b7801

* feat: Add host buffer type for Ascend NPU(CANN backend)
* fix some checking errors
* Add a few comments

feat: remove a sampler from a chain (ggerganov#9445)

bd35cb0

* feat: remove a sampler from a chain
* fix: return removed sampler
* fix: safer casting
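As a rough illustration of what "remove a sampler from a chain" and "return removed sampler" can look like, here is a minimal sketch. The `sampler` and `sampler_chain` types below are hypothetical stand-ins invented for this example; they are not the llama.cpp sampler API, and the bounds check is an assumption about what a safe version would do.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not llama.cpp types.
struct sampler {
    std::string name;
};

struct sampler_chain {
    std::vector<std::unique_ptr<sampler>> samplers;

    void add(std::unique_ptr<sampler> s) { samplers.push_back(std::move(s)); }

    // Remove the sampler at position `i` and hand ownership back to the caller.
    // Returns nullptr when `i` is out of range, leaving the chain unchanged.
    std::unique_ptr<sampler> remove(std::size_t i) {
        if (i >= samplers.size()) {
            return nullptr;
        }
        std::unique_ptr<sampler> removed = std::move(samplers[i]);
        samplers.erase(samplers.begin() + static_cast<std::ptrdiff_t>(i));
        return removed;
    }
};
```

Returning the removed element, rather than destroying it inside the chain, lets the caller decide whether to free it or reuse it elsewhere.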

llama : make cell_id const in inp_s_mask block (ggerganov#9470)

befaf11

This commit makes the cell_id variable const in the inp_s_mask block. The motivation for this change is consistency with the code in the inp_s_copy block.

server: add data: [DONE] to /chat/completions stream response (ggerganov#9459)

dcdcee3

ggml : ggml_type_name return "NONE" for invalid values (ggerganov#9458)

822b632

When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.

cmake : try to fix sycl+intel build (ggerganov#9487)

7596487

py : add "LLaMAForCausalLM" conversion support (ggerganov#9485)

3c7989f

Co-authored-by: Csaba Kecskemeti <csabakecskemeti@Csabas-Mac-Pro.local>

l3utterfly merged commit ccbcce0 into layla-build Sep 15, 2024
54 of 62 checks passed

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, testing, build, examples, devops, python, android, server, ggml, and Kompute labels Sep 15, 2024
