Cuda mmq 256k 5 #227

Nexesenex · 2024-07-11T00:13:43Z

No description provided.

…rganov#8283) * Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from ggerganov#7809 and migrate to the new filenames. * Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type

Alcpz and others added 14 commits July 9, 2024 22:03

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (ggerganov#8372)

5b0b8d8

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend * Reduced verbosity of comment

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (ggerganov#8392)

a03e8dd

Update README.md to fix broken link to docs (ggerganov#8399)

fd560fe

Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'

Server: Enable setting default sampling parameters via command-line (g…

a59f8fd

…gerganov#8402) * Load server sampling parameters from the server context by default. * Wordsmithing comment

py : fix extra space in convert_hf_to_gguf.py (ggerganov#8407)

8f0fad4

py : fix converter for internlm2 (ggerganov#8321)

e4dd31f

* update internlm2 * remove unused file * fix lint

llama : add assert about missing llama_encode() call (ggerganov#8400)

a8be1e6

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

msvc : silence codecvt c++17 deprecation warnings (ggerganov#8395)

7a80710

llama : C++20 compatibility for u8 strings (ggerganov#8408)

cc61948

gguf-py rel pipeline (ggerganov#8410)

83321c6

* Upd gguf-py/readme * Bump patch version for release

ggml : move sgemm sources to llamafile subfolder (ggerganov#8394)

6b2a849

ggml-ci

CUDA: optimize and refactor MMQ

f4b8df4

Nexesenex merged commit 1c2832e into Nexesenex:mmqrefac Jul 11, 2024
10 of 13 checks passed

github-actions bot added the documentation, Nvidia GPU, examples, python, server, ggml, SYCL, and build labels Jul 11, 2024
