merge upstream #47

l3utterfly · 2024-11-29T10:09:42Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too

* [cann] ROPE operator optimization

Co-authored-by: noemotiovon <noemotiovon@gmail.com>

* CANN: Fix the build failure on Ascend310P in two cases:
  1) SOC_TYPE specified manually
  2) some unusual compile environments
* Update the CANN backend News content: support F16 and F32 models on the Ascend 310P NPU.
* Fix a CANN compile failure: the assert in the Ascend kernel function is not supported on some CANN versions.

* kompute: op_unary: reject unsupported parameters
* kompute: softmax: implement ALiBi support
* kompute: rope: implement neox and phi3 support
* kompute: op_mul_mat_q4_k permuted support
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
* kompute: op_mul_mat_f16 permuted support
* kompute: op_mul_mat_q6_k permuted support

Signed-off-by: Sergio Lopez <slp@redhat.com>
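The ALiBi item in the kompute list adds a per-head linear bias to the attention scores before the softmax. The following is a minimal sketch, assuming the slope scheme that, as we read it, ggml's CPU `soft_max_ext` path uses: a `max_bias` parameter, with heads beyond the largest power of two taking odd powers of a second, gentler sequence. The helpers `alibi_slopes` and `softmax_with_alibi` are hypothetical and are not part of the kompute backend.

```python
import math


def alibi_slopes(n_head: int, max_bias: float) -> list[float]:
    """Per-head ALiBi slopes (hypothetical helper modeled on ggml's CPU formula)."""
    if max_bias <= 0.0:
        return [1.0] * n_head  # ALiBi disabled: the mask is added unscaled
    # Largest power of two that does not exceed n_head.
    n_head_log2 = 1 << int(math.floor(math.log2(n_head)))
    m0 = 2.0 ** (-max_bias / n_head_log2)
    m1 = 2.0 ** (-(max_bias / 2.0) / n_head_log2)
    slopes = []
    for h in range(n_head):
        if h < n_head_log2:
            slopes.append(m0 ** (h + 1))
        else:
            # Heads past the power of two use odd powers of m1.
            slopes.append(m1 ** (2 * (h - n_head_log2) + 1))
    return slopes


def softmax_with_alibi(scores, mask, scale, slope):
    """Softmax over one row of scale * score + slope * mask."""
    logits = [scale * s + slope * m for s, m in zip(scores, mask)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With 8 heads and `max_bias = 8.0` the slopes are 1/2, 1/4, ..., 1/256. The mask row is assumed to carry a non-positive position bias (for example a negative token distance), so that with a positive slope, farther positions receive lower probabilities.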

* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
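The runtime repack in the ggml-cpu item rearranges IQ4_NL weights so that a kernel can read the same block position from four rows at once. The sketch below models only that block-level interleaving. It assumes each row is a list of already-dequantized blocks, purely for illustration. The real `IQ4_NL_4_4` layout also regroups scales and quant bytes inside each group of four blocks, which the hypothetical helpers `interleave_rows_x4` and `dot4` do not attempt.

```python
def interleave_rows_x4(rows):
    """Group block j of rows 0..3 together (block-level view of a 4-row repack)."""
    if len(rows) != 4:
        raise ValueError("expected exactly 4 rows")
    n_blocks = len(rows[0])
    if any(len(r) != n_blocks for r in rows):
        raise ValueError("all rows must have the same number of blocks")
    # Output order: r0b0, r1b0, r2b0, r3b0, r0b1, r1b1, ...
    return [rows[r][j] for j in range(n_blocks) for r in range(4)]


def dot4(packed, x, n_blocks):
    """Dot products of the 4 packed rows with x, walking the packed buffer once."""
    acc = [0.0] * 4
    for j in range(n_blocks):
        for r in range(4):
            block = packed[4 * j + r]  # block j of row r sits next to its neighbors
            acc[r] += sum(w * v for w, v in zip(block, x[j]))
    return acc
```

The design point is locality. After repacking, one sequential pass over the buffer produces four output values, which is what lets a SIMD kernel (guarded by `__ARM_FEATURE_DOTPROD` on ARM) process four rows per step.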

ggml-ci

…penai client library (ggerganov#10568)

* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3

This is an incremental improvement over ggerganov#9118 that gets work to the GPU a bit sooner. The first part starts with a smaller number of nodes before the first submit and ramps up to the current 100 nodes per submit. The second part reduces the dry-run overhead for all the nodes that only need to request descriptor space. With these changes I get a speedup of around 1-2% on an RTX 4070 paired with my old Haswell-era CPU.
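The submit ramp described above can be sketched as a batching schedule over the graph's nodes. This is an illustration under stated assumptions, not the Vulkan backend's code. The initial batch size of 20 and the doubling factor are guesses. Only the cap of 100 nodes per submit comes from the description, and `submit_batches` is a hypothetical name.

```python
def submit_batches(n_nodes: int, first: int = 20, cap: int = 100,
                   growth: int = 2) -> list[tuple[int, int]]:
    """Split graph nodes [0, n_nodes) into submits that grow from `first` up to `cap`.

    `first` and `growth` are illustrative assumptions; only the cap of 100
    nodes per submit comes from the PR description.
    """
    batches = []
    start = 0
    size = first
    while start < n_nodes:
        end = min(start + size, n_nodes)
        batches.append((start, end))  # nodes start..end-1 form one submit
        start = end
        size = min(size * growth, cap)  # ramp up, but never past the cap
    return batches
```

For a 300-node graph this yields submits of 20, 40, 80, 100 and 60 nodes. The GPU therefore receives its first batch after 20 nodes instead of 100, while later submits keep the larger size that amortizes per-submit overhead.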

* [cann] RoPE operator optimization
* [CANN]Code Formatting

Co-authored-by: noemotiovon <noemotiovon@gmail.com>

This PR fixes the failing MUL_MAT tests for the sycl backend.

IMbackK and others added 19 commits November 27, 2024 17:10

Add some minimal optimizations for CDNA (ggerganov#10498)

3ad5451

* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too

common : fix duplicated file name with hf_repo and hf_file (ggerganov#10550)

9f91251

CANN: ROPE operator optimization (ggerganov#10540)

b742013

* [cann] ROPE operator optimization

Co-authored-by: noemotiovon <noemotiovon@gmail.com>

CANN: Update cann.md to display correctly in CLion (ggerganov#10538)

c6bc739

ggml-cpu: support IQ4_NL_4_4 by runtime repack (ggerganov#10541)

c202cef

* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard

cmake : fix ARM feature detection (ggerganov#10543)

eea986f

ggml-ci

ggml : fix row condition for i8mm kernels (ggerganov#10561)

76b27d2

ggml-ci

ci : fix tag name in cuda and hip releases (ggerganov#10566)

e90688e

docs: fix outdated usage of llama-simple (ggerganov#10565)

7281cf1

common: fix warning message when no GPU found (ggerganov#10564)

8907193

server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (ggerganov#10568)

6c59567

* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3

llama : add missing model types

4c0a95b

ggml : remove redundant copyright notice + update authors

dc22344

llava: return false instead of exit (ggerganov#10546)

678d799

CANN: RoPE operator optimization (ggerganov#10563)

938f608

* [cann] RoPE operator optimization
* [CANN]Code Formatting

Co-authored-by: noemotiovon <noemotiovon@gmail.com>

sycl : Reroute permuted mul_mats through oneMKL (ggerganov#10408)

266b851

This PR fixes the failing MUL_MAT tests for the sycl backend.
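Whether a `MUL_MAT` operand counts as permuted is decided from its byte strides rather than from its data. As a minimal sketch, assuming ggml's convention that `nb[0..3]` are the byte strides of dimensions 0 to 3 and that `ggml_is_permuted` flags any stride larger than the stride of the next-higher dimension (our reading of ggml.c), the check looks like this. `contiguous_strides`, `transpose_strides` and `is_permuted` are hypothetical Python helpers, not SYCL or oneMKL APIs.

```python
def contiguous_strides(ne, type_size):
    """Byte strides nb[0..3] of a contiguous tensor with element counts ne[0..3]."""
    nb = [type_size]
    for i in range(1, 4):
        nb.append(nb[i - 1] * ne[i - 1])
    return nb


def transpose_strides(nb):
    """Strides of a transposed view: dims 0 and 1 swap strides, no data is copied."""
    return [nb[1], nb[0], nb[2], nb[3]]


def is_permuted(nb):
    """True if the strides are not in non-decreasing order (mirrors ggml_is_permuted)."""
    return nb[0] > nb[1] or nb[1] > nb[2] or nb[2] > nb[3]
```

A contiguous f32 tensor with `ne = [4, 3, 2, 1]` has strides `[4, 16, 48, 96]` and is not permuted. Its transposed view has strides `[16, 4, 48, 96]` and is, so a backend must either handle those strides directly or route the op elsewhere.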

l3utterfly merged commit 61607e8 into layla-build Nov 29, 2024
56 checks passed

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, examples, devops, python, server, ggml, and Kompute labels Nov 29, 2024
