merge upstream #47
Commits on Nov 27, 2024
-
Add some minimal optimizations for CDNA (ggerganov#10498)
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
Commit 3ad5451
-
Commit 9f91251
Commits on Nov 28, 2024
-
CANN: ROPE operator optimization (ggerganov#10540)
* [cann] ROPE operator optimization
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Commit b742013
-
CANN: Fix SOC_TYPE compile bug (ggerganov#10519)
* CANN: Fix the build failure on Ascend310P in two cases: 1) SOC_TYPE is specified manually 2) some unusual compile environments
* Update the CANN backend News content: Support F16 and F32 data type models for Ascend 310P NPU.
* Fix CANN compile failure: the assert in Ascend kernel functions isn't supported on some CANN versions
Commit 605fa66
-
Commit c6bc739
-
kompute : improve backend to pass test_backend_ops (ggerganov#10542)
* kompute: op_unary: reject unsupported parameters
* kompute: softmax: implement ALiBi support
* kompute: rope: implement neox and phi3 support
* kompute: op_mul_mat_q4_k permuted support
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
* kompute: op_mul_mat_f16 permuted support
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
Commit 2025fa6
-
ggml-cpu: support IQ4_NL_4_4 by runtime repack (ggerganov#10541)
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
Commit c202cef
-
Commit eea986f
-
Commit 76b27d2
-
Commit e90688e
-
Commit 7281cf1
-
Commit 8907193
-
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (ggerganov#10568)
* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3
Commit 6c59567
-
Commit 4c0a95b
-
Commit dc22344
Commits on Nov 29, 2024
-
Commit 678d799
-
vulkan: get the first command buffer submitted sooner (ggerganov#10499)
This is an incremental improvement over ggerganov#9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space. With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.
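The ramp-up described in the first part can be illustrated with a small sketch. This is not the code from the commit: the function name `plan_submit_batches`, the starting batch size, and the doubling schedule are all assumptions chosen for illustration; only the cap of 100 nodes per submit comes from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): split `total_nodes` graph nodes into
// submission batches. The first batch is small so the GPU receives work early;
// each later batch doubles (an assumed schedule) until it reaches the cap.
std::vector<std::size_t> plan_submit_batches(std::size_t total_nodes,
                                             std::size_t first_batch,
                                             std::size_t max_per_submit) {
    std::vector<std::size_t> batches;
    if (first_batch == 0 || max_per_submit == 0) {
        return batches;  // nothing sensible to plan
    }
    std::size_t batch = std::min(first_batch, max_per_submit);
    std::size_t remaining = total_nodes;
    while (remaining > 0) {
        const std::size_t n = std::min(batch, remaining);
        batches.push_back(n);   // one command-buffer submit covering n nodes
        remaining -= n;
        batch = std::min(batch * 2, max_per_submit);  // ramp up toward the cap
    }
    return batches;
}
```

With 400 nodes, a first batch of 20, and a cap of 100, this plan yields batches of 20, 40, 80, 100, 100, and 60 nodes, so the first submit happens after 20 nodes rather than 100.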
Commit f095a64
-
CANN: RoPE operator optimization (ggerganov#10563)
* [cann] RoPE operator optimization
* [CANN] Code Formatting
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Commit 938f608
-
sycl : Reroute permuted mul_mats through oneMKL (ggerganov#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
Commit 266b851