merge upstream #47

Merged 19 commits on Nov 29, 2024

Commits on Nov 27, 2024

  1. Add some minimal optimizations for CDNA (ggerganov#10498)

    * Add some minimal optimizations for CDNA
    
    * ggml_cuda: set launch bounds also for GCN as it helps there too
    IMbackK authored Nov 27, 2024
    Commit 3ad5451
  2. Commit 9f91251

Commits on Nov 28, 2024

  1. CANN: ROPE operator optimization (ggerganov#10540)

    * [cann] ROPE operator optimization
    
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
    noemotiovon authored Nov 28, 2024
    Commit b742013
  2. CANN: Fix SOC_TYPE compile bug (ggerganov#10519)

    * CANN: Fix build failures on Ascend310P in two cases:
    1) SOC_TYPE is specified manually
    2) Some unusual compile environments
    
    * Update the CANN backend News content: support F16 and F32 data type models on the Ascend 310P NPU.
    
    * Fix CANN compile failure: assert in Ascend kernel functions isn't supported on some CANN versions
    leo-pony authored Nov 28, 2024
    Commit 605fa66
  3. Commit c6bc739
  4. kompute : improve backend to pass test_backend_ops (ggerganov#10542)

    * kompute: op_unary: reject unsupported parameters
    
    * kompute: softmax: implement ALiBi support
    
    * kompute: rope: implement neox and phi3 support
    
    * kompute: op_mul_mat_q4_k permuted support
    
    * kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
    
    * kompute: op_mul_mat_f16 permuted support
    
    * kompute: op_mul_mat_q6_k permuted support
    
    ---------
    
    Signed-off-by: Sergio Lopez <slp@redhat.com>
    slp authored Nov 28, 2024
    Commit 2025fa6
  5. ggml-cpu: support IQ4_NL_4_4 by runtime repack (ggerganov#10541)

    * ggml-cpu: support IQ4_NL_4_4 by runtime repack
    
    * ggml-cpu: add __ARM_FEATURE_DOTPROD guard
    FanShupei authored Nov 28, 2024
    Commit c202cef
  6. Commit eea986f
  7. Commit 76b27d2
  8. Commit e90688e
  9. Commit 7281cf1
  10. Commit 8907193
  11. server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (ggerganov#10568)
    
    * server : (tests) don't use thread for capturing stdout/stderr
    
    * test: bump openai to 1.55.2
    
    * bump openai to 1.55.3
    ngxson authored Nov 28, 2024
    Commit 6c59567
  12. Commit 4c0a95b
  13. Commit dc22344

Commits on Nov 29, 2024

  1. Commit 678d799
  2. vulkan: get the first command buffer submitted sooner (ggerganov#10499)

    This is an incremental improvement over ggerganov#9118 to get work to the GPU a bit
    sooner. The first part is to start with a smaller number of nodes before
    the first submit, and ramp it up to the current 100 nodes/submit. The
    second part is to reduce the dryrun overhead for all the nodes that just
    need to request descriptor space.
    
    With these changes I get around 1-2% speedup on RTX 4070 combined with my
    old Haswell-era CPU.
    jeffbolznv authored Nov 29, 2024
    Commit f095a64
  3. CANN: RoPE operator optimization (ggerganov#10563)

    * [cann] RoPE operator optimization
    
    * [CANN] Code Formatting
    
    ---------
    
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
    noemotiovon authored Nov 29, 2024
    Commit 938f608
  4. sycl : Reroute permuted mul_mats through oneMKL (ggerganov#10408)

    This PR fixes the failing MUL_MAT tests for the sycl backend.
    Alcpz authored Nov 29, 2024
    Commit 266b851
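
The Vulkan commit above (ggerganov#10499) describes starting with a smaller number of nodes before the first submit and ramping up to 100 nodes/submit. A minimal sketch of such a ramp follows. The helper name `plan_submits`, the starting size of 10, and the doubling schedule are all assumptions made for illustration; the commit message states only the 100-node target, and this is not the backend's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): split `total_nodes` graph nodes
// into command-buffer submits. The first submit is small so the GPU gets
// work early; later submits grow until they reach `max_batch`.
// ASSUMPTION: the starting size and the doubling rule are illustrative only.
std::vector<std::size_t> plan_submits(std::size_t total_nodes,
                                      std::size_t first_batch = 10,
                                      std::size_t max_batch = 100) {
    assert(first_batch > 0 && first_batch <= max_batch);
    std::vector<std::size_t> batches;
    std::size_t batch = first_batch;
    std::size_t remaining = total_nodes;
    while (remaining > 0) {
        // Submit at most `batch` nodes, or whatever is left.
        const std::size_t n = remaining < batch ? remaining : batch;
        batches.push_back(n);
        remaining -= n;
        // Ramp up for the next submit, capped at max_batch.
        batch = (batch * 2 > max_batch) ? max_batch : batch * 2;
    }
    return batches;
}
```

With these assumed defaults, a 250-node graph would be submitted in batches of 10, 20, 40, 80, and 100 nodes, so the GPU starts on the first 10 nodes while the CPU is still recording the rest.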