
sync : ggml #2608

Merged: 65 commits, Dec 8, 2024

Commits
62fd128
ggml-opt: fix data corruption (ggml/1022)
JohannesGaessler Nov 20, 2024
3445025
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
frankier Nov 26, 2024
2a2ed50
metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)
PABannier Nov 28, 2024
2b86e59
feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
PABannier Dec 2, 2024
2404fca
CUDA: remove unnecessary warp reduce in FA (ggml/1032)
mahorozte Dec 3, 2024
b1c6a66
add cmake rvv support (llama/10411)
lhpqaq Nov 19, 2024
133fb12
vulkan: further optimize mul_mat_vec using larger loads (llama/10387)
jeffbolznv Nov 20, 2024
b4fa978
vulkan: copy iq4_nl LUT into shared memory (llama/10409)
jeffbolznv Nov 20, 2024
176d689
vulkan: predicate max operation in soft_max shaders/soft_max (llama/1…
jeffbolznv Nov 20, 2024
cbfbf5f
cuda : optimize argmax (llama/10441)
slaren Nov 21, 2024
761a3e8
CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216)
leo-pony Nov 22, 2024
cd3456d
ggml : do not use ARM features not included in the build (llama/10457)
slaren Nov 23, 2024
5be17d6
metal : minor code formatting
ggerganov Nov 25, 2024
920a48a
ggml : add support for dynamic loading of backends (llama/10469)
slaren Nov 25, 2024
640732f
llama : accept a list of devices to use to offload a model (llama/10497)
slaren Nov 25, 2024
9229dd6
metal : enable mat-vec kernels for bs <= 4 (llama/10491)
ggerganov Nov 25, 2024
8eb23b9
vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484)
sparkleholic Nov 26, 2024
c211ba4
CANN: RoPE and CANCAT operator optimization (llama/10488)
noemotiovon Nov 26, 2024
a7103bb
CANN: Improve the Inferencing Performance for Ascend NPU Device (llam…
shen-shanshan Nov 26, 2024
b054b83
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487)
chaxu01 Nov 26, 2024
a116add
cmake : enable warnings in llama (llama/10474)
ggerganov Nov 26, 2024
1e5fe6a
vulkan: fix group_norm (llama/10496)
jeffbolznv Nov 26, 2024
5ec2241
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (…
yeahdongcn Nov 26, 2024
6ebd263
vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459)
jeffbolznv Nov 27, 2024
f69379f
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/1…
jeffbolznv Nov 27, 2024
475517a
vulkan: further optimize q5_k mul_mat_vec (llama/10479)
jeffbolznv Nov 27, 2024
6ecca7d
vulkan: Handle GPUs with less shared memory (llama/10468)
jeffbolznv Nov 27, 2024
5aad67d
vulkan: define all quant data structures in types.comp (llama/10440)
jeffbolznv Nov 27, 2024
98690a8
metal : fix group_norm support condition (llama/0)
ggerganov Nov 27, 2024
d147926
Add some minimal optimizations for CDNA (llama/10498)
IMbackK Nov 27, 2024
bb85fcc
CANN: ROPE operator optimization (llama/10540)
noemotiovon Nov 28, 2024
3536fd2
CANN: Fix SOC_TYPE compile bug (llama/10519)
leo-pony Nov 28, 2024
68e48d3
kompute : improve backend to pass test_backend_ops (llama/10542)
slp Nov 28, 2024
23468e6
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
FanShupei Nov 28, 2024
0712712
cmake : fix ARM feature detection (llama/10543)
ggerganov Nov 28, 2024
decea57
ggml : fix row condition for i8mm kernels (llama/10561)
ggerganov Nov 28, 2024
b8a6761
ggml : remove redundant copyright notice + update authors
ggerganov Nov 28, 2024
febda2f
vulkan: get the first command buffer submitted sooner (llama/10499)
jeffbolznv Nov 29, 2024
29349f1
CANN: RoPE operator optimization (llama/10563)
noemotiovon Nov 29, 2024
957e21e
sycl : Reroute permuted mul_mats through oneMKL (llama/10408)
Alcpz Nov 29, 2024
5acee88
sycl : offload of get_rows set to 0 (llama/10432)
Alcpz Nov 29, 2024
544a4d4
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)
FanShupei Nov 29, 2024
8383be9
ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562)
ggerganov Nov 29, 2024
621659c
vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536)
netrunnereve Nov 30, 2024
06bf264
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
angt Nov 30, 2024
59853e7
SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579)
qnixsynapse Dec 2, 2024
b405683
metal : small-batch mat-mul kernels (llama/10581)
ggerganov Dec 3, 2024
b383af9
ggml : move AMX to the CPU backend (llama/10570)
slaren Dec 3, 2024
76199ee
common : fix compile warning
ggerganov Dec 3, 2024
fe9b27d
files : remove make artifacts
ggerganov Dec 3, 2024
40d5987
ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034)
PABannier Dec 3, 2024
e20efac
ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037)
PABannier Dec 4, 2024
03331b1
vulkan: optimize and reenable split_k (llama/10637)
jeffbolznv Dec 3, 2024
9623ba1
Avoid using __fp16 on ARM with old nvcc (llama/10616)
frankier Dec 4, 2024
3085e28
SYCL : Move to compile time oneMKL interface backend selection for NV…
s-Nick Dec 4, 2024
b311da3
vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (…
jeffbolznv Dec 4, 2024
61aff48
ggml-cpu : fix HWCAP2_I8MM value (llama/10646)
slaren Dec 4, 2024
dfddca0
ggml : add predefined list of CPU backend variants to build (llama/10…
slaren Dec 4, 2024
dfe6652
sync : ggml
ggerganov Dec 5, 2024
1a1fcd3
talk-llama : sync llama.cpp
ggerganov Dec 5, 2024
729effe
make : shim cmake
ggerganov Dec 6, 2024
a5cd03a
ci : disable Obj-C build + fixes
ggerganov Dec 8, 2024
762f63e
ci : disable CUDA and Android builds
ggerganov Dec 8, 2024
668930a
readme : update build instructions
ggerganov Dec 8, 2024
280d273
ci : disable freeBSD builds [no ci]
ggerganov Dec 8, 2024
287 changes: 149 additions & 138 deletions .github/workflows/build.yml
@@ -28,9 +28,9 @@ jobs:
-w /workspace ${{ env.ubuntu_image }} /bin/sh -c '
set -e
apt update
apt install -y build-essential libsdl2-dev
make
make stream'
apt install -y build-essential libsdl2-dev cmake
cmake -B build
cmake --build build --config Release -j $(nproc)'

macOS-latest:
runs-on: macOS-latest
@@ -42,30 +42,30 @@ jobs:
- name: Dependencies
run: |
brew update
brew install sdl2
brew install sdl2 cmake

- name: Build
run: |
make
make stream
cmake -B build
cmake --build build --config Release

freeBSD-latest:
runs-on: macos-12

steps:
- name: Clone
uses: actions/checkout@v4

- name: Build
uses: cross-platform-actions/action@v0.24.0
with:
operating_system: freebsd
version: '13.3'
run: |
sudo pkg update
sudo pkg install -y gmake sdl2
gmake
gmake stream
# freeBSD-latest:
# runs-on: macos-12
#
# steps:
# - name: Clone
# uses: actions/checkout@v4
#
# - name: Build
# uses: cross-platform-actions/action@v0.24.0
# with:
# operating_system: freebsd
# version: '13.3'
# run: |
# sudo pkg update
# sudo pkg install -y gmake sdl2 cmake
# cmake -B build
# cmake --build build --config Release

ubuntu-latest-gcc:
runs-on: ubuntu-latest
@@ -280,21 +280,6 @@ jobs:
mingw-w64-${{matrix.env}}-SDL2
mingw-w64-${{matrix.env}}-openblas

- name: Build using make
shell: msys2 {0}
run: |
make -j $(nproc)

- name: Clean after building using make
shell: msys2 {0}
run: |
make clean

- name: Build using make w/ OpenBLAS
shell: msys2 {0}
run: |
make GGML_OPENBLAS=1 -j $(nproc)

- name: Build using CMake
shell: msys2 {0}
run: |
@@ -445,71 +430,72 @@ jobs:
name: whisper-blas-bin-${{ matrix.arch }}
path: build/bin/${{ matrix.build }}

windows-cublas:
runs-on: windows-2019

strategy:
matrix:
build: [Release]
arch: [x64]
cublas: [ON]
sdl2: [ON]
cuda-toolkit: [12.2.0, 11.8.0]
include:
- arch: x64
s2arc: x64
- sdl2: ON
s2ver: 2.28.5

steps:
- name: Clone
uses: actions/checkout@v4

- name: Add msbuild to PATH
uses: microsoft/setup-msbuild@v2

- name: Install CUDA Toolkit
id: cuda-toolkit
uses: Jimver/cuda-toolkit@v0.2.15
with:
cuda: '${{ matrix.cuda-toolkit }}'

- name: Fetch SDL2 and set SDL2_DIR
if: matrix.sdl2 == 'ON'
run: |
C:/msys64/usr/bin/wget.exe -qO sdl2.zip https://github.com/libsdl-org/SDL/releases/download/release-${{ matrix.s2ver }}/SDL2-devel-${{ matrix.s2ver }}-VC.zip
7z x sdl2.zip
echo "SDL2_DIR=$env:GITHUB_WORKSPACE/SDL2-${{ matrix.s2ver }}/cmake" >> $env:GITHUB_ENV

- name: Configure
run: >
cmake -S . -B ./build -A ${{ matrix.arch }}
-DCMAKE_BUILD_TYPE=${{ matrix.build }}
-DGGML_CUDA=${{ matrix.cublas }}
-DWHISPER_SDL2=${{ matrix.sdl2 }}

- name: Build ${{ matrix.cuda-toolkit }}
run: |
cd ./build
cmake --build . --config ${{ matrix.build }}

- name: Copy CUDA DLLs
run: >
Copy-Item -PassThru
-Path "${{ steps.cuda-toolkit.outputs.CUDA_PATH }}/bin/*.dll"
-Include cudart64_*,cublas64_*,cublasLt64_*
-Destination build/bin/${{ matrix.build }}

- name: Copy SDL2.dll
if: matrix.sdl2 == 'ON'
run: copy "$env:SDL2_DIR/../lib/${{ matrix.s2arc }}/SDL2.dll" build/bin/${{ matrix.build }}

- name: Upload binaries
if: matrix.sdl2 == 'ON'
uses: actions/upload-artifact@v4
with:
name: whisper-cublas-${{ matrix.cuda-toolkit }}-bin-${{ matrix.arch }}
path: build/bin/${{ matrix.build }}
# TODO: fix and re-enable
# windows-cublas:
# runs-on: windows-2019
#
# strategy:
# matrix:
# build: [Release]
# arch: [x64]
# cublas: [ON]
# sdl2: [ON]
# cuda-toolkit: [12.2.0, 11.8.0]
# include:
# - arch: x64
# s2arc: x64
# - sdl2: ON
# s2ver: 2.28.5
#
# steps:
# - name: Clone
# uses: actions/checkout@v4
#
# - name: Add msbuild to PATH
# uses: microsoft/setup-msbuild@v2
#
# - name: Install CUDA Toolkit
# id: cuda-toolkit
# uses: Jimver/cuda-toolkit@v0.2.15
# with:
# cuda: '${{ matrix.cuda-toolkit }}'
#
# - name: Fetch SDL2 and set SDL2_DIR
# if: matrix.sdl2 == 'ON'
# run: |
# C:/msys64/usr/bin/wget.exe -qO sdl2.zip https://github.com/libsdl-org/SDL/releases/download/release-${{ matrix.s2ver }}/SDL2-devel-${{ matrix.s2ver }}-VC.zip
# 7z x sdl2.zip
# echo "SDL2_DIR=$env:GITHUB_WORKSPACE/SDL2-${{ matrix.s2ver }}/cmake" >> $env:GITHUB_ENV
#
# - name: Configure
# run: >
# cmake -S . -B ./build -A ${{ matrix.arch }}
# -DCMAKE_BUILD_TYPE=${{ matrix.build }}
# -DGGML_CUDA=${{ matrix.cublas }}
# -DWHISPER_SDL2=${{ matrix.sdl2 }}
#
# - name: Build ${{ matrix.cuda-toolkit }}
# run: |
# cd ./build
# cmake --build . --config ${{ matrix.build }}
#
# - name: Copy CUDA DLLs
# run: >
# Copy-Item -PassThru
# -Path "${{ steps.cuda-toolkit.outputs.CUDA_PATH }}/bin/*.dll"
# -Include cudart64_*,cublas64_*,cublasLt64_*
# -Destination build/bin/${{ matrix.build }}
#
# - name: Copy SDL2.dll
# if: matrix.sdl2 == 'ON'
# run: copy "$env:SDL2_DIR/../lib/${{ matrix.s2arc }}/SDL2.dll" build/bin/${{ matrix.build }}
#
# - name: Upload binaries
# if: matrix.sdl2 == 'ON'
# uses: actions/upload-artifact@v4
# with:
# name: whisper-cublas-${{ matrix.cuda-toolkit }}-bin-${{ matrix.arch }}
# path: build/bin/${{ matrix.build }}

emscripten:
runs-on: ubuntu-latest
@@ -533,56 +519,80 @@ jobs:
emcmake cmake . -DCMAKE_BUILD_TYPE=${{ matrix.build }}
make

ios:
ios-xcode-build:
runs-on: macos-latest

strategy:
matrix:
build: [Release]

steps:
- name: Clone
- name: Checkout code
uses: actions/checkout@v4

- name: Configure
run: |
cp models/for-tests-ggml-base.en.bin models/ggml-base.en.bin
mkdir models/ggml-base.en-encoder.mlmodelc

- name: Build objc example
run: xcodebuild -project examples/whisper.objc/whisper.objc.xcodeproj -scheme whisper.objc -configuration ${{ matrix.build }} -sdk iphonesimulator build

- name: Build swiftui example
run: xcodebuild -project examples/whisper.swiftui/whisper.swiftui.xcodeproj -scheme WhisperCppDemo -configuration ${{ matrix.build }} -sdk iphonesimulator build

android:
runs-on: ubuntu-latest

steps:
- name: Clone
uses: actions/checkout@v4
with:
path: whisper

- name: Install Java
uses: actions/setup-java@v4
with:
distribution: zulu
java-version: 21

- name: Setup Android SDK
uses: android-actions/setup-android@v3

- name: Build
id: cmake_build
run: |
cd whisper/examples/whisper.android
./gradlew assembleRelease --no-daemon
sysctl -a
mkdir build
cd build
cmake -G Xcode .. \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DWHISPER_BUILD_EXAMPLES=OFF \
-DWHISPER_BUILD_TESTS=OFF \
-DWHISPER_BUILD_SERVER=OFF \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_DEPLOYMENT_TARGET=14.0 \
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu) -- CODE_SIGNING_ALLOWED=NO
sudo cmake --install . --config Release

- name: xcodebuild for swift package
id: xcodebuild
run: |
xcodebuild -scheme whisper-Package -destination 'generic/platform=iOS'

#- name: Build objc example
# run: xcodebuild -project examples/whisper.objc/whisper.objc.xcodeproj -scheme whisper.objc -configuration ${{ matrix.build }} -sdk iphoneos build

- name: Build with external ggml
run: |
export PATH_TO_GGML=$PWD/ggml
cd whisper/examples/whisper.android
./gradlew assembleRelease --no-daemon
- name: Build swiftui example
run: xcodebuild -project examples/whisper.swiftui/whisper.swiftui.xcodeproj -scheme WhisperCppDemo -configuration ${{ matrix.build }} -sdk iphoneos CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY= -destination 'generic/platform=iOS' build

# TODO: update android build and re-enable when it works
# android:
# runs-on: ubuntu-latest
#
# steps:
# - name: Clone
# uses: actions/checkout@v4
# with:
# path: whisper
#
# - name: Install Java
# uses: actions/setup-java@v4
# with:
# distribution: zulu
# java-version: 21
#
# - name: Setup Android SDK
# uses: android-actions/setup-android@v3
#
# - name: Build
# run: |
# cd whisper/examples/whisper.android
# ./gradlew assembleRelease --no-daemon
#
# - name: Build with external ggml
# run: |
# export PATH_TO_GGML=$PWD/ggml
# cd whisper/examples/whisper.android
# ./gradlew assembleRelease --no-daemon

# TODO: disable because of following fail: https://github.com/ggerganov/whisper.cpp/actions/runs/11019444420/job/30627193602
# android_java:
@@ -664,5 +674,6 @@ jobs:
- name: Test quantize
run: |
./models/download-ggml-model.sh tiny.en
make quantize
./quantize models/ggml-tiny.en.bin models/ggml-tiny.en-q4_0.bin q4_0
cmake -B build
cmake --build build --config Release
./build/bin/quantize models/ggml-tiny.en.bin models/ggml-tiny.en-q4_0.bin q4_0
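The recurring pattern across the hunks above is the same: direct `make` / `gmake` invocations are replaced by the two-step CMake flow (`cmake -B build`, then `cmake --build build --config Release`), with binaries landing under `build/bin/` instead of the repository root. A minimal sketch of that flow is below; the job-count fallback is an illustration of what the workflow does per platform (`$(nproc)` on the Linux runners, `$(sysctl -n hw.logicalcpu)` on macOS), not code from the PR itself:

```shell
# Pick a parallel job count the way the workflow does on each runner:
# Linux jobs use $(nproc), macOS jobs use $(sysctl -n hw.logicalcpu);
# fall back to 2 if neither tool is available (assumed default, not from the PR).
jobs=$( (command -v nproc >/dev/null && nproc) \
  || (command -v sysctl >/dev/null && sysctl -n hw.logicalcpu) \
  || echo 2 )

echo "building with ${jobs} parallel jobs"

# The CMake flow that replaces `make` / `make stream` in this diff
# (commented out here since it needs a whisper.cpp checkout):
# cmake -B build
# cmake --build build --config Release -j "${jobs}"
# ./build/bin/quantize models/ggml-tiny.en.bin models/ggml-tiny.en-q4_0.bin q4_0
```

Note the out-of-source layout (`-B build`) is what lets the quantize test above call `./build/bin/quantize` rather than a binary in the source tree.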