Releases: ggerganov/whisper.cpp

v1.7.1

07 Oct 10:09
ebca09a

Overview

  • Fix Vulkan crashes
  • Performance stats for Vulkan on RTX 2060:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKAN small-q5_1 1 0 141.03 4.63 4.18 0.20 9f346d0
RTX 2060 VULKAN medium 1 0 472.66 7.55 11.35 0.56 9f346d0
RTX 2060 VULKAN medium-q5_0 1 0 395.55 9.81 10.64 0.49 9f346d0
RTX 2060 VULKAN medium-q5_1 1 0 398.85 10.16 10.15 0.50 9f346d0
RTX 2060 VULKAN medium-dis 1 0 427.26 1.26 1.20 0.08 9f346d0
RTX 2060 VULKAN large-v2 1 0 924.60 12.36 18.56 1.01 9f346d0
RTX 2060 VULKAN large-v2-q5_0 1 0 774.21 17.25 17.17 0.85 9f346d0
RTX 2060 VULKAN large-v2-q5_1 1 0 779.75 17.44 16.27 0.85 9f346d0
RTX 2060 VULKAN large-v2-dis 1 0 833.35 1.38 1.56 0.10 9f346d0
RTX 2060 VULKAN large-v3-turbo 1 0 839.90 2.11 2.70 0.16 9f346d0
RTX 2060 VULKAN large-v3-turbo-q5_0 1 0 705.49 3.22 2.53 0.14 9f346d0
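
One way to read the table above is to compare each q5_0 model against its unquantized counterpart. The sketch below does that for the Enc. column, with the values copied from the rows above. It assumes Enc. is a timing where lower is better (the notes do not state the unit); the helper name `q5_0_ratio` is made up for this illustration and is not part of whisper.cpp.

```python
# Encoder timings (Enc. column) copied from the Vulkan table above.
# Assumption: lower is better; the release notes do not state the unit.
ENC = {
    "tiny": 30.38, "tiny-q5_0": 20.98,
    "base": 44.69, "base-q5_0": 39.72,
    "small": 160.02, "small-q5_0": 141.52,
    "medium": 472.66, "medium-q5_0": 395.55,
    "large-v2": 924.60, "large-v2-q5_0": 774.21,
}

SUFFIX = "-q5_0"


def q5_0_ratio(enc):
    """Return {base_model: enc(q5_0) / enc(unquantized)} for every pair present."""
    ratios = {}
    for name, quantized_time in enc.items():
        if not name.endswith(SUFFIX):
            continue
        base_name = name[: -len(SUFFIX)]
        if base_name in enc:
            ratios[base_name] = quantized_time / enc[base_name]
    return ratios


ratios = q5_0_ratio(ENC)
for model, ratio in ratios.items():
    print(f"{model}: q5_0 encoder value is {ratio:.0%} of the unquantized one")
```

On these numbers every ratio is below 1, ranging from roughly 0.69 (tiny) to 0.89 (base). This covers only the Enc. column on this one GPU; the Dec. column in the same table moves the other way for several models.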

Full Changelog: v1.7.0...v1.7.1

v1.7.0

05 Oct 14:15
6a94163

Overview

  • Fix crashes with high number of beams
  • Reduce overall VRAM usage
  • Optimize Encoder performance

Some performance numbers for this release:

M2 Ultra

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 8.37 1.44 0.48 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 1 9.81 1.46 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 1 8.80 1.47 0.50 0.01 6a94163
M2 Ultra METAL base 1 1 16.11 1.96 0.74 0.02 6a94163
M2 Ultra METAL base-q5_0 1 1 16.38 1.99 0.78 0.02 6a94163
M2 Ultra METAL base-q5_1 1 1 16.72 2.00 0.77 0.02 6a94163
M2 Ultra METAL small 1 1 41.26 3.88 1.66 0.05 6a94163
M2 Ultra METAL small-q5_0 1 1 46.91 4.02 1.76 0.06 6a94163
M2 Ultra METAL small-q5_1 1 1 47.05 4.00 1.73 0.06 6a94163
M2 Ultra METAL medium 1 1 111.29 7.79 3.63 0.11 6a94163
M2 Ultra METAL medium-q5_0 1 1 129.78 7.71 3.85 0.13 6a94163
M2 Ultra METAL medium-q5_1 1 1 129.29 7.71 3.87 0.13 6a94163
M2 Ultra METAL medium-dis 1 1 99.27 1.09 0.43 0.02 6a94163
M2 Ultra METAL large-v2 1 1 198.81 11.54 5.59 0.20 6a94163
M2 Ultra METAL large-v2-q5_0 1 1 236.18 11.12 6.11 0.24 6a94163
M2 Ultra METAL large-v2-q5_1 1 1 235.88 11.14 6.01 0.24 6a94163
M2 Ultra METAL large-v2-dis 1 1 177.41 1.21 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 1 178.92 1.89 0.83 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 1 211.44 1.73 0.90 0.04 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 0 10.04 1.37 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 0 10.02 1.36 0.53 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 0 11.08 1.37 0.53 0.01 6a94163
M2 Ultra METAL base 1 0 17.84 1.93 0.77 0.02 6a94163
M2 Ultra METAL base-q5_0 1 0 18.57 1.92 0.81 0.02 6a94163
M2 Ultra METAL base-q5_1 1 0 18.66 1.93 0.82 0.02 6a94163
M2 Ultra METAL small 1 0 48.26 3.95 1.73 0.05 6a94163
M2 Ultra METAL small-q5_0 1 0 53.68 3.99 1.85 0.06 6a94163
M2 Ultra METAL small-q5_1 1 0 53.86 4.00 1.82 0.06 6a94163
M2 Ultra METAL medium 1 0 130.09 8.01 3.82 0.13 6a94163
M2 Ultra METAL medium-q5_0 1 0 148.18 7.92 4.11 0.14 6a94163
M2 Ultra METAL medium-q5_1 1 0 147.95 7.94 4.11 0.14 6a94163
M2 Ultra METAL medium-dis 1 0 116.97 1.11 0.42 0.02 6a94163
M2 Ultra METAL large-v2 1 0 232.43 12.34 5.87 0.22 6a94163
M2 Ultra METAL large-v2-q5_0 1 0 269.72 11.68 6.44 0.26 6a94163
M2 Ultra METAL large-v2-q5_1 1 0 269.71 11.82 6.36 0.26 6a94163
M2 Ultra METAL large-v2-dis 1 0 209.25 1.25 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 0 211.09 1.98 0.84 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 0 244.23 1.81 0.92 0.04 6a94163

Ryzen 9 5950X + RTX 2060

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 1 7.35 0.78 0.24 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 1 6.45 0.67 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 1 6.39 0.66 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA base 1 1 10.20 0.88 0.30 0.01 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 1 11.38 0.92 0.21 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 1 11.76 0.91 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 1 33.06 2.00 0.56 0.03 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 1 35.84 1.84 0.43 0.04 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 1 36.89 1.82 0.42 0.04 6a94163
RTX 2060 AVX2 CUDA medium 1 1 90.65 4.54 1.13 0.08 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 1 104.01 3.80 0.91 0.10 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 1 107.98 3.72 0.87 0.10 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 1 79.08 0.68 0.17 0.01 6a94163
RTX 2060 AVX2 CUDA large-v2 1 1 162.00 7.52 1.92 0.14 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 1 184.59 5.64 1.50 0.16 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 1 193.85 5.55 1.44 0.17 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 1 140.75 0.84 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 1 143.38 1.29 0.36 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 1 163.30 0.93 0.28 0.03 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 0 12.49 0.87 0.23 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 0 10.65 0.78 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 0 10.82 0.77 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA base 1 0 18.97 1.04 0.34 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 0 20.22 1.09 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 0 20.48 1.07 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 0 59.52 2.37 0.70 0.05 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 0 62.98 2.23 0.60 0.06 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 0 63.64 2.21 0.59 0.06 6a94163
RTX 2060 AVX2 CUDA medium 1 0 161.53 5.36 1.53 0.13 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 0 174.96 4.64 1.32 0.15 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 0 178.42 4.57 1.29 0.15 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 0 149.65 0.75 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA large-v2 1 0 280.55 8.74 2.51 0.23 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 0 306.87 6.92 2.08 0.25 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 0 314.25 6.82 2.02 0.26 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 0 259.39 0.91 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 0 261.83 1.44 0.41 0.04 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 0 282.99 1.09 0.33 0.04 6a94163
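
To put a number on the Flash Attention effect, the ON and OFF rows can be paired by model and divided. The sketch below parses a few rows copied verbatim from the two RTX 2060 CUDA tables above and reports the encoder ratio. It assumes Enc. is a timing where lower is better; `parse_row` and `encoder_speedup` are hypothetical helpers written for this example, not whisper.cpp tooling.

```python
# Rows copied from the RTX 2060 CUDA tables above (FA=1 first, then FA=0).
ROWS = """\
RTX 2060 AVX2 CUDA tiny 1 1 7.35 0.78 0.24 0.01 6a94163
RTX 2060 AVX2 CUDA large-v2 1 1 162.00 7.52 1.92 0.14 6a94163
RTX 2060 AVX2 CUDA tiny 1 0 12.49 0.87 0.23 0.01 6a94163
RTX 2060 AVX2 CUDA large-v2 1 0 280.55 8.74 2.51 0.23 6a94163
"""


def parse_row(line):
    """Split one benchmark row.

    The last eight fields are fixed (Model Th FA Enc. Dec. Bch5 PP Commit);
    whatever precedes them is the device name plus the config string.
    """
    *prefix, model, th, fa, enc, dec, bch5, pp, commit = line.split()
    return {
        "device_config": " ".join(prefix),
        "model": model,
        "th": int(th),
        "fa": int(fa),
        "enc": float(enc),
        "dec": float(dec),
        "bch5": float(bch5),
        "pp": float(pp),
        "commit": commit,
    }


def encoder_speedup(rows):
    """Return {model: enc(FA off) / enc(FA on)} for models measured both ways."""
    by_model = {}
    for row in rows:
        by_model.setdefault(row["model"], {})[row["fa"]] = row["enc"]
    return {m: v[0] / v[1] for m, v in by_model.items() if 0 in v and 1 in v}


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
speedups = encoder_speedup(rows)
for model, s in speedups.items():
    print(f"{model}: {s:.2f}x")
```

For these four rows the ratio is about 1.70 for tiny and 1.73 for large-v2. Splitting from the right keeps the parser working for the other tables in these notes, where the device and config prefix varies in length (for example "M2 Ultra METAL" or "RTX 2060 VULKAN").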

Vulkan:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKA...

v1.6.2

27 May 07:36
c7b6988

Overview

Fixes a bug when using multiple whisper_state instances in parallel (#2182)

Full Changelog: v1.6.1...v1.6.2

v1.6.1

21 May 15:46
c10db6e

Minor release adding initial ffmpeg support in the examples (#2133, thanks @WilliamTambellini)

Full Changelog: v1.6.0...v1.6.1

v1.6.0

15 May 07:13
08981d1

Overview

  • Can optionally enable Flash Attention for faster processing on CUDA and Metal devices (#2152)
  • Faster ppc64 performance (40aeeee) (not tested)
  • Fix main slowdown bug (#2070)

Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels

Some performance numbers for this release:

M1 Pro

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 0 39.21 1.74 0.61 0.04 22c96b4
M1 Pro METAL base 1 0 70.76 2.60 0.93 0.06 22c96b4
M1 Pro METAL small 1 0 217.28 6.42 2.14 0.17 22c96b4
M1 Pro METAL medium 1 0 596.74 14.43 4.75 0.45 22c96b4

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 1 30.77 1.59 0.54 0.03 22c96b4
M1 Pro METAL base 1 1 60.42 2.29 0.81 0.05 22c96b4
M1 Pro METAL small 1 1 183.82 5.12 1.81 0.14 22c96b4
M1 Pro METAL medium 1 1 517.92 11.60 4.01 0.38 22c96b4

M2 Ultra

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 0 12.32 1.35 0.49 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 0 11.65 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 0 12.08 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL base 1 0 17.58 1.90 0.76 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 0 18.89 1.86 0.79 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 0 20.69 1.88 0.79 0.02 22c96b4
M2 ULTRA METAL small 1 0 49.32 3.85 1.71 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 0 54.91 3.81 1.82 0.06 22c96b4
M2 ULTRA METAL small-q5_1 1 0 54.92 3.81 1.79 0.06 22c96b4
M2 ULTRA METAL medium 1 0 134.34 8.04 3.82 0.13 22c96b4
M2 ULTRA METAL medium-q5_0 1 0 151.68 7.59 4.07 0.14 22c96b4
M2 ULTRA METAL medium-q5_1 1 0 151.58 7.67 4.07 0.14 22c96b4
M2 ULTRA METAL medium-dis 1 0 120.82 1.07 0.41 0.02 22c96b4
M2 ULTRA METAL large-v2 1 0 235.63 12.27 5.85 0.22 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 0 273.38 11.17 6.40 0.26 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 0 272.44 11.32 6.29 0.26 22c96b4
M2 ULTRA METAL large-v2-dis 1 0 212.51 1.20 0.47 0.02 22c96b4

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 1 9.07 1.33 0.45 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 1 9.74 1.33 0.47 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 1 8.93 1.31 0.46 0.01 22c96b4
M2 ULTRA METAL base 1 1 15.75 1.87 0.71 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 1 17.04 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 1 17.17 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL small 1 1 42.33 3.64 1.60 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 1 47.61 3.63 1.70 0.05 22c96b4
M2 ULTRA METAL small-q5_1 1 1 47.70 3.66 1.68 0.05 22c96b4
M2 ULTRA METAL medium 1 1 114.42 7.53 3.55 0.11 22c96b4
M2 ULTRA METAL medium-q5_0 1 1 132.63 7.02 3.77 0.13 22c96b4
M2 ULTRA METAL medium-q5_1 1 1 132.28 7.10 3.76 0.13 22c96b4
M2 ULTRA METAL medium-dis 1 1 102.34 1.01 0.42 0.01 22c96b4
M2 ULTRA METAL large-v2 1 1 203.01 11.03 5.45 0.20 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 1 240.05 10.18 5.98 0.23 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 1 239.22 10.23 5.87 0.23 22c96b4
M2 ULTRA METAL large-v2-dis 1 1 181.14 1.14 0.48 0.02 22c96b4

Ryzen 9 5950X + RTX 2060

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
Ryzen 9 5950X AVX2 tiny 8 0 195.29 1.57 0.51 0.26 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_0 8 0 213.33 1.10 0.50 0.30 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_1 8 0 219.38 1.18 0.53 0.32 22c96b4
Ryzen 9 5950X AVX2 base 8 0 424.85 3.71 1.03 0.46 22c96b4
Ryzen 9 5950X AVX2 base-q5_0 8 0 473.61 1.81 0.82 0.52 22c96b4
Ryzen 9 5950X AVX2 base-q5_1 8 0 484.14 1.92 0.85 0.56 22c96b4
Ryzen 9 5950X AVX2 small 8 0 1458.32 12.66 3.09 1.26 22c96b4
Ryzen 9 5950X AVX2 small-q5_0 8 0 1673.22 6.42 2.18 1.45 22c96b4
Ryzen 9 5950X AVX2 small-q5_1 8 0 1724.78 6.72 2.32 1.52 22c96b4
Ryzen 9 5950X AVX2 medium 8 0 4333.87 36.80 8.56 3.37 22c96b4
Ryzen 9 5950X AVX2 medium-q5_0 8 0 5194.09 19.21 5.71 3.97 22c96b4
Ryzen 9 5950X AVX2 medium-q5_1 8 0 5450.39 20.01 5.99 4.17 22c96b4
Ryzen 9 5950X AVX2 medium-dis 8 0 3995.19 5.08 1.21 0.55 22c96b4
Ryzen 9 5950X AVX2 large-v2 8 0 8056.16 69.74 16.11 6.13 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_0 8 0 9799.58 35.16 10.49 7.28 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_1 8 0 ms 36.74 11.02 7.65 22c96b4
Ryzen 9 5950X AVX2 large-v2-dis 8 0 7490.03 7.40 1.70 0.72 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 0 12.54 0.93 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 0 12.73 0.98 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 0 12.72 0.99 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA base 8 0 24.14 1.28 0.41 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_0 8 0 24.58 1.38 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_1 8 0 24.58 1.37 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA small 8 0 74.70 2.91 0.84 0.07 22c96b4
RTX 2060 AVX2 CUDA small-q5_0 8 0 76.12 2.84 0.77 0.08 22c96b4
RTX 2060 AVX2 CUDA small-q5_1 8 0 76.14 2.84 0.76 0.08 22c96b4
RTX 2060 AVX2 CUDA medium 8 0 200.69 6.46 1.83 0.17 22c96b4
RTX 2060 AVX2 CUDA medium-q5_0 8 0 204.80 5.90 1.65 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-q5_1 8 0 205.61 5.85 1.61 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-dis 8 0 186.17 0.86 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA large-v2 8 0 347.22 10.36 2.82 0.29 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_0 8 0 357.06 8.81 2.58 0.34 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_1 8 0 356.97 8.62 2.49 0.33 22c96b4
RTX 2060 AVX2 CUDA large-v2-dis 8 0 318.05 1.03 0.34 0.04 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 1 7.21 0.76 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 1 7.42 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 1 7.38 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA ...

v1.5.5

16 Apr 11:14
7395c70

Overview

Many small incremental updates, plus token-level timestamps with DTW by @denersc in #1485.
Feedback is welcome!

Full Changelog: v1.5.4...v1.5.5

v1.5.4

05 Jan 15:20
0b9af32

Overview

  • Faster Core ML ANE models (#1716)
  • Fix CUDA bug causing random errors in the transcription
  • Fix SwiftUI example build

Full Changelog: v1.5.3...v1.5.4

v1.5.3

03 Jan 17:39
9962371

Overview

Minor maintenance release:

  • Fix CUDA issues where the transcription produces garbage
  • Fix quantized models to work with the CUDA backend
  • Allow using whisper.cpp and llama.cpp together in SwiftUI projects

Full Changelog: v1.5.2...v1.5.3

v1.5.2

14 Dec 16:06
88112c8

Overview

Minor maintenance release:

  • Re-enable CPU BLAS processing after fixing a regression (#1583)

Add new example: wchess

Shoutout to @fraxy-v (implementation) and @ejones (grammar) for making it work!

Full Changelog: v1.5.1...v1.5.2

v1.5.1

24 Nov 10:45
9d6ebd8

Overview

Minor update:

  • With Metal, auto-fallback to CPU if device does not support Apple7 family
  • Add server example

Full Changelog: v1.5.0...v1.5.1