v1.7.3 #2645
Replies: 1 comment
This is on a Mac M1 with 8 GB RAM. large-v3-turbo gives ~5x realtime speed, but it hallucinates more often than large-v2 does. The release does deliver the claimed speed bump for quantized models: roughly 3x versus 2.3x realtime.

whisper 1.7.2
- Total duration of audiobook: 00h:32m:24s
- large-v2-q8_0 model transcribed at 2.34x realtime speed

whisper 1.7.3
- Total duration of audiobook: 00h:32m:24s
- whisper.cpp took 00h:10m:53s
- large-v2-q8_0 model transcribed at 2.98x realtime speed

whisper 1.7.3
- Total duration of audiobook: 00h:32m:24s
- large-v3-turbo model transcribed at 5.09x realtime speed

If there were a large-v2-turbo, that would be ideal. I don't notice any speed improvement from 1.7.2 to 1.7.3 for the turbo model, but none was claimed; both are around 5x realtime speed.
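The "x realtime" figures above are just audio duration divided by transcription wall-clock time. A minimal sketch of that arithmetic (the helper name `realtime_factor` is made up for illustration), using the 1.7.3 large-v2-q8_0 numbers reported above:

```python
def realtime_factor(audio_seconds: float, transcribe_seconds: float) -> float:
    """Speedup over realtime: audio length divided by processing time."""
    return audio_seconds / transcribe_seconds

# Numbers from the benchmark above.
audio = 32 * 60 + 24   # audiobook length: 00h:32m:24s -> 1944 s
took = 10 * 60 + 53    # whisper.cpp run:  00h:10m:53s -> 653 s

print(f"{realtime_factor(audio, took):.2f}x realtime")  # -> 2.98x realtime
```

This reproduces the reported 2.98x figure for the large-v2-q8_0 run.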
Overview
What's Changed
- Add q8_0 models to download-ggml-model.sh by @mrienstra in #2589
- Fix typo in download-ggml-model.sh by @mrienstra in #2623

New Contributors
Full Changelog: v1.7.2...v1.7.3
This discussion was created from the release v1.7.3.