C/C++ implementation of the model #130
-
Hi @ggerganov
-
A C# wrapper would be nice :)
-
Wow: " someone was able to even run the large model on Samsung A52"!. |
-
It took 13 minutes to transcribe a 10-minute video on medium.en with six threads. It seems to take longer (at least to get started) if you use too many threads; your CPU shouldn't be maxed out. The original whisper in CPU mode currently doesn't even start transcribing for me, so I don't know how long it would take on that video. The same video takes 3 minutes on my RTX 2060, running Linux. After trying the whisper CPU mode again for another 17 minutes, it had only printed the first line; no idea what's up with that. So whisper.cpp definitely is faster 😁
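The thread count can be set explicitly through the C API. A minimal sketch, assuming the whisper.h interface (function names such as whisper_init_from_file may differ between versions, and the audio below is a silent placeholder rather than real input):

```c
// Minimal sketch: transcribe audio with an explicit thread count.
// Assumes the whisper.h C API from whisper.cpp; exact function names
// (e.g. whisper_init_from_file) may differ between versions.
#include <stdio.h>
#include "whisper.h"

int main(void) {
    struct whisper_context *ctx = whisper_init_from_file("models/ggml-medium.en.bin");
    if (!ctx) {
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.n_threads = 6; // more threads is not always faster; avoid oversubscribing the CPU

    // placeholder: one second of 16 kHz mono silence stands in for real decoded audio
    static float pcmf32[16000];

    if (whisper_full(ctx, wparams, pcmf32, 16000) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}
```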
-
This is cool. I bumped into this after trying your all-in-one script, which is fantastic! I have been wondering why there isn't a JavaScript implementation of Whisper. Since Whisper runs locally, I guess it should be possible to run it entirely client-side, but as you have been tinkering with this tech, what's your opinion? Will we be able to run Whisper client-side in browsers? @ggerganov
-
I noticed that token_timestamps is marked "EXPERIMENTAL". Are there tests to verify that it works correctly? A sketch of the kind of check I have in mind follows below.
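A hedged sketch, assuming the whisper.h API: enable token_timestamps in the parameters before calling whisper_full(), then walk the per-token data. Field and function names follow whisper.h and may change while the feature is experimental.

```c
// Hedged sketch: print per-token timestamps after a whisper_full() run with
// wparams.token_timestamps = true. Names follow whisper.h and may change.
#include <stdio.h>
#include "whisper.h"

void print_token_timestamps(struct whisper_context *ctx) {
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
            const whisper_token_data td = whisper_full_get_token_data(ctx, i, j);
            // t0/t1 are timestamps in centiseconds; one basic sanity check is
            // that they are non-decreasing within a segment
            printf("[%6lld -> %6lld] %s\n",
                   (long long) td.t0, (long long) td.t1,
                   whisper_full_get_token_text(ctx, i, j));
        }
    }
}
```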
-
It does not work on CUDA, and I'm not installing another 3 GB of CUDA drivers. Why can't we use the existing drivers? Which CUDA components are actually needed, and which folder should be added to PATH?
-
Great work on this! The whisper model is very interesting and I think it is going to open a lot of interesting possibilities in the future (not just for speech-to-text).
This weekend, as a learning exercise, I decided to implement the inference from scratch in C/C++:
https://github.com/ggerganov/whisper.cpp
I'm happy with the results. I'm still struggling with the beam-search sampling strategy used in the original model. Still, the greedy implementation that I hacked together seems to do the job most of the time. If anyone gives this a try, I would be happy to hear some feedback.
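For context, greedy sampling simply takes the highest-scoring token at every decode step, while beam search keeps several candidate sequences alive. An illustrative sketch of the greedy step (not the repository's actual code):

```c
// Illustrative sketch of greedy sampling: at each decode step, pick the
// token whose logit is largest. Beam search would instead keep the top-k
// partial sequences and expand each of them.
int sample_greedy(const float *logits, int n_vocab) {
    int best = 0;
    for (int i = 1; i < n_vocab; ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```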
Update 05 Oct:
Someone was able to even run the large model on a Samsung A52! (attached video: rt_esl_csgo_1.mp4)
Update 30 Sep:
Reduced memory usage even more.
Here is a real-world example of running the implementation to transcribe a 1h 30min video of John Carmack, on a MacBook M1 Pro (CPU only).
Update 29 Sep:
Added transcription timestamps.
Update 28 Sep:
Here is the current memory usage for the different models, using the latest implementation (Flash Attention + Flash Forward + 16-bit key/value memory).
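For a rough sense of where the key/value memory goes, here is a back-of-the-envelope sketch. It assumes the published Whisper dimensions for the medium model (24 decoder layers, width 1024, 448 text positions, 1500 encoded audio positions); the numbers are illustrative and not taken from the repository:

```c
// Back-of-the-envelope estimate of 16-bit key/value memory for the decoder,
// assuming the published Whisper "medium" dimensions. Illustrative only.
#include <stdio.h>

int main(void) {
    const long n_layer     = 24;   // decoder layers
    const long n_state     = 1024; // model width
    const long n_text_ctx  = 448;  // max text tokens (self-attention cache)
    const long n_audio_ctx = 1500; // encoded audio positions (cross-attention cache)
    const long fp16        = 2;    // bytes per element at 16 bits

    // one K and one V tensor per layer, for self- and cross-attention
    const long kv_self  = 2 * n_layer * n_text_ctx  * n_state * fp16;
    const long kv_cross = 2 * n_layer * n_audio_ctx * n_state * fp16;

    printf("self  KV: %6.1f MB\n", kv_self  / 1024.0 / 1024.0);
    printf("cross KV: %6.1f MB\n", kv_cross / 1024.0 / 1024.0);
    printf("total KV: %6.1f MB\n", (kv_self + kv_cross) / 1024.0 / 1024.0);
    return 0;
}
```

With these assumptions the total comes out to roughly 180 MB for medium, which is why storing K/V at 16 bits instead of 32 roughly halves this part of the footprint.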