C/C++ implementation of the model #130
-
Hi @ggerganov
-
A C# wrapper would be nice :)
-
Wow: " someone was able to even run the large model on Samsung A52"!. |
-
It took 13 minutes to transcribe a 10-minute video on medium.en with six threads. It seems to take longer (at least to get started) if you use too many threads; your CPU shouldn't be maxed out. The original whisper in CPU mode currently doesn't even start transcribing for me, so I don't know how long it would take on that video. The same video takes 3 minutes on my RTX 2060, running Linux. After trying the whisper CPU mode again for another 17 minutes, it had only printed the first line; no idea what's up with that. So whisper.cpp definitely is faster 😁
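The thread count can be set explicitly through the C API. A minimal sketch, assuming the whisper.h interface (function names such as whisper_init_from_file may differ between versions, and the audio below is a silent placeholder rather than real input):

```c
// Minimal sketch: transcribe audio with an explicit thread count.
// Assumes the whisper.h C API from whisper.cpp; exact function names
// (e.g. whisper_init_from_file) may differ between versions.
#include <stdio.h>
#include "whisper.h"

int main(void) {
    struct whisper_context *ctx = whisper_init_from_file("models/ggml-medium.en.bin");
    if (!ctx) {
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.n_threads = 6; // more threads is not always faster; avoid oversubscribing the CPU

    // placeholder: one second of 16 kHz mono silence stands in for real decoded audio
    static float pcmf32[16000];

    if (whisper_full(ctx, wparams, pcmf32, 16000) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}
```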
-
This is cool. I bumped into this after trying your all-in-one script, which is fantastic! I have been wondering why there isn't a JavaScript implementation of Whisper. Since Whisper runs locally, I guess it should be possible to run it entirely client-side, but as you have been tinkering with this tech, what's your opinion? Will we be able to run Whisper client-side in browsers? @ggerganov
-
I noticed that token_timestamps is marked "EXPERIMENTAL". Are there tests to verify that it works correctly? A sketch of the kind of check I have in mind follows below.
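A hedged sketch, assuming the whisper.h API: enable token_timestamps in the parameters before calling whisper_full(), then walk the per-token data. Field and function names follow whisper.h and may change while the feature is experimental.

```c
// Hedged sketch: print per-token timestamps after a whisper_full() run with
// wparams.token_timestamps = true. Names follow whisper.h and may change.
#include <stdio.h>
#include "whisper.h"

void print_token_timestamps(struct whisper_context *ctx) {
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
            const whisper_token_data td = whisper_full_get_token_data(ctx, i, j);
            // t0/t1 are timestamps in centiseconds; one basic sanity check is
            // that they are non-decreasing within a segment
            printf("[%6lld -> %6lld] %s\n",
                   (long long) td.t0, (long long) td.t1,
                   whisper_full_get_token_text(ctx, i, j));
        }
    }
}
```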
-
It does not work on CUDA, and I'm not installing another 3 GB of CUDA drivers. Why can't we use the existing drivers? Which CUDA components are actually needed, and which folder should be added to PATH?
-
Great work on this! The whisper model is very interesting and I think it is going to open a lot of interesting possibilities in the future (not just for speech-to-text).
This weekend, as a learning exercise, I decided to implement the inference from scratch in C/C++:
https://github.com/ggerganov/whisper.cpp
I'm happy with the results. I'm still struggling with the beam-search sampling strategy used in the original model. Still, the greedy implementation that I hacked together seems to do the job most of the time. If anyone gives this a try, I would be happy to hear some feedback.
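For context, greedy sampling simply takes the highest-scoring token at every decode step, while beam search keeps several candidate sequences alive. An illustrative sketch of the greedy step (not the repository's actual code):

```c
// Illustrative sketch of greedy sampling: at each decode step, pick the
// token whose logit is largest. Beam search would instead keep the top-k
// partial sequences and expand each of them.
int sample_greedy(const float *logits, int n_vocab) {
    int best = 0;
    for (int i = 1; i < n_vocab; ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```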
Update 05 Oct:
Someone was able to even run the large model on a Samsung A52! (attached video: rt_esl_csgo_1.mp4)
Update 30 Sep:
Reduced memory usage even more.
Here is a real-world example of running the implementation to transcribe a 1h 30min video of John Carmack, on a MacBook M1 Pro (CPU only).
Update 29 Sep:
Added transcription timestamps.
Update 28 Sep:
Here is the current memory usage for the different models, using the latest implementation (Flash Attention + Flash Forward + 16-bit key/value memory).
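For a rough sense of where the key/value memory goes, here is a back-of-the-envelope sketch. It assumes the published Whisper dimensions for the medium model (24 decoder layers, width 1024, 448 text positions, 1500 encoded audio positions); the numbers are illustrative and not taken from the repository:

```c
// Back-of-the-envelope estimate of 16-bit key/value memory for the decoder,
// assuming the published Whisper "medium" dimensions. Illustrative only.
#include <stdio.h>

int main(void) {
    const long n_layer     = 24;   // decoder layers
    const long n_state     = 1024; // model width
    const long n_text_ctx  = 448;  // max text tokens (self-attention cache)
    const long n_audio_ctx = 1500; // encoded audio positions (cross-attention cache)
    const long fp16        = 2;    // bytes per element at 16 bits

    // one K and one V tensor per layer, for self- and cross-attention
    const long kv_self  = 2 * n_layer * n_text_ctx  * n_state * fp16;
    const long kv_cross = 2 * n_layer * n_audio_ctx * n_state * fp16;

    printf("self  KV: %6.1f MB\n", kv_self  / 1024.0 / 1024.0);
    printf("cross KV: %6.1f MB\n", kv_cross / 1024.0 / 1024.0);
    printf("total KV: %6.1f MB\n", (kv_self + kv_cross) / 1024.0 / 1024.0);
    return 0;
}
```

With these assumptions the total comes out to roughly 180 MB for medium, which is why storing K/V at 16 bits instead of 32 roughly halves this part of the footprint.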