llamafile - transcription #2317

rafael844 · 2024-09-02T21:18:13Z

rafael844
Sep 2, 2024

There is a mozzila llamafile implementation to use whisper with CPU that says its very fast. I didnt try it, but you could take a look if its interesting.
https://github.com/Mozilla-Ocho/llamafile/blob/0.8.13/whisper.cpp/doc/index.md

https://www.youtube.com/watch?v=-mRi-B3t6fA

gfd2020 · 2024-09-09T03:42:23Z

gfd2020
Sep 9, 2024
Collaborator

hi @rafael844 , this is very interesting.
I performed some tests.

1 audio file of 58 seconds, medium model, 15 threads (i7-12700H). Both on CPU , precision = int8.

WhisperProcess.py (whisperX) took 38 seconds to transcribe. ( whisper-medium model 1.5 GB)

Whisper-llamafile.exe took 24 seconds to transcribe. (ggml-medium model 1.5 GB)

======================================================
WhisperProcess.py (whisperX) took 15 seconds to transcribe. ( whisper-base model 145 MB)

Whisper-llamafile.exe took 3 seconds to transcribe. (ggml-base model 145 MB)

1 reply

lfcnassif Sep 9, 2024
Maintainer

Thanks @gfd2020 for your tests. Seems this implementation is based on whisper.cpp, which is optimized for CPU, but, unfortunately, had a lower accuracy than others in my past tests #1335.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llamafile - transcription #2317

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

llamafile - transcription #2317

rafael844 Sep 2, 2024

Replies: 1 comment · 1 reply

gfd2020 Sep 9, 2024 Collaborator

lfcnassif Sep 9, 2024 Maintainer

rafael844
Sep 2, 2024

Replies: 1 comment 1 reply

gfd2020
Sep 9, 2024
Collaborator

lfcnassif Sep 9, 2024
Maintainer