
Whisperx Optimization #2258

Open · wants to merge 8 commits into master
Conversation

hauck-jvsh (Member):

Closes #1539

hauck-jvsh (Member, Author):

This must be used with our fork of WhisperX, as I had to change the library to accept more than one file and to skip ffmpeg; we already convert the audio to WAV before sending it to be transcribed.
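For context, stock whisperx.load_audio() spawns an ffmpeg subprocess to decode and resample every input, which is wasted work when the files have already been converted to 16 kHz WAVs upstream. A minimal sketch of an ffmpeg-free load path, assuming pre-converted input; the load_wav name and its validation are illustrative, not the fork's actual code:

```python
import numpy as np
import soundfile as sf

SAMPLE_RATE = 16_000  # Whisper-family models expect 16 kHz mono float32 input


def load_wav(path: str) -> np.ndarray:
    """Read a pre-converted 16 kHz mono WAV directly, with no ffmpeg call.

    Assumes the caller has already decoded, resampled, and downmixed
    the audio to WAV before transcription.
    """
    audio, sr = sf.read(path, dtype="float32")
    if sr != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz, got {sr} Hz: {path}")
    if audio.ndim > 1:  # defensive: fold any leftover stereo down to mono
        audio = audio.mean(axis=1)
    return audio
```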

lfcnassif (Member) commented Jul 10, 2024

Thank you very very much @hauck-jvsh! I'll run a basic accuracy test soon to make sure accuracy wasn't affected by the changes. Have you pushed the changes to WhisperX to some branch in our fork?

To let others know: with the batch inference approach suggested in #1539 (processing up to 16 audios at the same time) and by avoiding a duplicated ffmpeg run inside the library, @hauck-jvsh was able to speed up WhisperX inference on a big batch of audios with different durations by up to 5x-6x on an RTX 3090!
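As a rough illustration of the batching idea (the fork's real implementation lives in the multi-audio branch; the function name and the duration-sorting step below are assumptions, not confirmed by the thread):

```python
from typing import Iterator, List

import soundfile as sf

BATCH_SIZE = 16  # the PR transcribes up to 16 audios at the same time


def batches(paths: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield groups of up to `size` audio paths for one batched inference pass."""
    # Assumption, not from the thread: sorting by duration groups
    # similar-length audios together, which reduces padding overhead
    # when they are stacked into a single tensor.
    ordered = sorted(paths, key=lambda p: sf.info(p).duration)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]
```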

lfcnassif changed the title from Whisperx to Whisperx Optimization on Jul 10, 2024
hauck-jvsh (Member, Author) commented Jul 10, 2024

Yes, I have pushed the changes to a branch called multi-audio
https://github.com/sepinf-inc/whisperX/tree/multi-audio

hauck-jvsh (Member, Author):

I forgot to run the tests with all the changes disabled to compare. I only ran tests against the wav2vec model, and I managed to get WhisperX down to only 1.6 times slower than wav2vec.

lfcnassif (Member) commented Jul 10, 2024

> I forgot to run the tests with all the changes disabled to compare. I only ran tests against the wav2vec model, and I managed to get WhisperX down to only 1.6 times slower than wav2vec.

Default WhisperX is about 9x slower than the Wav2Vec2 large model on an RTX 3090 with my test data sets, so 9 / 1.6 = 5.6x speed up :-)

I've just computed the WER numbers for 2 relevant data sets:

| WER              | TedX (3.8h) | Real data set (1h) |
|------------------|-------------|--------------------|
| WhisperX-LargeV3 | 0.134       | 0.193              |
| ThisPR-LargeV3   | 0.134       | 0.187              |

So accuracy seems fine!
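For anyone wanting to reproduce this kind of comparison, word error rate is commonly computed with the jiwer package; a sketch under that assumption (the file names are placeholders, and the thread does not say which tool was actually used):

```python
import jiwer

# Placeholder paths: one ground-truth transcript line and one model output
# line per audio, aligned by position.
with open("reference.txt", encoding="utf-8") as f:
    references = f.read().splitlines()
with open("hypothesis.txt", encoding="utf-8") as f:
    hypotheses = f.read().splitlines()

# jiwer applies its default text normalization; lower is better.
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
```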

Successfully merging this pull request may close: Try to speed up Whisper transcription inference (#1539)

2 participants