
Try to speed up Whisper transcription inference #1539

Open
lfcnassif opened this issue Feb 23, 2023 · 2 comments · May be fixed by #2258

lfcnassif (Member) commented Feb 23, 2023
General recommendations:
https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html

This one was found by @hauck-jvsh:
https://developer.nvidia.com/blog/accelerating-inference-up-to-6x-faster-in-pytorch-with-torch-tensorrt/

lfcnassif (Member, Author) commented:

Just had a simple idea that could bring some speedup. Currently we start 2 transcription processes per GPU. I thought about using 3, but GPU memory usage is already high: I see 20 GB used out of 24 GB. But maybe we could use Python threads to run 3 simultaneous transcriptions in the same Python process, reusing the same model loaded in memory instead of loading it once per process. GPU usage is already high, but maybe there is still room for some speedup.
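A minimal sketch of that shared-model idea, assuming the openai-whisper API (`whisper.load_model` / `model.transcribe`); the queue/worker names and `NUM_WORKERS` are illustrative, not existing IPED code:

```python
import queue
import threading

import whisper

NUM_WORKERS = 3  # 3 concurrent transcriptions sharing one loaded model

# Loaded once per process instead of once per worker process.
model = whisper.load_model("large-v3", device="cuda")

audio_queue = queue.Queue()
results = {}
results_lock = threading.Lock()

def worker():
    while True:
        path = audio_queue.get()
        if path is None:  # sentinel: stop this worker
            audio_queue.task_done()
            break
        # PyTorch releases the GIL while CUDA kernels run, so threads can
        # overlap GPU work without duplicating the model weights in memory.
        result = model.transcribe(path)
        with results_lock:
            results[path] = result["text"]
        audio_queue.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Usage: enqueue audio paths, then one None sentinel per worker and join.
for p in ["a.wav", "b.wav", "c.wav"]:
    audio_queue.put(p)
for _ in threads:
    audio_queue.put(None)
audio_queue.join()
```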

Running multiple transcriptions in inference batches is a common technique. But it would make the logic much more complex: we would have to group audios of similar duration, wait for them to accumulate, possibly group audios from the same client or from different ones, and decide how long to wait for more audios before closing a batch (see the sketch below).
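For illustration, a rough sketch of the duration-bucketing part of that logic; `MAX_BATCH`, `MAX_WAIT_SECONDS` and `BUCKET_SECONDS` are placeholder parameters, with no claim about where this would live in IPED:

```python
import time
from collections import defaultdict

MAX_BATCH = 8           # audios per inference batch
MAX_WAIT_SECONDS = 2.0  # how long to wait for more similar-length audios
BUCKET_SECONDS = 5      # audios within the same 5 s duration range share a bucket

# bucket key -> list of (audio path, enqueue time)
buckets = defaultdict(list)

def enqueue(path: str, duration: float) -> None:
    """Place an audio into the bucket matching its duration range."""
    key = int(duration // BUCKET_SECONDS)
    buckets[key].append((path, time.monotonic()))

def pop_ready_batch():
    """Return a batch that is either full or has waited long enough, else None."""
    now = time.monotonic()
    for key, items in buckets.items():
        full = len(items) >= MAX_BATCH
        expired = items and now - items[0][1] > MAX_WAIT_SECONDS
        if full or expired:
            batch = [path for path, _ in items[:MAX_BATCH]]
            del items[:MAX_BATCH]
            return batch
    return None
```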

@lfcnassif lfcnassif changed the title Try to optimize Wav2Vec2 transcription inference Try to speed up Whisper transcription inference Mar 12, 2024
lfcnassif (Member, Author) commented Apr 26, 2024

WhisperX uses batched inference (transcription of many audio parts at the same time) to speed up transcription by up to 10x on GPUs using just this technique. I think it is possible to change the WhisperX library to make it transcribe different audios at the same time using audio batches.
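For reference, this is the batched usage WhisperX documents in its README; today the batch is built from ~30 s segments of a single audio file, which is what the change proposed here would extend to segments coming from different audios:

```python
import whisperx

device = "cuda"
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("audio.mp3")
# batch_size controls how many ~30 s segments of this one audio are run
# through the model in a single forward pass.
result = model.transcribe(audio, batch_size=16)
print(result["segments"])
```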

@hauck-jvsh, since you know the transcription code and have already contributed fixes and improvements to it, would you like to help improve the WhisperX library (I just forked it into the sepinf-inc repo) and the IPED code to group audios of similar sizes before transcribing them? I think that would let us speed up our transcription service without the new hardware, whose purchase should take longer after the latest government budget restrictions...
