An option to substitute OpenAI's Whisper models for Kaldi? #313

Open
natelawrence opened this issue Oct 1, 2022 · 3 comments

@natelawrence

I'm not a developer but I do find Gentle very useful.

Since OpenAI released their Whisper models last week, I've been wondering if anyone with development skills would be interested in enabling an option to utilize Whisper instead of Kaldi when running Gentle.

I know that language support for spoken languages beyond English has been a long-standing request for Gentle.
Whisper is explicitly multilingual, so perhaps it would make support for languages beyond English more achievable for Gentle?

Anyway, please let me know what scale of an undertaking this would be.
Thanks in advance.

@WillReynolds5

Not sure this is possible at the phoneme level, because the Whisper model is trained end-to-end to predict BPE tokens directly, and each token is often a full word or a subword made up of a few graphemes.
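
To illustrate the granularity point, here is a minimal sketch of how subword tokens could be merged into word-level spans. It assumes each decoded token already carries a start and end time and that a leading space marks the start of a new word. The `Token` tuple, the function name, and the sample timings are all hypothetical, not real Whisper output or API. The finest unit recoverable this way is the word, not the phoneme.

```python
from typing import List, Tuple

# Hypothetical (text, start_sec, end_sec) triples; the timings are invented
# for illustration and do not come from a real model run.
Token = Tuple[str, float, float]


def group_tokens_into_words(tokens: List[Token]) -> List[Token]:
    """Merge subword tokens into words.

    Assumption: a token whose text starts with a space begins a new word,
    and any other token continues the previous word.
    """
    words: List[Token] = []
    for text, start, end in tokens:
        if text.startswith(" ") or not words:
            # New word: keep its own start and end.
            words.append((text.strip(), start, end))
        else:
            # Continuation: append the text and stretch the end time.
            prev_text, prev_start, _ = words[-1]
            words[-1] = (prev_text + text, prev_start, end)
    return words


tokens = [
    (" forced", 0.0, 0.4),
    (" al", 0.4, 0.5),
    ("ign", 0.5, 0.7),
    ("ment", 0.7, 1.0),
]
words = group_tokens_into_words(tokens)
for word in words:
    print(word)
```

Nothing in the token stream says where one phoneme ends and the next begins, which is why a phoneme-level alignment would need a separate acoustic model.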

@m-bain
m-bain commented Dec 27, 2022

https://github.com/m-bain/whisperX

@zxul767
zxul767 commented Jul 28, 2023

Another option for word-level timestamps is faster-whisper.

I've been using it lately and it produces relatively good word-level timestamps. It does tend to make some recurring errors, though, such as cutting off the last syllable of the last word in each segment.

And, of course, it inherits several of the issues of vanilla Whisper (e.g., "hallucinations" and very bad alignment in sections with laughter or songs with vocals).
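
One possible workaround for the clipped last word is to pad its end time toward the end of the segment. The sketch below is only a heuristic under stated assumptions, not something faster-whisper provides: `Word`, `Segment`, `extend_last_word`, and the `max_pad` value are hypothetical stand-ins that loosely mirror the start/end/words shape you get when transcribing with word timestamps enabled.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Word:
    # Hypothetical stand-in for a word-level timestamp entry.
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for a transcription segment.
    start: float
    end: float
    words: List[Word]


def extend_last_word(segment: Segment, max_pad: float = 0.25) -> Segment:
    """Return a copy of the segment whose last word ends up to max_pad
    seconds later, never past the segment end and never earlier than
    the original end time."""
    if not segment.words:
        return segment
    last = segment.words[-1]
    new_end = max(last.end, min(segment.end, last.end + max_pad))
    patched = replace(last, end=new_end)
    return replace(segment, words=segment.words[:-1] + [patched])


seg = Segment(0.0, 2.0, [Word("hello", 0.1, 0.6), Word("everyone", 0.7, 1.5)])
fixed = extend_last_word(seg)
print(fixed.words[-1])
```

The padding amount is a guess that would need tuning by ear on real audio, and it does nothing for hallucinated text or for segments with laughter or music.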
