whisper-tiny.en gets a WER of 18 without dynamic audio context on https://huggingface.co/datasets/distil-whisper/earnings22 (chunked, test split) using evaluation.ipynb, while acft-whisper-tiny.en with dynamic audio context gets a WER of 318. This suggests that the acft fine-tuned model with dynamic audio context may not work well in real-world conditions, which include diverse accents and varying speech conditions.
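For context on why a WER of 318 is even possible: WER divides the total word edits (substitutions + deletions + insertions) by the number of *reference* words, so it exceeds 100 whenever the model emits many more wrong words than the reference contains, which is the classic signature of hallucinated or looping output. A minimal self-contained sketch of the standard WER computation (evaluation.ipynb presumably uses a library such as jiwer; this pure-Python version is just for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: 100 * (S + D + I) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# Repetition inflates WER past 100: three extra words against a
# three-word reference already gives 100% WER.
print(wer("the cat sat", "the cat cat sat sat sat"))  # 100.0
```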
For production deployment you should use an additional context of at least 32. I ran some tests here with different values showing that even on LibriSpeech the WER improves with a small amount of added context. It's possible that earnings22 in particular exposes a weakness when there is not enough extra silence.
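To make the "additional context of at least 32" suggestion concrete: if dynamic audio context picks an encoder context just large enough to cover the clip, adding a safety margin of extra frames leaves some trailing silence for the model. A hedged sketch, assuming the usual Whisper encoder geometry (1500 frames per 30 s window, i.e. 50 frames/s); the function name and `extra` parameter are mine, not from the repo:

```python
def dynamic_audio_ctx(duration_s: float, extra: int = 32) -> int:
    """Pick an encoder audio context for a clip of `duration_s` seconds,
    padded with `extra` frames of headroom, capped at the 30 s maximum."""
    FRAMES_PER_SECOND = 50   # Whisper encoder: 1500 frames cover 30 s
    MAX_CTX = 1500
    needed = int(duration_s * FRAMES_PER_SECOND) + extra
    return min(needed, MAX_CTX)

# A 10 s clip gets 500 frames of content plus 32 frames of headroom.
print(dynamic_audio_ctx(10.0))   # 532
# Clips at or beyond 30 s saturate at the full context.
print(dynamic_audio_ctx(40.0))   # 1500
```

Setting `extra=0` reproduces the tight-context case that appears to fail on earnings22, while `extra >= 32` is the recommended production setting.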