Quality suffers on earnings22 dataset #5

Closed
soupslurpr opened this issue Jun 29, 2024 · 3 comments
soupslurpr commented Jun 29, 2024

whisper-tiny.en gets 18 WER without dynamic audio context on https://huggingface.co/datasets/distil-whisper/earnings22 (chunked, test) using evaluation.ipynb, while acft-whisper-tiny.en with dynamic audio context gets 318 WER. This suggests the acft fine-tuned model with dynamic audio context may not hold up in real-world conditions involving diverse accents and varied recording quality.
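For reference, the WER figures above are the standard word error rate. A minimal, self-contained sketch of that metric follows; the repo's evaluation.ipynb presumably uses a library such as jiwer plus text normalization, so its exact numbers may differ, but the definition is the same: (substitutions + insertions + deletions) / reference word count, reported as a percentage. Note it can exceed 100 (as in the 318 WER above) when the hypothesis inserts many spurious words.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)
```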

soupslurpr (Author) commented

Not sure why, but changing ADD_AUDIO_CTX to 64 makes acft-whisper-tiny.en achieve 19 WER on earnings22.
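To make the parameter concrete: a plausible reading of ADD_AUDIO_CTX is a safety margin added on top of the dynamically computed encoder context size. The sketch below assumes Whisper's usual geometry (a 30 s window mapping to 1500 encoder positions, i.e. 20 ms per position); the names and formula are assumptions for illustration and may not match this repo's actual code.

```python
import math

MAX_AUDIO_CTX = 1500  # full 30 s Whisper encoder context (assumed geometry)

def dynamic_audio_ctx(audio_seconds: float, add_audio_ctx: int = 64) -> int:
    """Encoder positions needed for the clip, plus a silence margin.

    add_audio_ctx pads the truncated context so the decoder still sees
    some trailing silence, which is what the thread suggests matters.
    """
    needed = math.ceil(audio_seconds * MAX_AUDIO_CTX / 30.0)
    return min(MAX_AUDIO_CTX, needed + add_audio_ctx)
```

For example, a 10 s clip would use 500 positions plus the margin, while a full 30 s clip stays capped at 1500.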

stopthinking102's comment was marked as off-topic and hidden.

abb128 (Collaborator) commented Jan 6, 2025

For production deployment you should use an additional context of at least 32. I did some tests here with different values showing that even on LibriSpeech, WER improves with a small amount of added context. It's possible that earnings22 in particular exposes a weakness when there isn't enough extra silence.

abb128 closed this as completed Jan 6, 2025