Improve language detection #732

Merged

trungkienbkhn
Collaborator

Same as openai/whisper#676

@trungkienbkhn trungkienbkhn force-pushed the improve-language-detection branch from 01f45df to 305c63f Compare March 12, 2024 06:49
@nguyendc-systran nguyendc-systran merged commit 1eb9a80 into SYSTRAN:master Mar 12, 2024
3 checks passed
Purfview added a commit to Purfview/faster-whisper that referenced this pull request Mar 31, 2024
nguyendc-systran pushed a commit that referenced this pull request Apr 2, 2024
* Bugfix: code breaks if audio is empty

Regression since #732 PR
MahmoudAshraf97 added a commit that referenced this pull request Nov 16, 2024
* Supported new options for batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio
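
For readers unfamiliar with the two options, here is a minimal sketch of how a threshold plus a multi-segment vote could interact. It is an illustration only, not the code merged in this PR: `pick_language` and its `segment_probs` input are hypothetical, and the decision rule (accept the first segment whose top probability exceeds the threshold, otherwise fall back to the most frequently detected language) is an assumption about the general approach.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def pick_language(
    segment_probs: List[Dict[str, float]],
    threshold: float = 0.5,
    max_segments: int = 1,
) -> Optional[Tuple[str, float]]:
    """Hypothetical helper: choose a language from per-segment probabilities.

    segment_probs holds one {language: probability} dict per audio segment.
    Returns (language, probability), or None when there is nothing to inspect
    (for example, empty audio).
    """
    # Probabilities of the top language seen in each low-confidence segment.
    votes: Dict[str, List[float]] = defaultdict(list)

    for probs in segment_probs[:max_segments]:
        if not probs:
            continue  # skip segments with no detection result
        language = max(probs, key=probs.get)
        probability = probs[language]
        if probability > threshold:
            # Confident enough: stop early and accept this segment's answer.
            return language, probability
        votes[language].append(probability)

    if not votes:
        return None

    # No segment cleared the threshold: take the language detected most often
    # and report the best probability observed for it.
    language = max(votes, key=lambda lang: len(votes[lang]))
    return language, max(votes[language])


if __name__ == "__main__":
    segments = [
        {"en": 0.40, "fr": 0.30},
        {"fr": 0.45, "en": 0.30},
        {"fr": 0.48, "en": 0.20},
    ]
    print(pick_language(segments, threshold=0.5, max_segments=3))
    print(pick_language([], threshold=0.5, max_segments=3))
```

Raising `max_segments` trades extra inference work for robustness on audio whose opening segment is silent or ambiguous, while the `None` return gives callers an explicit case to handle for empty input.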
4 participants