Improve language detection #732

trungkienbkhn · 2024-03-04T03:39:54Z

faster_whisper/transcribe.py

Regression since SYSTRAN#732 PR

* Bugfix: code breaks if audio is empty. Regression since #732 PR.

* Supported new options for batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions, as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio
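The notes above name the two options but do not say how they interact. The following is a minimal sketch of one plausible reading, not the library's actual code: each of the first `language_detection_segments` segments is checked, the first one whose top language exceeds `language_detection_threshold` wins, and otherwise a majority vote over the per-segment winners is used. The function `pick_language`, its `segment_probs` input, and the `fallback` value are hypothetical names introduced for illustration only; the empty-input branch reflects the empty-audio case mentioned in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def pick_language(
    segment_probs: Sequence[Dict[str, float]],
    language_detection_threshold: Optional[float] = 0.5,
    language_detection_segments: int = 1,
    fallback: Tuple[str, float] = ("en", 0.0),  # assumption: caller-chosen default
) -> Tuple[str, float]:
    """Choose a language from per-segment probability maps (illustrative only)."""
    votes: Dict[str, List[float]] = defaultdict(list)

    # Only the first `language_detection_segments` segments are considered.
    for probs in segment_probs[:language_detection_segments]:
        if not probs:
            continue  # nothing detected for this segment

        language, probability = max(probs.items(), key=lambda item: item[1])

        # A confident segment ends the search immediately.
        if (
            language_detection_threshold is None
            or probability > language_detection_threshold
        ):
            return language, probability

        votes[language].append(probability)

    # Empty audio or no detections: return the fallback instead of raising.
    if not votes:
        return fallback

    # No segment was confident enough, so take the most frequent winner.
    language = max(votes, key=lambda lang: len(votes[lang]))
    return language, max(votes[language])
```

With three low-confidence segments whose winners are `en`, `fr`, `fr`, this sketch returns `fr` with the highest of its two probabilities. Guarding the empty case explicitly avoids calling `max` on an empty mapping, which raises `ValueError` in Python.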

trungkienbkhn force-pushed the improve-language-detection branch from 301eeaf to 01f45df on March 4, 2024 03:42

trungkienbkhn mentioned this pull request Mar 4, 2024

Improve Language detection #265

Open

emanueleielo approved these changes Mar 8, 2024


ddorian reviewed Mar 11, 2024


faster_whisper/transcribe.py (outdated, resolved)

Improve language detection

305c63f

trungkienbkhn force-pushed the improve-language-detection branch from 01f45df to 305c63f on March 12, 2024 06:49

nguyendc-systran merged commit 1eb9a80 into SYSTRAN:master Mar 12, 2024
3 checks passed

Purfview added a commit to Purfview/faster-whisper that referenced this pull request Mar 31, 2024

Bugfix: code breaks if audio is empty

224fa4f

Regression since SYSTRAN#732 PR

Purfview mentioned this pull request Mar 31, 2024

Bugfix: code breaks if audio is empty #768

Merged

nguyendc-systran pushed a commit that referenced this pull request Apr 2, 2024

Bugfix: code breaks if audio is empty (#768)

8ae82c8

* Bugfix: code breaks if audio is empty. Regression since #732 PR.

dodysw mentioned this pull request Jul 5, 2024

ValueError exception when no language is detected #900

Closed

MahmoudAshraf97 mentioned this pull request Nov 15, 2024

Deduplication of language detection functions #1146

Merged
