Releases: SYSTRAN/faster-whisper
faster-whisper 0.7.1
- Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech (the options involved are shown in the sketch after this list)
- Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability
faster-whisper 0.7.0
Improve word-level timestamp heuristics
Some recent improvements from openai-whisper are ported to faster-whisper:
- Squash long words at window and sentence boundaries (openai/whisper@255887f)
- Improve timestamp heuristics (openai/whisper@f572f21)
Support download of user converted models from the Hugging Face Hub
The WhisperModel constructor now accepts any repository ID as an argument, for example:
model = WhisperModel("username/whisper-large-v2-ct2")
The utility function download_model has been updated similarly.
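A hedged sketch, assuming download_model is importable from the top-level package and returns the local directory of the converted model:
from faster_whisper import WhisperModel, download_model
model_dir = download_model("username/whisper-large-v2-ct2")  # downloads from the Hub if needed
model = WhisperModel(model_dir)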
Other changes
- Accept an iterable of token IDs for the argument initial_prompt (useful to include timestamp tokens in the prompt; see the sketch after this list)
- Avoid computing higher temperatures when no_speech_threshold is met (same as openai/whisper@e334ff1)
- Fix truncated output when using a prefix without disabling timestamps
- Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes
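A minimal sketch of the new initial_prompt form; the token IDs below are placeholders rather than real Whisper vocabulary entries:
prompt_tokens = [50364, 314, 264]  # hypothetical token IDs, e.g. starting with a timestamp token
segments, info = model.transcribe("audio.wav", initial_prompt=prompt_tokens)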
faster-whisper 0.6.0
Extend TranscriptionInfo with additional properties (illustrated below)
- all_language_probs: the probability of each language (only set when language=None)
- vad_options: the VAD options that were used for this transcription
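A hedged illustration, assuming all_language_probs is an iterable of (language, probability) pairs as described above:
segments, info = model.transcribe("audio.wav", language=None, vad_filter=True)
if info.all_language_probs is not None:  # only populated when language detection runs
    for language, probability in info.all_language_probs:
        print(language, probability)
print(info.vad_options)  # the VAD options used for this transcription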
Improve robustness against temporary connection issues to the Hugging Face Hub
When the model is loaded from its name like WhisperModel("large-v2"), a request is made to the Hugging Face Hub to check whether some files should be downloaded.
This request can raise an exception: the Hugging Face Hub is down, the internet connection is temporarily lost, etc. These exceptions are now caught, and the library will try to load the model directly from the local cache if it exists.
Other changes
- Enable the onnxruntime dependency for Python 3.11, as the latest version now provides binary wheels for Python 3.11
- Fix occasional IndexError on empty segments when using word_timestamps=True
- Export __version__ at the module level (see the snippet after this list)
- Include missing requirement files in the released source distribution
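The exported version can now be read directly from the module:
import faster_whisper
print(faster_whisper.__version__)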
faster-whisper 0.5.1
Fix download_root to correctly set the cache directory where the models are downloaded.
faster-whisper 0.5.0
Improved logging
Some information is now logged under the INFO and DEBUG levels. The logging level can be configured like this:
import logging
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
More control over model downloads
New arguments were added to the WhisperModel constructor to better control how the models are downloaded (see the sketch after this list):
- download_root to specify where the model should be downloaded.
- local_files_only to avoid downloading the model and directly return the path to the cached model, if it exists.
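A minimal sketch of the new arguments; the path is only an example:
model = WhisperModel(
    "large-v2",
    download_root="/path/to/models",  # where the model files are stored
    local_files_only=True,  # return the cached model without contacting the Hub
)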
Other changes
- Improve the default VAD behavior to prevent some words from being assigned to the incorrect speech chunk in the original audio
- Fix incorrect application of the option condition_on_previous_text=False (note that the bug still exists in openai/whisper v20230314)
- Fix segment timestamps that are sometimes inconsistent with the word timestamps after VAD
- Extend the Segment structure with additional properties to match openai/whisper
- Rename AudioInfo to TranscriptionInfo and add a new property options to summarize the transcription options that were used
faster-whisper 0.4.1
Fix some IndexError exceptions:
- when VAD is enabled and a predicted timestamp is after the last speech chunk
- when word timestamps are enabled and the model predicts a token sequence that is decoded into invalid Unicode characters
faster-whisper 0.4.0
Integration of Silero VAD
The Silero VAD model is integrated to ignore parts of the audio without speech:
model.transcribe(..., vad_filter=True)
The default behavior is conservative and only removes silence longer than 2 seconds. See the README to learn how to customize the VAD parameters.
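For example, a hedged sketch; the vad_parameters dict and the min_silence_duration_ms option name are taken from the README and should be treated as assumptions here:
segments, info = model.transcribe(
    "audio.wav",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),  # override the 2-second default
)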
Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is excluded for this Python version, so the VAD features cannot be used.
Speaker diarization using stereo channels
The function decode_audio has a new argument split_stereo to split stereo audio into separate left and right channels:
left, right = decode_audio(audio_file, split_stereo=True)
# model.transcribe(left)
# model.transcribe(right)
Other changes
- Add Segment attributes avg_log_prob and no_speech_prob (same definition as openai/whisper)
- Ignore audio frames raising an av.error.InvalidDataError exception during decoding
- Fix option prefix to be passed only to the first 30-second window
- Extend suppress_tokens with some special tokens that should always be suppressed (unless suppress_tokens is None)
- Raise a more helpful error message when the selected model size is invalid
- Disable the progress bar when the model to download is already in the cache
faster-whisper 0.3.0
- Converted models are now available on the Hugging Face Hub and are automatically downloaded when creating a WhisperModel instance. The conversion step is no longer required for the original Whisper models.
# Automatically download https://huggingface.co/guillaumekln/faster-whisper-large-v2
model = WhisperModel("large-v2")
- Run the encoder only once for each 30-second window. Before this change, the same window could be encoded multiple times, for example during the temperature fallback or when word-level timestamps are enabled.
faster-whisper 0.2.0
Initial publication of the library on PyPI: https://pypi.org/project/faster-whisper/