
Releases: SYSTRAN/faster-whisper

faster-whisper 0.7.1

24 Jul 09:20
  • Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered non-speech
  • Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability

faster-whisper 0.7.0

18 Jul 13:30

Improve word-level timestamp heuristics

Some recent improvements from openai-whisper have been ported to faster-whisper.

Support download of user-converted models from the Hugging Face Hub

The WhisperModel constructor now accepts any repository ID as an argument, for example:

from faster_whisper import WhisperModel

model = WhisperModel("username/whisper-large-v2-ct2")

The utility function download_model has been updated similarly.

Other changes

  • Accept an iterable of token IDs for the argument initial_prompt (useful to include timestamp tokens in the prompt)
  • Avoid computing higher temperatures when no_speech_threshold is met (same as openai/whisper@e334ff1)
  • Fix truncated output when using a prefix without disabling timestamps
  • Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes
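Since initial_prompt now accepts an iterable of token IDs, a prompt can be assembled directly from IDs instead of a string. A minimal sketch (the IDs below are illustrative placeholders, not guaranteed to match the real Whisper vocabulary):

```python
# A prompt may now be an iterable of token IDs rather than a string.
# These IDs are illustrative placeholders.
timestamp_token = 50364          # hypothetical timestamp token ID
text_tokens = [1169, 3504, 11]   # hypothetical text token IDs

initial_prompt = [timestamp_token] + text_tokens
# model.transcribe("audio.wav", initial_prompt=initial_prompt)
```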

faster-whisper 0.6.0

24 May 14:22

Extend TranscriptionInfo with additional properties

  • all_language_probs: the probability of each language (only set when language=None)
  • vad_options: the VAD options that were used for this transcription

Improve robustness on temporary connection issues to the Hugging Face Hub

When the model is loaded by name, as in WhisperModel("large-v2"), a request is made to the Hugging Face Hub to check whether some files should be downloaded.

This request can raise an exception: the Hugging Face Hub may be down, the internet connection may be temporarily unavailable, etc. These exceptions are now caught, and the library falls back to loading the model directly from the local cache if it exists.

Other changes

  • Enable the onnxruntime dependency for Python 3.11 as the latest version now provides binary wheels for Python 3.11
  • Fix occasional IndexError on empty segments when using word_timestamps=True
  • Export __version__ at the module level
  • Include missing requirement files in the released source distribution

faster-whisper 0.5.1

26 Apr 15:41

Fix download_root to correctly set the cache directory where the models are downloaded.

faster-whisper 0.5.0

25 Apr 15:04

Improved logging

Some information is now logged under the INFO and DEBUG levels. The logging level can be configured like this:

import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

More control over model downloads

New arguments were added to the WhisperModel constructor to better control how the models are downloaded:

  • download_root to specify where the model should be downloaded.
  • local_files_only to avoid downloading the model and directly return the path to the cached model, if it exists.
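A minimal sketch combining the two options (the directory path is illustrative; this assumes faster-whisper is installed and that the first call can reach the Hugging Face Hub):

```python
from faster_whisper import WhisperModel

# First run: download the model files into a custom directory.
model = WhisperModel("large-v2", download_root="/data/whisper-models")

# Later runs: load only from the local cache, never touching the network.
model = WhisperModel(
    "large-v2", download_root="/data/whisper-models", local_files_only=True
)
```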

Other changes

  • Improve the default VAD behavior to prevent some words from being assigned to the incorrect speech chunk in the original audio
  • Fix incorrect application of option condition_on_previous_text=False (note that the bug still exists in openai/whisper v20230314)
  • Fix segment timestamps that are sometimes inconsistent with the word timestamps after VAD
  • Extend the Segment structure with additional properties to match openai/whisper
  • Rename AudioInfo to TranscriptionInfo and add a new property options to summarize the transcription options that were used

faster-whisper 0.4.1

04 Apr 10:58

Fix some IndexError exceptions:

  • when VAD is enabled and a predicted timestamp is after the last speech chunk
  • when word timestamps are enabled and the model predicts a token sequence that decodes to invalid Unicode characters

faster-whisper 0.4.0

03 Apr 15:29

Integration of Silero VAD

The Silero VAD model is integrated to ignore parts of the audio without speech:

model.transcribe(..., vad_filter=True)

The default behavior is conservative and only removes silence longer than 2 seconds. See the README to learn how to customize the VAD parameters.
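As a sketch of the customization mentioned above (this assumes the vad_parameters argument documented in the README, and "audio.wav" is a placeholder file):

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2")

# Lower the silence threshold from the default 2 s to 500 ms.
segments, info = model.transcribe(
    "audio.wav",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
```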

Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is excluded for that Python version, so the VAD features cannot be used there.

Speaker diarization using stereo channels

The function decode_audio has a new argument split_stereo to split stereo audio into separate left and right channels:

from faster_whisper import decode_audio

left, right = decode_audio(audio_file, split_stereo=True)

# model.transcribe(left)
# model.transcribe(right)

Other changes

  • Add Segment attributes avg_log_prob and no_speech_prob (same definition as openai/whisper)
  • Ignore audio frames raising an av.error.InvalidDataError exception during decoding
  • Fix option prefix to be passed only to the first 30-second window
  • Extend suppress_tokens with some special tokens that should always be suppressed (unless suppress_tokens is None)
  • Raise a more helpful error message when the selected model size is invalid
  • Disable the progress bar when the model to download is already in the cache

faster-whisper 0.3.0

24 Mar 10:00
  • Converted models are now available on the Hugging Face Hub and are automatically downloaded when creating a WhisperModel instance. The conversion step is no longer required for the original Whisper models.
from faster_whisper import WhisperModel

# Automatically download https://huggingface.co/guillaumekln/faster-whisper-large-v2
model = WhisperModel("large-v2")
  • Run the encoder only once for each 30-second window. Before this change the same window could be encoded multiple times, for example during the temperature fallback or when word-level timestamps are enabled.

faster-whisper 0.2.0

22 Mar 20:12

Initial publication of the library on PyPI: https://pypi.org/project/faster-whisper/