Releases: SYSTRAN/faster-whisper
faster-whisper 0.7.1
- Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech (the options involved are shown in the sketch after this list)
- Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability
faster-whisper 0.7.0
Improve word-level timestamp heuristics
Some recent improvements from openai-whisper are ported to faster-whisper:
- Squash long words at window and sentence boundaries (openai/whisper@255887f)
- Improve timestamp heuristics (openai/whisper@f572f21)
Support download of user converted models from the Hugging Face Hub
The WhisperModel constructor now accepts any repository ID as an argument, for example:
model = WhisperModel("username/whisper-large-v2-ct2")
The utility function download_model has been updated similarly.
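A hedged sketch, assuming download_model is importable from the top-level package and returns the local directory of the converted model:
from faster_whisper import WhisperModel, download_model
model_dir = download_model("username/whisper-large-v2-ct2")  # downloads from the Hub if needed
model = WhisperModel(model_dir)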
Other changes
- Accept an iterable of token IDs for the argument initial_prompt (useful to include timestamp tokens in the prompt; see the sketch after this list)
- Avoid computing higher temperatures when no_speech_threshold is met (same as openai/whisper@e334ff1)
- Fix truncated output when using a prefix without disabling timestamps
- Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes
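A minimal sketch of the new initial_prompt form; the token IDs below are placeholders rather than real Whisper vocabulary entries:
prompt_tokens = [50364, 314, 264]  # hypothetical token IDs, e.g. starting with a timestamp token
segments, info = model.transcribe("audio.wav", initial_prompt=prompt_tokens)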
faster-whisper 0.6.0
Extend TranscriptionInfo with additional properties (illustrated below)
- all_language_probs: the probability of each language (only set when language=None)
- vad_options: the VAD options that were used for this transcription
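A hedged illustration, assuming all_language_probs is an iterable of (language, probability) pairs as described above:
segments, info = model.transcribe("audio.wav", language=None, vad_filter=True)
if info.all_language_probs is not None:  # only populated when language detection runs
    for language, probability in info.all_language_probs:
        print(language, probability)
print(info.vad_options)  # the VAD options used for this transcription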
Improve robustness against temporary connection issues to the Hugging Face Hub
When the model is loaded from its name like WhisperModel("large-v2"), a request is made to the Hugging Face Hub to check whether some files should be downloaded.
This request can raise an exception: the Hugging Face Hub is down, the internet connection is temporarily lost, etc. These exceptions are now caught, and the library will try to load the model directly from the local cache if it exists.
Other changes
- Enable the onnxruntime dependency for Python 3.11, as the latest version now provides binary wheels for Python 3.11
- Fix occasional IndexError on empty segments when using word_timestamps=True
- Export __version__ at the module level (see the snippet after this list)
- Include missing requirement files in the released source distribution
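The exported version can now be read directly from the module:
import faster_whisper
print(faster_whisper.__version__)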
faster-whisper 0.5.1
Fix download_root to correctly set the cache directory where the models are downloaded.
faster-whisper 0.5.0
Improved logging
Some information is now logged under the INFO and DEBUG levels. The logging level can be configured like this:
import logging
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
More control over model downloads
New arguments were added to the WhisperModel constructor to better control how the models are downloaded (see the sketch after this list):
- download_root to specify where the model should be downloaded.
- local_files_only to avoid downloading the model and directly return the path to the cached model, if it exists.
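A minimal sketch of the new arguments; the path is only an example:
model = WhisperModel(
    "large-v2",
    download_root="/path/to/models",  # where the model files are stored
    local_files_only=True,  # return the cached model without contacting the Hub
)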
Other changes
- Improve the default VAD behavior to prevent some words from being assigned to the incorrect speech chunk in the original audio
- Fix incorrect application of the option condition_on_previous_text=False (note that the bug still exists in openai/whisper v20230314)
- Fix segment timestamps that are sometimes inconsistent with the word timestamps after VAD
- Extend the Segment structure with additional properties to match openai/whisper
- Rename AudioInfo to TranscriptionInfo and add a new property options to summarize the transcription options that were used
faster-whisper 0.4.1
Fix some IndexError exceptions:
- when VAD is enabled and a predicted timestamp is after the last speech chunk
- when word timestamps are enabled and the model predicts a token sequence that is decoded into invalid Unicode characters
faster-whisper 0.4.0
Integration of Silero VAD
The Silero VAD model is integrated to ignore parts of the audio without speech:
model.transcribe(..., vad_filter=True)
The default behavior is conservative and only removes silence longer than 2 seconds. See the README to learn how to customize the VAD parameters.
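For example, a hedged sketch; the vad_parameters dict and the min_silence_duration_ms option name are taken from the README and should be treated as assumptions here:
segments, info = model.transcribe(
    "audio.wav",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),  # override the 2-second default
)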
Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is excluded for this Python version, so the VAD features cannot be used.
Speaker diarization using stereo channels
The function decode_audio has a new argument split_stereo to split stereo audio into separate left and right channels:
left, right = decode_audio(audio_file, split_stereo=True)
# model.transcribe(left)
# model.transcribe(right)
Other changes
- Add Segment attributes avg_log_prob and no_speech_prob (same definition as openai/whisper)
- Ignore audio frames raising an av.error.InvalidDataError exception during decoding
- Fix option prefix to be passed only to the first 30-second window
- Extend suppress_tokens with some special tokens that should always be suppressed (unless suppress_tokens is None)
- Raise a more helpful error message when the selected model size is invalid
- Disable the progress bar when the model to download is already in the cache
faster-whisper 0.3.0
- Converted models are now available on the Hugging Face Hub and are automatically downloaded when creating a WhisperModel instance. The conversion step is no longer required for the original Whisper models.
# Automatically download https://huggingface.co/guillaumekln/faster-whisper-large-v2
model = WhisperModel("large-v2")
- Run the encoder only once for each 30-second window. Before this change, the same window could be encoded multiple times, for example during the temperature fallback or when word-level timestamps are enabled.
faster-whisper 0.2.0
Initial publication of the library on PyPI: https://pypi.org/project/faster-whisper/