Release v0.2.0 · huggingface/speechbox

Second Release

The second release of Speechbox adds a pipeline that combines automatic speech recognition (ASR) with speaker diarization. It lets you transcribe long audio files and annotate the transcription with who spoke when.
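Conceptually, combining the two models comes down to aligning two timelines: the ASR model produces text chunks with start/end timestamps, and the diarization model produces speaker turns with start/end times. The sketch below is a minimal, hypothetical illustration, not speechbox's actual implementation. It assigns each transcribed chunk to the speaker whose turn overlaps it the most, then merges consecutive chunks from the same speaker. The `Turn` and `Chunk` types, the `assign_speakers` function and the output dict layout are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """Hypothetical diarization segment: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Chunk:
    """Hypothetical ASR output chunk: transcribed text with start/end timestamps (seconds)."""
    text: str
    start: float
    end: float


def overlap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Length of the intersection of two (start, end) intervals, 0.0 if they are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def assign_speakers(chunks: List[Chunk], turns: List[Turn]) -> List[dict]:
    """Label each chunk with its most-overlapping speaker and merge same-speaker runs.

    Assumes `turns` is non-empty and `chunks` are in chronological order.
    """
    segments: List[dict] = []
    for chunk in chunks:
        # Pick the speaker turn that overlaps this chunk for the longest time.
        best = max(turns, key=lambda t: overlap((chunk.start, chunk.end), (t.start, t.end)))
        if segments and segments[-1]["speaker"] == best.speaker:
            # Same speaker as the previous segment: extend it instead of starting a new one.
            prev = segments[-1]
            prev["text"] += chunk.text
            prev["timestamp"] = (prev["timestamp"][0], chunk.end)
        else:
            segments.append(
                {"speaker": best.speaker, "text": chunk.text, "timestamp": (chunk.start, chunk.end)}
            )
    return segments


# Toy example: two speakers, three transcribed chunks.
turns = [Turn("SPEAKER_00", 0.0, 5.0), Turn("SPEAKER_01", 5.0, 9.0)]
chunks = [
    Chunk(" Hello there.", 0.0, 2.0),
    Chunk(" How are you?", 2.0, 4.8),
    Chunk(" Fine, thanks.", 5.1, 8.0),
]
for segment in assign_speakers(chunks, turns):
    print(segment)
```

Here the first two chunks fall inside `SPEAKER_00`'s turn and are merged into one segment spanning 0.0 to 4.8 seconds, while the third chunk is attributed to `SPEAKER_01`. Maximum overlap is just one simple assignment rule; a real system may handle chunks that straddle speaker changes differently.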

To use this feature, install speechbox together with transformers and pyannote.audio:

```bash
pip install --upgrade speechbox transformers pyannote.audio
```

For a first example, we also recommend installing datasets:

```bash
pip install datasets
```

Then you can run the following code snippet:

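```python
import torch
from datasets import load_dataset
from speechbox import ASRDiarizationPipeline

# Run on the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Build the combined pipeline, using Whisper tiny as the ASR model
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# Load a dataset of concatenated LibriSpeech samples in streaming mode,
# so the full dataset does not have to be downloaded first
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# Get the first sample
sample = next(iter(concatenated_librispeech))

# Transcribe the sample and annotate the transcription with speaker labels
out = pipeline(sample["audio"])
print(out)
```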
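To turn the pipeline output into a readable transcript, you can iterate over the returned segments. The helper below is a hypothetical sketch: it assumes `out` is a list of dicts with `speaker`, `text` and `timestamp` keys, where `timestamp` is a `(start, end)` tuple in seconds. Check the structure printed by `print(out)` and adjust the keys if it differs. The `format_transcript` name and the example data are illustrative only.

```python
from typing import Iterable, Mapping


def format_transcript(segments: Iterable[Mapping]) -> str:
    """Render diarized segments as '[start - end] SPEAKER: text' lines.

    Assumes each segment has 'speaker', 'text' and a (start, end) 'timestamp'
    in seconds; a missing (None) end time is shown as '?'.
    """
    lines = []
    for seg in segments:
        start, end = seg["timestamp"]
        end_str = f"{end:.2f}s" if end is not None else "?"
        lines.append(f"[{start:.2f}s - {end_str}] {seg['speaker']}: {seg['text'].strip()}")
    return "\n".join(lines)


# Hypothetical data standing in for the pipeline output `out`.
example = [
    {"speaker": "SPEAKER_00", "text": " Hello there.", "timestamp": (0.0, 2.5)},
    {"speaker": "SPEAKER_01", "text": " Hi!", "timestamp": (2.5, 3.1)},
]
print(format_transcript(example))
# [0.00s - 2.50s] SPEAKER_00: Hello there.
# [2.50s - 3.10s] SPEAKER_01: Hi!
```

With the real pipeline, you would call `print(format_transcript(out))` instead of passing the example list.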
