v0.2.0

@sanchit-gandhi released this 27 Jan 17:37
c757285

Second Release

The second release of Speechbox adds a pipeline that combines automatic speech recognition (ASR) with speaker diarization. This allows you to transcribe long audio files and annotate the transcriptions with who spoke when.

To use this feature, you need to install speechbox as well as transformers and pyannote.audio:

pip install --upgrade speechbox transformers pyannote.audio

For an initial example, we recommend also installing datasets:

pip install datasets

Then you can run the following code snippet:

import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
print(out)
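
The raw output can be hard to read when printed directly. Below is a minimal sketch of a formatter, assuming that `out` is a list of dictionaries with `speaker`, `text`, and `timestamp` keys, where `timestamp` is a `(start, end)` tuple in seconds with both values set. The helper name `format_as_transcription` and the sample segments are illustrative and not part of this release.

```python
def format_as_transcription(segments, ndigits=1):
    """Render speaker-labelled segments as one block per speaker turn.

    Assumes each segment is a dict with "speaker", "text", and
    "timestamp" keys, where "timestamp" is a (start, end) tuple in seconds.
    """
    lines = []
    for seg in segments:
        start, end = seg["timestamp"]
        span = f"({round(start, ndigits)}, {round(end, ndigits)})"
        lines.append(f'{seg["speaker"]} {span} {seg["text"].strip()}')
    return "\n\n".join(lines)


# Hypothetical segments standing in for the pipeline output `out`
example_segments = [
    {"speaker": "SPEAKER_00", "text": " Hello there.", "timestamp": (0.0, 1.52)},
    {"speaker": "SPEAKER_01", "text": " Hi, how are you?", "timestamp": (1.52, 3.0)},
]

transcript = format_as_transcription(example_segments)
print(transcript)
```

If the assumption about the output structure holds, calling `format_as_transcription(out)` on the result of the snippet above should give one paragraph per speaker turn.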