v0.2.0
Second Release
The second release of Speechbox adds a pipeline for ASR + Speaker Diarization. This allows you to transcribe long audio files and annotate the transcriptions with who spoke when.
To use this feature, you need to install speechbox as well as transformers and pyannote.audio:
pip install --upgrade speechbox transformers pyannote.audio
For an initial example, we recommend also installing datasets:
pip install datasets
Then you can run the following code snippet:
import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)
# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))
out = pipeline(sample["audio"])
print(out)
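The raw output can be hard to read when printed directly. The following is a minimal sketch of a formatter, assuming the pipeline returns a list of dicts with the keys "speaker", "text" and "timestamp" (a start/end tuple in seconds). The helper name format_as_transcription and the example segments are illustrative and are not produced by the snippet above; check the keys against your own output before relying on it.

```python
def format_as_transcription(segments, ndigits=1):
    """Render diarized segments as 'SPEAKER (start, end) text', one paragraph each.

    Assumes each segment is a dict with "speaker", "text" and "timestamp" keys,
    where "timestamp" is a (start, end) tuple in seconds.
    """
    lines = []
    for seg in segments:
        start, end = seg["timestamp"]
        span = f"({round(start, ndigits)}, {round(end, ndigits)})"
        lines.append(f"{seg['speaker']} {span} {seg['text'].strip()}")
    return "\n\n".join(lines)


# Made-up segments for illustration only; not real pipeline output.
example_segments = [
    {"speaker": "SPEAKER_00", "text": " Hello there.", "timestamp": (0.0, 1.52)},
    {"speaker": "SPEAKER_01", "text": " Hi, how are you?", "timestamp": (1.52, 3.04)},
]

print(format_as_transcription(example_segments))
```

If the assumption about the output structure holds, calling format_as_transcription(out) on the result of the pipeline gives one line per speaker turn, separated by blank lines.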