Releases: huggingface/speechbox
Patch Release v0.2.1
Fixes the import checks for the `ASRDiarizationPipeline` class (see #16), displaying a helpful error message if either `pyannote.audio` or `torchaudio` is not installed.
v0.2.0
Second Release
The second release of Speechbox adds a pipeline for ASR + Speaker Diarization. This allows you to transcribe long audio files and annotate the transcriptions with who spoke when.
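Under the hood, the idea is to align the timestamped ASR output with the speaker segments produced by the diarization model. The function below is a minimal sketch of that alignment idea (not the library's actual implementation): each transcribed chunk is assigned to the speaker whose diarization segment overlaps it the most.

```python
def assign_speakers(asr_chunks, speaker_segments):
    """Toy alignment of ASR chunks with diarization segments.

    asr_chunks: list of {"text": str, "start": float, "end": float}
    speaker_segments: list of {"speaker": str, "start": float, "end": float}
    Returns the ASR chunks with a "speaker" key added (None if no overlap).
    """
    annotated = []
    for chunk in asr_chunks:
        best_speaker, best_overlap = None, 0.0
        for seg in speaker_segments:
            # overlap between [chunk start, chunk end] and [segment start, segment end];
            # negative values mean the intervals are disjoint and are ignored
            overlap = min(chunk["end"], seg["end"]) - max(chunk["start"], seg["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = seg["speaker"], overlap
        annotated.append({**chunk, "speaker": best_speaker})
    return annotated
```

Assigning by maximum overlap (rather than, say, the chunk's start time) makes the toy version robust to small timestamp disagreements between the two models.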
To use this feature, you need to install `speechbox` as well as `transformers` and `pyannote.audio`:

```bash
pip install --upgrade speechbox transformers pyannote.audio
```
For an initial example, we also recommend installing `datasets`:

```bash
pip install datasets
```
Then you can run the following code snippet:

```python
import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load a dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get the first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
print(out)
```
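To turn the pipeline output into a readable transcript, you can post-process it with a small helper. The sketch below assumes each entry in the output is a dict with `"speaker"`, `"text"`, and `"timestamp"` keys (check what `print(out)` shows for the exact structure on your version):

```python
def format_transcript(chunks):
    """Render annotated chunks as one '[start - end] SPEAKER: text' line each.

    Assumes each chunk looks like
    {"speaker": str, "text": str, "timestamp": (start_seconds, end_seconds)}.
    """
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.1f}s - {end:.1f}s] {chunk['speaker']}: {chunk['text'].strip()}")
    return "\n".join(lines)
```

This kind of formatting is handy for generating meeting-style transcripts or subtitles from the raw pipeline output.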
Patch Release v0.1.2
Fixes a bug with beam search. See: 4d15bc9
Beam search (`num_beams > 1`) has now been checked against greedy search and works as expected.
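As a refresher on what `num_beams` controls: greedy search keeps only the single best token at each step, while beam search keeps the `num_beams` highest-scoring partial sequences, which can recover a better overall sequence. A toy illustration over made-up conditional log-probabilities (not tied to the Whisper decoder):

```python
import math

# hypothetical conditional log-probs over a 2-token vocabulary {0, 1}
FIRST = {0: math.log(0.6), 1: math.log(0.4)}   # log P(first token)
SECOND = {                                      # SECOND[prev][tok] = log P(tok | prev)
    0: {0: math.log(0.5), 1: math.log(0.5)},
    1: {0: math.log(0.1), 1: math.log(0.9)},
}

def greedy_search():
    # keep only the single best token at each step
    t1 = max(FIRST, key=FIRST.get)
    t2 = max(SECOND[t1], key=SECOND[t1].get)
    return [t1, t2], FIRST[t1] + SECOND[t1][t2]

def beam_search(num_beams=2):
    # keep the num_beams best partial sequences after the first step,
    # then expand each and pick the best complete sequence
    beams = sorted(((FIRST[t], [t]) for t in FIRST), reverse=True)[:num_beams]
    candidates = []
    for score, seq in beams:
        for tok, lp in SECOND[seq[-1]].items():
            candidates.append((score + lp, seq + [tok]))
    best_score, best_seq = max(candidates)
    return best_seq, best_score
```

Here greedy search commits to token `0` first (probability 0.6) and ends with total probability 0.30, while beam search keeps the `1` prefix alive and finds the sequence `[1, 1]` with total probability 0.36.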
Patch Release v0.1.1
Makes sure a nice error message is given if `accelerate` is not installed. See: 8671ba2
Initial Release
Hello world, `speechbox`!
This is the first release of `speechbox`, providing the Punctuation Restoration task using Whisper.
You need to install `speechbox` as well as `transformers` and `accelerate` in order to use the `PunctuationRestorer` class:

```bash
pip install --upgrade speechbox transformers accelerate
```
For an initial example, we also recommend installing `datasets`:

```bash
pip install datasets
```
Then you can run the following code snippet:

```python
from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get the first sample
sample = next(iter(streamed_dataset))

# print out the normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

# load the punctuation restorer
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
restorer.to("cuda")

restored_text, log_probs = restorer(sample["audio"]["array"], sample["text"], sampling_rate=sample["audio"]["sampling_rate"], num_beams=1)
print("Restored text:\n", restored_text)
```
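Roughly speaking, this style of restoration boils down to scoring candidate punctuated versions of the normalized transcript against the audio and keeping the best one, with the model providing the log-probability score (the returned `log_probs` value). A toy sketch of that selection step, with a stand-in scoring function in place of the Whisper model:

```python
def pick_best_candidate(candidates, score_fn):
    """Return the candidate text with the highest score, plus its score.

    score_fn is a stand-in for a model-based log-probability scorer;
    here it can be any function mapping a string to a number.
    """
    scored = [(score_fn(text), text) for text in candidates]
    best_score, best_text = max(scored)
    return best_text, best_score
```

With a real model, the scoring function would be the expensive part; the selection itself is just an argmax over candidates.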
Note: This project is very young and intended to be run largely by the community. Please check out the Contribution Guide if you'd like to contribute ❤️
You can also try out the model here: https://huggingface.co/spaces/speechbox/whisper-restore-punctuation
Speechly,
🤗