Hello world `speechbox`!

This is the first release of speechbox. It provides the Punctuation Restoration task using Whisper.

To use the PunctuationRestorer class, install speechbox together with transformers and accelerate:

pip install --upgrade speechbox transformers accelerate

For an initial example, we recommend also installing datasets:

pip install datasets

Then you can run the following code snippet:

from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get first sample
sample = next(iter(streamed_dataset))

# print out normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

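# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
# move the model to the GPU (requires a CUDA device)
restorer.to("cuda")

# restore punctuation in the normalized transcript, using the audio as context
restored_text, log_probs = restorer(
    sample["audio"]["array"],
    sample["text"],
    sampling_rate=sample["audio"]["sampling_rate"],
    num_beams=1,
)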

print("Restored text:\n", restored_text)
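To sanity-check the result, you can compare the restored text with the normalized transcript after stripping punctuation and casing. The snippet below is a minimal sketch, assuming the restorer changes only punctuation and capitalization, which this release note does not guarantee. `normalize` and `same_words` are hypothetical helpers, not part of speechbox, and the `restored` string is a made-up example rather than real model output.

```python
import string


def normalize(text: str) -> str:
    """Upper-case the text and drop punctuation, keeping apostrophes."""
    drop = string.punctuation.replace("'", "")
    stripped = text.translate(str.maketrans("", "", drop))
    # Collapse any repeated whitespace left behind by removed characters
    return " ".join(stripped.upper().split())


def same_words(original: str, restored: str) -> bool:
    """Return True if both strings contain the same word sequence."""
    return normalize(original) == normalize(restored)


original = "HE WAS IN A FEVERED STATE OF MIND"
restored = "He was in a fevered state of mind."  # hypothetical restored output
print(same_words(original, restored))
```

In the snippet above, you would pass `sample["text"]` and `restored_text` to `same_words` instead of the hard-coded strings.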

Note: This project is very young and intended to be run largely by the community. Please check out the Contribution Guide if you'd like to contribute ❤️

You can also try out the model here: https://huggingface.co/spaces/speechbox/whisper-restore-punctuation

Speechly,
🤗

Initial Release