VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

---
title: VoiceRestore
emoji: 🔊
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.0.0b3
app_file: app.py
pinned: false
---

VoiceRestore is a speech restoration model for improving the quality of degraded voice recordings. Built on flow-matching transformers, it targets a wide range of imperfections common in speech recordings, including background noise, reverberation, distortion, and signal loss.

It is based on the VoiceRestore repository and its demo of audio restorations.

Build - using Gradio 🟠

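# Run in a notebook cell (e.g. Colab); "!" and "%cd" are IPython/Jupyter syntax
!git lfs install
!git clone https://github.com/jadechoghari/VoiceRestore-demo
# git clone creates a folder named after the repository
%cd VoiceRestore-demo
!pip install -r requirements.txt
!python app.py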

Usage - using Transformers 🤗

!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt

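from transformers import AutoModel

# Path to the cloned model folder (this is the location on Colab)
checkpoint_path = "/content/VoiceRestore"
# trust_remote_code=True lets transformers load the custom model code shipped in the repo
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
# Restore the degraded input file and save the result to the output path
model("test_input.wav", "test_output.wav")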
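To restore several files, the same `model(input_path, output_path)` call shown above can be applied in a loop. The sketch below assumes `model` accepts an input path and an output path exactly as in that example; the helper `restore_folder` and the `_restored` filename suffix are hypothetical names chosen for illustration, not part of the VoiceRestore code.

```python
from pathlib import Path
from typing import Callable, List


def restore_folder(
    restore: Callable[[str, str], object],
    input_dir: str,
    output_dir: str,
    pattern: str = "*.wav",
) -> List[Path]:
    """Hypothetical helper: call restore(in_path, out_path) for every matching file.

    `restore` is assumed to behave like model("test_input.wav", "test_output.wav"):
    it reads the first path and writes the restored audio to the second.
    """
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    # Non-recursive glob, sorted so files are processed in a stable order
    for in_path in sorted(Path(input_dir).glob(pattern)):
        # e.g. noisy.wav -> <output_dir>/noisy_restored.wav
        out_path = out_root / f"{in_path.stem}_restored{in_path.suffix}"
        restore(str(in_path), str(out_path))
        written.append(out_path)
    return written
```

With the model loaded as above, usage would look like `restore_folder(model, "degraded/", "restored/")`. Passing the callable in, rather than loading the model inside the helper, keeps the checkpoint loaded once for the whole batch.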

Example

Degraded Input:

Restored (steps=32, cfg=1.0):

Restored (steps=16, strength=0.5):
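The captions above list two sampling settings: a number of steps and a guidance value (`cfg`, or "strength"). In flow-matching models these usually control how many times an ODE solver evaluates the network and how strongly conditional and unconditional predictions are mixed. The sketch below is a generic illustration, not VoiceRestore's sampler: it assumes a fixed-step Euler solver, the guidance form `v_cond + cfg * (v_cond - v_uncond)`, and hypothetical toy velocity functions standing in for the transformer.

```python
from typing import Callable, List

Vector = List[float]
# A velocity field maps (state x, time t) to dx/dt; in a real model this
# would be the transformer's prediction.
VelocityFn = Callable[[Vector, float], Vector]


def guided_velocity(v_cond: Vector, v_uncond: Vector, cfg: float) -> Vector:
    """Classifier-free guidance: v_cond + cfg * (v_cond - v_uncond).

    Assumed convention: cfg=0 returns the conditional velocity unchanged;
    larger cfg pushes further away from the unconditional prediction.
    """
    return [c + cfg * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(
    x0: Vector,
    cond_fn: VelocityFn,
    uncond_fn: VelocityFn,
    steps: int = 32,
    cfg: float = 1.0,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with `steps` Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so toy fields dividing by (1 - t) are safe
        v = guided_velocity(cond_fn(x, t), uncond_fn(x, t), cfg)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy stand-ins for the network: the conditional field flows
# straight toward a fixed "clean" target, the unconditional one toward zero.
CLEAN = [0.5, -0.25]


def toy_cond(x: Vector, t: float) -> Vector:
    return [(c - xi) / (1.0 - t) for c, xi in zip(CLEAN, x)]


def toy_uncond(x: Vector, t: float) -> Vector:
    return [(0.0 - xi) / (1.0 - t) for xi in x]


noise = [1.0, 2.0]
print(euler_sample(noise, toy_cond, toy_uncond, steps=32, cfg=0.0))
```

With `cfg=0.0` the toy trajectory follows a straight line and lands (up to rounding) on `CLEAN`. Each step here calls both velocity functions, so compute grows linearly with `steps`, which is the usual trade-off between 16 and 32 steps.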

Key Features

Universal Restoration: The model is designed to handle a wide range of degradation types and severities in voice recordings.
Easy to Use: Simple interface for processing degraded audio files.
Pretrained Model: Includes a 301-million-parameter transformer with pretrained weights. The model is still being trained, and further checkpoint updates are planned.
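As a rough sizing aid, the parameter count translates into weight memory as parameters × bytes per parameter. This sketch assumes the 301M figure covers the weights loaded at inference and ignores activations, audio buffers, and any separately loaded components; the dtypes are illustrative, and nothing here states which precision VoiceRestore actually uses.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 301_000_000  # the "301 million parameter" figure from Key Features

for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

This works out to about 1.12 GiB in float32 and 0.56 GiB in half precision, before any runtime overhead.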

Model Details

Architecture: Flow-matching transformer
Parameters: 301M
Input: Degraded speech audio (various formats supported)
Output: Restored speech
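For orientation, a widely used flow-matching setup (conditional flow matching with a straight-line path, as in rectified flow) is summarized below. This is a generic formulation; the exact path, conditioning, and loss used in VoiceRestore are not stated here and may differ. Assumed notation: x_0 is Gaussian noise, x_1 is the clean speech representation, c is conditioning derived from the degraded input, and v_θ is the transformer's velocity prediction.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I),\; t \sim \mathcal{U}[0, 1]

\frac{\mathrm{d}x_t}{\mathrm{d}t} = x_1 - x_0

\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\!\left[\,\big\lVert v_\theta(x_t, t, c) - (x_1 - x_0) \big\rVert^2\,\right]

\text{Inference: } \frac{\mathrm{d}x}{\mathrm{d}t} = v_\theta(x, t, c), \quad x(0) \sim \mathcal{N}(0, I), \quad \text{integrate from } t = 0 \text{ to } t = 1
```

Under this formulation, a step count such as the `steps` value in the Example captions would presumably set how many solver steps are used to integrate from t = 0 to t = 1.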

Limitations and Future Work

The current model is optimized for speech and may not perform well on music or other audio types.
Research is ongoing to improve performance on extreme degradations.
Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Based on the E2-TTS implementation by Lucidrains
Special thanks to the open-source community for their invaluable contributions.

