This repository provides an unofficial implementation of the speech restoration model Miipher, originally proposed by Koizumi et al. (arXiv). Please note that the model provided in this repository does not represent the performance of the original model by Koizumi et al., as this implementation differs from the paper in many ways.
Install with pip. The installation has been confirmed on Python 3.10.11:

```sh
pip install git+https://github.com/CShulby/miipher
```
The pretrained model is trained on the LibriTTS-R and JVS corpora and is provided under the CC-BY-NC-2.0 license.
```sh
python run_miipher.py
```
You can also run in parallel on CPU with the following script, passing a list of wav files (note that each wav file should have a corresponding transcription in the same folder):

```sh
python run_miipher_parallel.py --wav_list wav_list
```
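As a sketch of how the input might be prepared (the exact `wav_list` format is an assumption here: a plain-text file with one wav path per line, and transcriptions assumed to be `.txt` files sharing the wav's basename):

```shell
# Hypothetical layout: each utterance is a .wav plus a .txt transcription
# with the same basename, in the same folder.
mkdir -p demo_corpus
touch demo_corpus/utt1.wav demo_corpus/utt1.txt
touch demo_corpus/utt2.wav demo_corpus/utt2.txt

# Build wav_list: one path per line.
find "$PWD/demo_corpus" -name '*.wav' | sort > wav_list

# Flag any wav that is missing its transcription before running inference.
while read -r wav; do
  [ -f "${wav%.wav}.txt" ] || echo "missing transcription for $wav"
done < wav_list
```

Adjust the transcription extension and naming scheme to whatever your data actually uses.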
If you are still hungry for more, you can run the same way with full GPU inference:

```sh
python run_miipher_gpu.py --wav_list wav_list
```
Tests on an RTX 4090 showed roughly 3.5x real time with the parallel CPU script vs. 30x real time on GPU.
| | Original paper | This repo |
|---|---|---|
| Clean speech dataset | proprietary | LibriTTS-R and JVS corpus |
| Noise dataset | TAU Urban Audio-Visual Scenes 2021 dataset | TAU Urban Audio-Visual Scenes 2021 dataset and Slakh2100 |
| Speech SSL model | W2v-BERT XL | WavLM-large |
| Language SSL model | PnG BERT | XPhoneBERT |
| Feature cleaner building block | DF-Conformer | Conformer |
| Vocoder | [WaveFit](https://arxiv.org/abs/2210.01029) | HiFi-GAN |
| X-Vector model | Streaming Conformer-based speaker encoding model | speechbrain/spkrec-xvect-voxceleb |
Code in this repo: MIT License

Weights on Hugging Face: CC-BY-NC-2.0 license