# Speaker Voice Separation using Neural Nets

Hugging Face

## Installation

```bash
git clone https://github.com/Muhammad-Ahmad-Ghani/svoice_demo.git
cd svoice_demo
conda create -n svoice python=3.7 -y
conda activate svoice

# Install PyTorch 1.12.0 -- pick ONE of the following:
# CUDA 11.3
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
# CPU only
pip install torch==1.12.0+cpu torchvision==0.13.0+cpu torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu

pip install -r requirements.txt
```
| Pretrained Model | Dataset | Epochs | Train Loss | Valid Loss |
|---|---|---|---|---|
| `checkpoint.th` | Librimix-7 (16k-mix_clean) | 31 | 0.04 | 0.64 |

This is an intermediate checkpoint, provided for demo purposes only.

Create a directory `outputs/exp_` and save the checkpoint there:

```
svoice_demo
├── outputs
│   └── exp_
│       └── checkpoint.th
...
```
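A minimal sketch of setting this up (the checkpoint itself must be downloaded separately; its source is not specified here):

```python
# Create the experiment folder the demo expects; the pretrained
# checkpoint.th must then be copied into it manually.
from pathlib import Path

exp_dir = Path("outputs") / "exp_"
exp_dir.mkdir(parents=True, exist_ok=True)
print("Place the downloaded checkpoint at:", exp_dir / "checkpoint.th")
```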

## Run Gradio Demo

```bash
conda activate svoice
python demo.py
```

## Training

Create the `mix_clean` dataset at a 16 kHz sample rate using the LibriMix repo.

### Dataset Structure

```
svoice_demo
├── Libri{NUM_OF_SPEAKERS}Mix_Dataset   -> e.g. Libri7Mix_Dataset
│   └── wav{SAMPLE_RATE_VALUE}k         -> e.g. wav16k
│       └── min
│           ├── dev
│           │   └── ...
│           ├── test
│           │   └── ...
│           └── train-360
│               └── ...
...
```
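Before generating metadata it can help to confirm the tree matches the layout above. A minimal sketch, assuming the 7-speaker/16 kHz names from this README (adjust `root` and `wav` for your settings):

```python
# Report which expected LibriMix split directories are missing.
from pathlib import Path

def missing_splits(root="Libri7Mix_Dataset", wav="wav16k",
                   splits=("dev", "test", "train-360")):
    base = Path(root) / wav / "min"
    return [s for s in splits if not (base / s).is_dir()]

print("missing splits:", missing_splits())
```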

### Create metadata files

Run the predefined scripts if they match your setup:

```bash
# for 7 speakers
bash create_metadata_librimix7.sh
# for 10 speakers
bash create_metadata_librimix10.sh
```

Update `conf/config.yaml` to match your setup. In particular, set the `C: NUM_OF_SPEAKERS` value at line 66 to the number of speakers.
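For example, for the 7-speaker dataset the relevant fragment would look like this (only the `C` field is confirmed by this README; exact line numbers may drift between versions):

```yaml
# conf/config.yaml (excerpt)
C: 7   # NUM_OF_SPEAKERS: number of speakers the model separates
```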

```bash
python train.py
```

This will automatically read all configuration from the `conf/config.yaml` file. To learn more about the training procedure, refer to the original svoice repo.

## Distributed Training

```bash
python train.py ddp=1
```

## Evaluating

```bash
python -m svoice.evaluate <path to the model> <path to folder containing mix.json and all target separated channels json files s<ID>.json>
```
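The json files follow the manifest convention used by the original svoice data pipeline, where each entry pairs a wav path with its length in samples (treat the exact field layout as an assumption and check `svoice/data` in the original repo if in doubt). A sketch of building such a manifest with only the standard library:

```python
# Build a mix.json-style manifest: a json list of [wav_path, num_samples].
import json
import wave
from pathlib import Path

def build_manifest(wav_dir, out_json):
    """Scan wav_dir for .wav files and write [path, num_samples] pairs."""
    entries = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as f:
            entries.append([str(wav_path), f.getnframes()])
    with open(out_json, "w") as f:
        json.dump(entries, f, indent=2)
    return entries
```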

## Citation

The svoice code is borrowed from the original svoice repository. All rights to that code are reserved by Meta Research.

```bibtex
@inproceedings{nachmani2020voice,
  title={Voice Separation with an Unknown Number of Multiple Speakers},
  author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}

@misc{cosentino2020librimix,
  title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
  author={Joris Cosentino and Manuel Pariente and Samuele Cornell and Antoine Deleforge and Emmanuel Vincent},
  year={2020},
  eprint={2005.11262},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

## License

This repository is released under the CC-BY-NC-SA 4.0 license, as found in the LICENSE file.

The files `svoice/models/sisnr_loss.py` and `svoice/data/preprocess.py` were adapted from the kaituoxu/Conv-TasNet repository, an unofficial implementation of the paper Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, released under the MIT License. Additionally, several input manipulation functions were borrowed and modified from the yluo42/TAC repository, released under the CC BY-NC-SA 3.0 License.