Unofficial PyTorch implementation of VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Final project for SCC5830 - Image Processing @ ICMC/USP.
For this task we initially use the LibriSpeech dataset. However, since LibriSpeech contains only single-speaker recordings, we need to generate audio samples with overlapping voices.
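As an illustration, here is a minimal sketch of how such an overlapped mixture could be generated from two utterances by different speakers. The file names and the 16 kHz sample rate are assumptions for the example, not the repository's actual preprocessing:

```python
import librosa
import numpy as np
import soundfile as sf

# Hypothetical paths: any two utterances from different LibriSpeech speakers.
target, sr = librosa.load("speaker1_utt.flac", sr=16000)
interferer, _ = librosa.load("speaker2_utt.flac", sr=16000)

# Trim both signals to the same length and sum them to simulate overlap.
length = min(len(target), len(interferer))
mixture = target[:length] + interferer[:length]

# Normalize to avoid clipping after the sum.
mixture /= np.max(np.abs(mixture))

sf.write("mixture.wav", mixture, sr)
```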
We use the SI-SNR loss with permutation invariant training (PIT) instead of the power-law compressed loss, because it achieves better results (comparison available at: https://github.com/Edresson/VoiceSplit).
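For reference, here is a generic sketch of SI-SNR with utterance-level PIT in PyTorch. It illustrates the technique, not necessarily the exact loss implementation used in this repository:

```python
import torch
from itertools import permutations

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for batched 1-D signals of shape (batch, time)."""
    # Zero-mean both signals so the measure is offset-invariant.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to get the scaled reference.
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    s_target = dot * target / (torch.sum(target ** 2, dim=-1, keepdim=True) + eps)
    e_noise = estimate - s_target
    ratio = torch.sum(s_target ** 2, dim=-1) / (torch.sum(e_noise ** 2, dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)

def pit_si_snr_loss(estimates, targets):
    """Negative SI-SNR under the best source permutation.

    estimates, targets: (batch, n_sources, time)
    """
    n_src = estimates.shape[1]
    losses = []
    for perm in permutations(range(n_src)):
        snr = torch.stack(
            [si_snr(estimates[:, i], targets[:, p]) for i, p in enumerate(perm)],
            dim=1,
        ).mean(dim=1)
        losses.append(-snr)
    # Pick the permutation with the lowest loss for each batch element.
    return torch.stack(losses, dim=1).min(dim=1).values.mean()
```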
We use the Mish activation function instead of ReLU, which improved our results.
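Mish is defined as x · tanh(softplus(x)); a minimal PyTorch version for reference:

```python
import torch
import torch.nn.functional as F

def mish(x):
    # Mish(x) = x * tanh(softplus(x)): a smooth, non-monotonic alternative to ReLU.
    return x * torch.tanh(F.softplus(x))

# Recent PyTorch versions also ship this built in as torch.nn.Mish / F.mish.
```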
A report describing what was done in this repository is available here
Colab notebook demos:
- Exp 1: link
- Exp 2: link
- Exp 3: link
- Exp 4: link
- Exp 5 (best): link
Demo site for the experiment with the best results (Exp 5): https://edresson.github.io/VoiceSplit/
- Create documentation for the repository and remove unused code
- Train the VoiceSplit model with GE2E3k and the Mean Squared Error loss function
This repository contains code from other contributors; due credit is given in the functions where it is used:
- Preprocessing: Eren Gölge @erogol
- VoiceFilter model: Seungwon Park @seungwonpark