This repository contains the source code for the paper *Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example*, accepted at Interspeech 2024.
- DDSP-QbE: An end-to-end any-to-any voice conversion (VC) method designed to preserve prosody and domain characteristics in speech anonymization, even for unseen speakers from non-standard data, including pathological (stuttering) and elderly populations.
- Core Concepts:
  - Query-by-Example (QbE)
  - Differentiable Digital Signal Processing (DDSP)
- Proposed Approach:
  - Utilizes a subtractive harmonic oscillator-based DDSP synthesizer, inspired by the human speech production model, for effective learning with limited data.
  - Introduces an inductive bias for prosody preservation by:
    - Employing a novel loss function that uses emotional speech to separate prosodic from linguistic features (an illustrative sketch follows this list).
    - Adding supplementary hand-crafted and deep learning-generated input features that carry prosodic knowledge from the source utterance.
- Generalizability: Demonstrated on benchmark datasets across different genders, emotions, ages, pathologies, and cross-corpus conversions.
Some samples can be found here. Due to privacy concerns, not all samples are available.
- Python >= 3.11
- Install the Python dependencies mentioned in `requirements.txt`: `pip install -r requirements.txt`
- Download the ESD dataset (audio and WavLM representations) from here, and place them at the locations specified in `config.yaml` (`train_path` and `val_path`).
- Configure all training hyperparameters and paths in `resources/config.yaml` (an illustrative snippet follows this list).
- If using the emotion leakage-specific loss:
  - Ensure all WavLM 6th-layer embeddings for the ESD files are available at the location specified in `config.yaml` under `emotion_files_wavlm6_path`.
  - Set `use_emo_loss` to `True` in the config file.
To start training, use the following command:
python train.py -g <gpu number> --config ./resources/config.yaml
The model weights are located in `resources/model_weights`.
Some portions of the code for this project have been adapted from the following repositories. A big thank you to the authors of:
In the original paper, the model was trained on ESD together with data from Sep28-K and the ADReSS Challenge. However, due to privacy restrictions, the example training script provided here uses only publicly available data. The pathological datasets can be obtained upon request from their corresponding websites.