This repository contains the source code for the paper *Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example*, accepted at Interspeech 2024.
- DDSP-QbE: An end-to-end any-to-any voice conversion (VC) method designed to preserve prosody and domain characteristics in speech anonymization, even for unseen speakers from non-standard data, including pathological (stuttering) and elderly populations.
- Core Concepts:
  - Query-by-Example (QbE)
  - Differentiable Digital Signal Processing (DDSP)
- Proposed Approach:
  - Utilizes a subtractive harmonic oscillator-based DDSP synthesizer, inspired by the human speech production model, for effective learning with limited data.
  - Introduces an inductive bias for prosody preservation by:
    - Employing a novel loss function that uses emotional speech to separate prosodic from linguistic features (an illustrative sketch follows this list).
    - Adding supplementary hand-crafted and deep learning-generated input features that carry prosodic knowledge from the source utterance.
- Generalizability: Demonstrated on benchmark datasets across different genders, emotions, ages, pathologies, and cross-corpus conversions.
Some samples can be found here. Due to privacy concerns, not all samples are available.
- Python >= 3.11
- Install the Python dependencies mentioned in `requirements.txt`: `pip install -r requirements.txt`
- Download the ESD dataset (audio and WavLM representations) from here, and place them at the locations specified in `config.yaml` (`train_path` and `val_path`).
- Configure all training hyperparameters and paths in `resources/config.yaml` (an illustrative snippet follows this list).
- If using the emotion leakage-specific loss:
  - Ensure all WavLM 6th-layer embeddings for the ESD files are available at the location specified in `config.yaml` under `emotion_files_wavlm6_path`.
  - Set `use_emo_loss` to `True` in the config file.
To start training, use the following command:
python train.py -g <gpu number> --config ./resources/config.yaml
The model weights are located in `resources/model_weights`.
Some portions of the code for this project have been adapted from the following repositories. A big thank you to the authors of:
In the original paper, the model was trained on ESD together with data from Sep28-K and the ADReSS Challenge. However, due to privacy restrictions, the example training script provided here uses only publicly available data. The pathological datasets can be obtained upon request from their corresponding websites.