
Data augmentation and gender control (usage and pretrained model)

Released by @yqzhishen on 15 Feb 07:29 (commit eecb002)

Overview

This release introduces data augmentation to DiffSinger in this forked repository.

See the dataset making pipeline for more details.

Random pitch shifting

Randomly shifts the pitch of the training data and embeds the size of each shift, in semitones, into the network. This broadens the pitch range and lets you control gender at the frame level, similar to the GEN parameter in VOCALOID.
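Shifting pitch by k semitones multiplies the fundamental frequency by 2^(k/12). The sketch below draws one shift per training item uniformly from a range and applies it to an f0 sequence, returning the shift value that a key-shift embedding would be conditioned on. It is a minimal illustration, assuming uniform sampling; `shift_f0` and `random_pitch_shift` are hypothetical names, and a real pipeline must also shift the audio or mel-spectrogram, which is not shown here.

```python
import random
from typing import List, Tuple


def shift_f0(f0: List[float], semitones: float) -> List[float]:
    """Scale every f0 value (Hz) by 2 ** (semitones / 12); unvoiced frames (0 Hz) stay 0."""
    ratio = 2.0 ** (semitones / 12.0)
    return [hz * ratio for hz in f0]


def random_pitch_shift(
    f0: List[float], shift_range: Tuple[float, float], rng: random.Random
) -> Tuple[List[float], float]:
    """Draw one shift uniformly from shift_range and apply it to the whole item.

    Assumption: uniform sampling. The returned shift is what the key-shift
    embedding would receive for this item.
    """
    low, high = shift_range
    semitones = rng.uniform(low, high)
    return shift_f0(f0, semitones), semitones


if __name__ == "__main__":
    rng = random.Random(0)
    shifted, k = random_pitch_shift([220.0, 0.0, 440.0], (-5.0, 5.0), rng)
    print(f"shift = {k:+.2f} semitones -> {shifted}")
```

Sampling once per item (rather than per frame) keeps the shifted f0 contour's shape intact while moving its register.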

To enable random pitch shifting for an existing dataset, add the following to the configuration file:

augmentation_args:
  random_pitch_shifting:
    range: [-5., 5.]
    scale: 2.0
use_key_shift_embed: true

Fixed pitch shifting

Shifts the pitch of the training data by a fixed set of semitone offsets. Each pitch-shifted copy is treated as coming from a new speaker distinct from the original one. Speaker embedding is therefore enabled and the number of speakers increases, while the pitch range is also broadened.
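One way to picture the speaker expansion: every (original speaker, shift target) pair becomes its own speaker ID. The sketch below gives the original speakers IDs 0..N-1 and appends one block of N IDs per target. The function name and the ID ordering are assumptions chosen for illustration; the actual binarizer may number augmented speakers differently.

```python
from typing import Dict, List, Tuple


def expand_speakers(
    speakers: List[str], targets: List[float]
) -> Dict[Tuple[str, float], int]:
    """Map (speaker, semitone shift) to a speaker ID.

    Shift 0.0 denotes the original, unshifted data; targets are assumed to be
    distinct and non-zero. Original speakers take IDs 0..N-1, and each target
    then adds a further block of N IDs.
    """
    mapping: Dict[Tuple[str, float], int] = {}
    for spk_idx, name in enumerate(speakers):
        mapping[(name, 0.0)] = spk_idx
    for t_idx, target in enumerate(targets, start=1):
        for spk_idx, name in enumerate(speakers):
            mapping[(name, target)] = t_idx * len(speakers) + spk_idx
    return mapping


if __name__ == "__main__":
    ids = expand_speakers(["opencpop"], [-5.0, 5.0])
    for (name, shift), spk_id in sorted(ids.items(), key=lambda kv: kv[1]):
        print(f"{spk_id}: {name} shifted by {shift:+.0f} semitones")
```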

To enable fixed pitch shifting for an existing dataset, add the following to the configuration file:

augmentation_args:
  fixed_pitch_shifting:
    targets: [-5., 5.]
    scale: 0.75
use_key_shift_embed: false
use_spk_id: true
num_spk: X # Set this value to at least (1 + T) * N, where T is the number of targets and N is the number of speakers before augmentation.
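The num_spk rule is plain arithmetic: with T targets, each of the N original speakers gains T shifted copies, giving (1 + T) × N speakers. For example, a single-speaker dataset with targets [-5., 5.] needs num_spk of at least (1 + 2) × 1 = 3. The hypothetical helpers below compute that minimum and check the flags required by the two configurations above, using a plain dict in place of the parsed config file. These checks are derived from this page, not from the repository's own validation code.

```python
from typing import Any, Dict, List


def min_num_spk(targets: List[float], n_speakers: int) -> int:
    """Minimum num_spk for fixed pitch shifting: (1 + T) * N."""
    return (1 + len(targets)) * n_speakers


def check_augmentation(config: Dict[str, Any], n_speakers: int) -> List[str]:
    """Return problems found in the augmentation-related keys (empty list if none).

    Hypothetical helper: it only encodes the settings listed on this page.
    """
    problems: List[str] = []
    aug = config.get("augmentation_args", {})
    if "random_pitch_shifting" in aug and not config.get("use_key_shift_embed", False):
        problems.append("random_pitch_shifting requires use_key_shift_embed: true")
    fixed = aug.get("fixed_pitch_shifting")
    if fixed is not None:
        if not config.get("use_spk_id", False):
            problems.append("fixed_pitch_shifting requires use_spk_id: true")
        needed = min_num_spk(fixed["targets"], n_speakers)
        if config.get("num_spk", 0) < needed:
            problems.append(f"num_spk must be at least {needed}")
    return problems


if __name__ == "__main__":
    cfg = {
        "augmentation_args": {"fixed_pitch_shifting": {"targets": [-5.0, 5.0], "scale": 0.75}},
        "use_key_shift_embed": False,
        "use_spk_id": True,
        "num_spk": 2,
    }
    print(min_num_spk([-5.0, 5.0], 1))  # 3
    print(check_augmentation(cfg, n_speakers=1))  # ['num_spk must be at least 3']
```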

0211_opencpop_ds1000_keyshift

A model pretrained on the Opencpop dataset with random pitch shifting applied.

Control the gender value with the CLI arguments of main.py:

python main.py xxx.ds --exp 0211_opencpop_ds1000_keyshift --gender GEN

where GEN is a float value between -1 and 1 (negative = male, positive = female).
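A gender value ultimately has to become a key shift, since that is what the network was trained on. The sketch below assumes a piecewise-linear mapping onto the random_pitch_shifting range used in training: +1 maps to the upper bound, -1 to the lower bound, and 0 to no shift. Both the mapping and the function names are assumptions for illustration; check main.py for the exact behavior.

```python
from typing import List, Tuple


def gender_to_key_shift(gender: float, shift_range: Tuple[float, float]) -> float:
    """Map a gender value in [-1, 1] to semitones (assumed piecewise-linear mapping)."""
    low, high = shift_range
    g = max(-1.0, min(1.0, gender))  # clamp out-of-range input to [-1, 1]
    # Positive (female) values scale toward the upper bound,
    # negative (male) values toward the lower bound.
    return g * high if g >= 0 else g * abs(low)


def gender_curve_to_key_shift(
    curve: List[float], shift_range: Tuple[float, float]
) -> List[float]:
    """Apply the same mapping frame by frame, e.g. to a gender curve."""
    return [gender_to_key_shift(g, shift_range) for g in curve]


if __name__ == "__main__":
    print(gender_curve_to_key_shift([-1.0, -0.5, 0.0, 0.5, 1.0], (-5.0, 5.0)))
```

Mapping each side separately keeps -1 and +1 at the edges of the training range even when that range is asymmetric, so the model is never asked for a shift it did not see during training.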

Control the gender curve in *.ds files:

{
  "gender_timestep": "0.005", // timestep in seconds, like f0_timestep
  "gender": "-1.0 -0.9 -0.8 ... 0.8 0.9 1.0", // sequence of float values, like f0_seq
  ... // other attributes
}
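The curve is stored the same way as f0_seq: a space-separated string of floats sampled every gender_timestep seconds. The hypothetical helpers below build a linear ramp from male to female and write the two fields into a single segment dict, which can then be serialized with the standard json module. They assume the segment is a JSON object as in the example above and leave its other attributes untouched.

```python
import json
from typing import Any, Dict, List


def linear_ramp(start: float, end: float, duration: float, timestep: float) -> List[float]:
    """Sample a straight line from start to end every `timestep` seconds (endpoints included)."""
    n = max(2, round(duration / timestep) + 1)
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def set_gender_curve(segment: Dict[str, Any], values: List[float], timestep: float) -> Dict[str, Any]:
    """Store a gender curve on a segment dict, formatted like f0_seq / f0_timestep."""
    segment["gender_timestep"] = str(timestep)
    segment["gender"] = " ".join(f"{v:.3f}" for v in values)
    return segment


if __name__ == "__main__":
    seg: Dict[str, Any] = {}  # in practice, a segment loaded from a .ds file
    curve = linear_ramp(-1.0, 1.0, duration=0.02, timestep=0.005)
    print(json.dumps(set_gender_curve(seg, curve, 0.005), indent=2))
```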

Export to ONNX format

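To keep gender controllable at inference time, expose it as an input of the exported model:

python onnx/export/export_acoustic.py --exp 0211_opencpop_ds1000_keyshift --expose_gender

Alternatively, freeze a fixed gender value into the model:

python onnx/export/export_acoustic.py --exp 0211_opencpop_ds1000_keyshift [--freeze_gender GEN]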

where GEN is the gender value that you would like to freeze into the model (defaults to 0).