To training Audio Sep you shall install this dependences:
pip install tensorboard pyloudnorm
So to utilize your audio-text paired dataset:
- Format your dataset to match our JSON structure in
datafiles/template.json
, you have need of the pattern audio and the prompt of the pattern, like this:
{
"data": [
{
"wav": "path_to_audio_file",
"caption": "textual_desciptions"
}
]
}
- Update the
config/audiosep_base.yaml
file by listing your formatted JSON data files underdatafiles
if you used more.json
. For example:
data:
datafiles:
- 'datafiles/your_datafile_1.json'
- 'datafiles/your_datafile_2.json'
...
Train AudioSep from scratch:
python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path checkpoint/ ''
Finetune AudioSep from pretrained checkpoint:
python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path path_to_checkpoint
A folder workspace
will be create where you find the results of the training.