Content:
Comprehensively labeled, including title, scene name, role type, lyrics (TODO), and so on.
With the provided scripts, you can easily add more songs to this dataset or merge new recordings into existing songs.
All .wav files are ignored due to their size. To download the whole dataset, please go to this link: TODO
This dataset is a combination of the following contributions:
- Singing Voice Audio Dataset [1]
    - Most of the audio sources and label annotations are from this dataset.
- Jingju (Beijing opera) Phoneme Annotation [2]
    - Used the csv file of this dataset to correct some annotation errors in the former one.
    - Added new audio from this dataset:
        - about 2 new laosheng songs (one audio each)
        - about 6 new audios for existing laosheng songs
        - about 4 new dan songs (one audio each)
        - about 2 new audios for existing dan songs
- Label corrections
- Added a yaml metadata file for each song, by Shengxuan Wang (shawn120)
- Kept only opera data; removed all non-opera data from the previous dataset (e.g. modern songs)
- Future TODO: add more Western opera to balance out the language imbalance
[1] D. A. A. Black, M. Li, and M. Tian, “Automatic Identification of Emotional Cues in Chinese Opera Singing,” in 13th Int. Conf. on Music Perception and Cognition (ICMPC-2014), 2014, pp. 250–255.
[2] R. Gong, R. Caro Repetto, and Y. Yang, “Jingju a cappella singing dataset” [Data set], Zenodo, 2017. http://doi.org/10.5281/zenodo.344932
- Create yaml template

    If you need to add more data, you might want to create a new yaml file for it. You can use the following command to create a yaml template:

    python create_yaml_template.py AMOUNT_OF_WAV_FILES

    Argument:
    AMOUNT_OF_WAV_FILES (optional): the number of wav files (the song_size entry) to initialize the yaml template with; the default is 1.
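As an illustration, here is a hypothetical Python sketch of the structure such a template contains. The function name, placeholder values, and defaults are assumptions, not the actual contents of create_yaml_template.py; only the field names mirror the schema documented at the end of this README.

```python
def make_template(n_wavs=1):
    """Build a song-metadata dict with n_wavs placeholder recordings."""
    files = {}
    for i in range(n_wavs):
        files[f"wav{i:02d}"] = {
            "file_dir": "",
            "info": {
                "bit_rate": "",
                "channel_number": "",
                "duration": "",
                "if_a_cappella": "",
                "sample_rate": "",
            },
            "singer": {"bio_gender": "", "id": "", "level": "", "name": ""},
        }
    return {
        "emotion": [],
        "emotion_binary": -1,  # -1 means to-be-labeled
        "files": files,
        "language": "",
        "lyric": {"english": "", "original": "", "phonetic": ""},
        "scene": {"english": "", "original": "", "phonetic": ""},
        "singing_type": {"role": "TBD", "singing": "TBD"},
        "song_dir": "",
        "song_id": "",
        "song_size": n_wavs,
        "title": {"english": "", "original": "", "phonetic": ""},
        "wiki": "",
    }
```

Writing the template to disk would then be a single dump call (e.g. yaml.safe_dump with PyYAML).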
- Add new recording
- Update song size
- Search information from yaml
- Unify the data using unify.py
- Trim the data using trim.py, which includes a pretrim step (removing silence at the beginning)
- Generate a csv (for the whole dataset) using csv_gen.py
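The pretrim idea can be pictured with a small, self-contained sketch: drop leading samples whose amplitude stays below a threshold. The function name and threshold value here are assumptions; trim.py's actual implementation may differ.

```python
def pretrim(samples, threshold=0.01):
    """Remove leading silence from a sequence of float audio samples."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            # First sample loud enough to count as signal: keep from here on.
            return list(samples[i:])
    return []  # the whole clip was silence
```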
- Make the dataset into folds for cross validation:
    - Run xvalid_split.py to split the data into folds and save them locally.
    - Change the parameter in xvalid_load.py to choose between loading folds from local storage or generating new ones, but DO NOT run this script directly.
    - Load the folds, train the model, and evaluate in torch_xvalid.py.
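The fold-splitting step above can be sketched as follows: shuffle the recordings and partition them into k near-equal folds. The fold count, seed, and function name are assumptions; the real logic lives in xvalid_split.py.

```python
import random

def split_folds(items, k=5, seed=0):
    """Partition items into k near-equal folds after a seeded shuffle."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Striding by k yields folds whose sizes differ by at most one.
    return [items[i::k] for i in range(k)]
```

In cross validation, each fold then serves once as the held-out evaluation set while the remaining k-1 folds are used for training.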
Each song's yaml metadata file follows this schema:

emotion:
- emotion_1
- emotion_2
- emotion_3
emotion_binary: 1 or 0 (positive or negative); -1 means to-be-labeled
files:
    wav00:
        file_dir: path to this wav file
        info:
            bit_rate: bit_rate
            channel_number: number of channels
            duration: duration
            if_a_cappella: True or False (whether it is a cappella)
            sample_rate: sample_rate
        singer:
            bio_gender: biological gender of the person who recorded this audio
            id: singer_id
            level: professional/intermediate/amateur
            name: singer name
    wav01:
        # ... ...
language: ch or we (Chinese or Western)
lyric:
    english: ''
    original: ''
    phonetic: ''
scene:
    english: English translation of the scene title
    original: original scene title (Chinese hanzi will be shown as Unicode)
    phonetic: pinyin; only for Chinese songs
singing_type:
    role: only for jingju; laosheng, dan, ... or TBD
    singing: jingju/yuju/... or TBD
song_dir: path to this song
song_id: id of this song
song_size: how many audio files are in this one song
title:
    english: English translation of the song title
    original: original song title (Chinese hanzi will be shown as Unicode)
    phonetic: pinyin; only for Chinese songs
wiki: notes or wiki for this song
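Once a song's yaml file is parsed (for example with PyYAML's yaml.safe_load), the fields above become ordinary nested dicts. A minimal sketch of querying them, using a hand-written dict that stands in for one parsed file (titles and values here are made up):

```python
# A hand-written dict standing in for one parsed song yaml file.
meta = {
    "title": {"english": "Example Title", "original": "", "phonetic": ""},
    "song_size": 2,
    "files": {
        "wav00": {"singer": {"level": "professional"}},
        "wav01": {"singer": {"level": "amateur"}},
    },
}

# Collect the singer level of every recording in this song.
levels = [rec["singer"]["level"] for rec in meta["files"].values()]

# Sanity check: song_size should match the number of wav entries.
assert meta["song_size"] == len(meta["files"])
```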