Kokoro-Align is a PyTorch speech-transcript alignment tool for LibriVox. It splits audio files at silent positions and finds the CTC best path to align the transcript text with the audio files. Kokoro-Align is used for the Kokoro Speech Dataset.
- Does not depend on non-commercially licensed datasets
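As an illustration of what "splitting audio at silent positions" can look like, here is a minimal sketch in plain Python. It is not Kokoro-Align's actual implementation: the function name `find_silent_split_points`, the frame length, the -40 dB threshold, and the minimum run length are all assumptions chosen for the example. It marks frames whose RMS energy falls below the threshold and returns the midpoint of each sufficiently long silent run.

```python
import math


def find_silent_split_points(samples, frame_len=400, threshold_db=-40.0,
                             min_silent_frames=5):
    """Return sample indices in the middle of each long-enough silent run.

    samples: sequence of floats, assumed to be normalized to [-1, 1].
    All parameter defaults are illustrative assumptions.
    """
    # Classify each non-overlapping frame as silent or not by its RMS level.
    silent = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20.0 * math.log10(max(rms, 1e-10))  # avoid log10(0)
        silent.append(level_db < threshold_db)

    # Collect the midpoint of every silent run that is long enough.
    split_points = []
    run_start = None
    for i, is_silent in enumerate(silent + [False]):  # sentinel ends a trailing run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_silent_frames:
                mid_frame = (run_start + i) // 2
                split_points.append(mid_frame * frame_len)
            run_start = None
    return split_points
```

Splitting in the middle of a pause, rather than at its edge, leaves a little silence on both sides of each segment, which tends to be more forgiving when the threshold is not tuned precisely.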
The CTC model predicts phonemes from MFCC audio features. You can download the pretrained model checkpoint and skip this process.
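To show how per-frame phoneme predictions can be turned into an alignment, the following is a generic textbook sketch of CTC best-path (Viterbi) alignment in plain Python. It is not taken from Kokoro-Align's source: the function name `ctc_best_path`, the list-of-lists input format, and the choice of blank id 0 are assumptions for illustration.

```python
NEG_INF = float("-inf")


def ctc_best_path(log_probs, labels, blank=0):
    """Most likely frame-level path that collapses to `labels` under CTC.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels: target symbol ids without blanks.
    Returns a list of T symbol ids (a label or the blank) for each frame.
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    T, S = len(log_probs), len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or skip the blank
            # between two different labels.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue  # state not reachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends on the last label or on the final blank.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("too few frames to emit all labels")

    # Walk the back-pointers from the last frame to the first.
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    path.reverse()
    return path
```

The frame indices where each label first appears in the returned path give the start time of that phoneme, which is the information an aligner needs to match transcript text to audio.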
Get the Kokoro Speech Dataset and extract the data under ./data. The transcript data should then be at ./data/kokoro-speech-v1_2-small/metadata.csv.
Run this to preprocess the Kokoro corpus.
$ python -m kokoro_align.prepare \
--dataset kokoro \
--data data/kokoro-speech-v1_2-small
This generates two files: data/kokoro-text.npz contains phonemes, and data/kokoro-audio.npz contains MFCC features.
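To check what the preprocessing step produced, a small NumPy helper like the one below can list the arrays in each file. This is only a sketch: the helper name `summarize_npz` is hypothetical, and it deliberately makes no assumption about the array names stored in the files. `np.load` is called with its default settings, so it will raise an error if an archive contains pickled object arrays.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, `summarize_npz("data/kokoro-text.npz")` and `summarize_npz("data/kokoro-audio.npz")` show the names, shapes, and dtypes of the stored arrays without loading them into a training pipeline.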
Run this to train the CTC model.
$ python -m kokoro_align.train \
--train --dataset kokoro --model-dir model/ctc
It achieves a loss similar to this after 100 epochs.
train epoch 199: 100% 65/65 [00:20<00:00, 3.16it/s, loss=0.156]
train epoch 199: 100% 8/8 [00:00<00:00, 18.63it/s]
Avg loss: 0.306335
You can use the model trained in the above process or use the pretrained model.
$ mkdir -p data
$ (cd data && curl -LO http://archive.org/download/gongitsune_um_librivox/gongitsune_um_librivox_64kb_mp3.zip)
$ unzip data/gongitsune_um_librivox_64kb_mp3.zip -d data/gongitsune-by-nankichi-niimi
$ ls data/gongitsune-by-nankichi-niimi/*.mp3 | sort > data/gongitsune_audio_files.txt
$ sed -e 's/\.mp3$/.plain.txt/' data/gongitsune_audio_files.txt > data/gongitsune_original_text_files.txt
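The last two commands build a sorted list of audio files and a matching list of transcript paths. On systems without `ls` and `sed`, roughly the same lists can be produced in Python. This is a sketch, with `write_file_lists` being a hypothetical helper name; note that Python sorts by code point, which can differ from a locale-aware shell `sort` for some file names.

```python
from pathlib import Path


def write_file_lists(audio_dir, audio_list_path, text_list_path):
    """Write sorted .mp3 paths and the matching .plain.txt paths, one per line."""
    audio_files = sorted(str(p) for p in Path(audio_dir).glob("*.mp3"))
    # Same substitution as the sed command: trailing ".mp3" -> ".plain.txt"
    text_files = [f[:-len(".mp3")] + ".plain.txt" for f in audio_files]
    Path(audio_list_path).write_text(
        "".join(f + "\n" for f in audio_files), encoding="utf-8")
    Path(text_list_path).write_text(
        "".join(f + "\n" for f in text_files), encoding="utf-8")
    return audio_files, text_files
```

Calling `write_file_lists("data/gongitsune-by-nankichi-niimi", "data/gongitsune_audio_files.txt", "data/gongitsune_original_text_files.txt")` mirrors the two shell commands above.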
You can see a shell script to download data by running
$ python run_example.py --download --dataset gongitsune-by-nankichi-niimi
$ python run_example.py
You can use the output directory to make datasets with Kokoro Speech Dataset.
$ python run_example.py --copy-index
- 明暗 (Meian) 16:39:29 Online text
- こころ (Kokoro) 08:46:41 Online text
- 雁 (Gan) 03:41:31 Online text
- 草枕 (Kusamakura) 04:27:35 Online text
- 田舎教師 (Inaka Kyoshi) 08:13:26 Online text
- 坊っちゃん (Botchan) 04:26:27 Online text
- 野分 (Nowaki) 04:40:49 Online text
- ごん狐 (Gon gitsune) 00:15:42 Online text
- コーカサスの禿鷹 (Caucasus no Hagetaka) 00:13:04 Online text