Kokoro-Align

Kokoro-Align is a PyTorch speech-transcript alignment tool for LibriVox. It splits audio files at silent positions and finds the CTC best path to align the transcript text with the audio. Kokoro-Align is used to build the Kokoro Speech Dataset.
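To illustrate what finding a CTC best path means, here is a minimal pure-Python sketch of Viterbi alignment over CTC states. It is not taken from the kokoro_align source: the function name `ctc_best_path`, the blank index 0, and the list-of-lists input format are all assumptions made for the example.

```python
import math

NEG_INF = float("-inf")


def ctc_best_path(log_probs, targets, blank=0):
    """Return the most likely frame-to-label alignment for `targets`.

    log_probs: list of T frames, each a list of per-class log-probabilities.
    targets: non-empty list of label ids (no blanks).
    Returns a list of T label ids (blanks included), one per frame.
    """
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    T, S = len(log_probs), len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first label.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip
            # the blank between two different labels.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            prev = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][prev] == NEG_INF:
                continue  # state s is unreachable at frame t
            score[t][s] = score[t - 1][prev] + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path ends on the last label or on the trailing blank.
    s = max((S - 1, S - 2), key=lambda p: score[T - 1][p])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("too few frames to align the targets")
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]


# Toy input: 4 frames, 3 classes (0 = blank, 1 and 2 = phoneme ids).
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
frames = [[math.log(p) for p in row] for row in probs]
print(ctc_best_path(frames, [1, 2]))  # [1, 1, 0, 2]
```

The frame indices at which each label starts and ends give the time boundaries of that label in the audio, which is what an aligner needs.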

Objectives

Do not depend on datasets licensed for non-commercial use only

How to train CTC model

The CTC model predicts phonemes from MFCC audio features. You can download the pretrained model checkpoint and skip this step.
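As a rough illustration of how per-frame CTC outputs turn into a phoneme sequence, the sketch below applies the standard greedy CTC decoding rule: take the best class per frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode` and the blank index 0 are assumptions for the example, not part of kokoro_align.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank=0):
    """Collapse per-frame best labels into a label sequence.

    frame_scores: list of frames, each a list of per-class scores.
    """
    # Index of the highest-scoring class in every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Merge runs of the same label, then remove blanks.
    return [label for label, _ in groupby(best) if label != blank]
```

Note that a blank between two identical labels keeps them separate, so the per-frame labels 1, 1, 0, 1 decode to two occurrences of label 1.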

Download data

Get the Kokoro Speech Dataset and extract it under ./data. The transcript data should then be at ./data/kokoro-speech-v1_2-small/metadata.csv.

Preprocessing

Run this to preprocess the Kokoro corpus.

$ python -m kokoro_align.prepare \
    --dataset kokoro \
    --data data/kokoro-speech-v1_2-small

This generates two files: data/kokoro-text.npz, which contains phonemes, and data/kokoro-audio.npz, which contains MFCC features.
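The array names stored inside these files are not documented here. Since an .npz file is a zip archive of .npy members, one way to discover them without loading any data is to list the archive, as in this sketch. The helper name `list_npz_arrays` is hypothetical.

```python
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz archive."""
    suffix = ".npy"
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        )
```

For example, `list_npz_arrays("data/kokoro-text.npz")` returns the keys you can then read with `numpy.load`.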

Run training

Run this to train the CTC model.

$ python -m kokoro_align.train \
    --train --dataset kokoro --model-dir model/ctc

It reaches a loss similar to this after 100 epochs.

train epoch 199: 100% 65/65 [00:20<00:00,  3.16it/s, loss=0.156]
train epoch 199: 100% 8/8 [00:00<00:00, 18.63it/s]
Avg loss: 0.306335
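If you save the training output to a file, a small helper like the one below can pull out the average loss values so you can compare runs. It assumes the log contains lines of the form `Avg loss: <number>` as in the sample above; the function name `average_losses` is hypothetical.

```python
import re

# Matches lines such as "Avg loss: 0.306335".
AVG_LOSS = re.compile(r"^Avg loss:\s*([0-9]*\.?[0-9]+)\s*$", re.MULTILINE)


def average_losses(log_text):
    """Return every average loss found in the log, in order."""
    return [float(value) for value in AVG_LOSS.findall(log_text)]
```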

How to build Kokoro-Speech-Dataset

You can use the model trained above or the pretrained model.

Download audio data

$ mkdir -p data
$ (cd data && curl -LO http://archive.org/download/gongitsune_um_librivox/gongitsune_um_librivox_64kb_mp3.zip)
$ unzip data/gongitsune_um_librivox_64kb_mp3.zip -d data/gongitsune-by-nankichi-niimi
$ ls data/gongitsune-by-nankichi-niimi/*.mp3 | sort > data/gongitsune_audio_files.txt
$ sed -e 's/\.mp3$/.plain.txt/' data/gongitsune_audio_files.txt > data/gongitsune_original_text_files.txt
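For readers who prefer not to use `ls`, `sort`, and `sed`, the last two commands can be approximated in Python as sketched below. The function name `write_file_lists` is hypothetical, and Python's `sorted` orders by code point, which may differ from a locale-aware `sort` for unusual file names.

```python
from pathlib import Path


def write_file_lists(audio_dir, audio_list, text_list):
    """Write the sorted .mp3 paths and the matching .plain.txt paths."""
    audio_files = sorted(str(p) for p in Path(audio_dir).glob("*.mp3"))
    # Same substitution as the sed command: .mp3 -> .plain.txt
    text_files = [f[: -len(".mp3")] + ".plain.txt" for f in audio_files]
    Path(audio_list).write_text("".join(f + "\n" for f in audio_files))
    Path(text_list).write_text("".join(f + "\n" for f in text_files))
    return audio_files, text_files
```

Calling `write_file_lists("data/gongitsune-by-nankichi-niimi", "data/gongitsune_audio_files.txt", "data/gongitsune_original_text_files.txt")` writes the same two list files as the commands above.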

To see a shell script that downloads the data, run:

$ python run_example.py --download --dataset gongitsune-by-nankichi-niimi

Make metadata

$ python run_example.py

Copy index

You can use the output directory to build datasets with the Kokoro Speech Dataset.

$ python run_example.py --copy-index

Dataset

明暗 (Meian) 16:39:29 Online text
こころ (Kokoro) 08:46:41 Online text
雁 (Gan) 03:41:31 Online text
草枕 (Kusamakura) 04:27:35 Online text
田舎教師 (Inaka Kyoshi) 08:13:26 Online text
坊っちゃん (Botchan) 04:26:27 Online text
野分 (Nowaki) 04:40:49 Online text
ごん狐 (Gon gitsune) 00:15:42 Online text
コーカサスの禿鷹 (Caucasus no Hagetaka) 00:13:04 Online text
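As a quick check of the total amount of audio, the durations listed above can be summed with a few lines of Python. The helper names are illustrative only.

```python
def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Durations copied from the list above, in the same order.
durations = [
    "16:39:29", "08:46:41", "03:41:31", "04:27:35", "08:13:26",
    "04:26:27", "04:40:49", "00:15:42", "00:13:04",
]
total = sum(to_seconds(d) for d in durations)
print(format_hms(total))  # 51:24:44
```

The nine recordings add up to a little over 51 hours of audio.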

