
Training Tips

Vega edited this page Sep 9, 2021 · 4 revisions

Process Dataset

aidatatang_200zh

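After downloading, you need to extract all of the voice archives. Run this in aidatatang_200zh/corpus/xxx/: cat *.tar.gz | tar zxvf - -i
The -i flag (ignore zeros) lets tar read past the end-of-archive blocks between the concatenated archives, so all of them are extracted in one pass.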
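If the shell one-liner is not available (for example on Windows without GNU tar), the same extraction can be sketched in Python. This is a minimal sketch, not part of the project: the function name extract_archives is hypothetical, and it assumes the archives are trusted and sit directly inside the directory you pass in.

```python
import tarfile
from pathlib import Path


def extract_archives(corpus_dir):
    """Extract every *.tar.gz found directly inside corpus_dir, in place.

    Hypothetical helper for illustration; only use it on archives you trust,
    because extractall() writes whatever paths the archive contains.
    """
    corpus_dir = Path(corpus_dir)
    extracted = []
    for archive in sorted(corpus_dir.glob("*.tar.gz")):
        # Open each gzip-compressed tarball and unpack it next to the archive.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=corpus_dir)
        extracted.append(archive.name)
    return extracted
```

Calling extract_archives("aidatatang_200zh/corpus/xxx") returns the names of the archives it unpacked, which makes it easy to confirm that none were skipped.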

Mozilla Common Voice

You MUST rearrange the Mozilla Common Voice download into the data structure below:

{dataset_name}/
     zh-TW/
          clips/
               xxx.mp3
          xxx.tsv
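Before starting preprocessing, it can help to verify that the folder matches the structure above. The following is a minimal sketch under stated assumptions: check_common_voice_layout is a hypothetical helper, not part of the project, the default locale is taken from the zh-TW example, and it looks for train.tsv specifically because that is the only file currently supported.

```python
from pathlib import Path


def check_common_voice_layout(dataset_root, locale="zh-TW"):
    """Return a list of layout problems; an empty list means the layout looks right.

    Hypothetical helper for illustration only.
    """
    locale_dir = Path(dataset_root) / locale
    clips_dir = locale_dir / "clips"
    train_tsv = locale_dir / "train.tsv"
    problems = []

    if not clips_dir.is_dir():
        problems.append(f"missing directory: {clips_dir}")
    elif not any(clips_dir.glob("*.mp3")):
        problems.append(f"no .mp3 files in: {clips_dir}")

    if not train_tsv.is_file():
        problems.append(f"missing file: {train_tsv}")

    return problems
```

For example, printing each entry of check_common_voice_layout("path/to/dataset_name") shows what still has to be moved or renamed.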

TODO: Only train.tsv is currently supported; other files such as test.tsv or validate.tsv will be supported in the future.

TODO: Fix the "no wordS" error when processing Mozilla Common Voice (it occurs because only train.tsv is used for now).
