Preparation Script for Training on Mozilla Commonvoice #111
base: main
…uages, but not sure how to otherwise check for individual languages downloaded and unpacked already instead rn
Is there any way you can limit this to English only? I tried this branch and it filled up my disk.
For English only, just limit the language list here so that it contains only "en": https://github.com/lifeiteng/vall-e/pull/111/files#diff-9c086567a8bee92cd4ae661ae5d75be66ae5340f8982cb287a09a78aee2041bdR22
Thanks! I will give it a try.
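As a rough illustration of that suggestion, here is a minimal sketch of a language list restricted to "en" and of how the per-language cut lists are then accumulated. The variable name `languages` and the `data/tokenized` directory are assumptions for illustration only; the actual list lives at the linked line of the PR's script. The file name pattern is taken from the diff in this PR. The sketch uses plain string concatenation instead of `+=` so that it also runs under a POSIX shell.

```shell
#!/bin/sh
set -eu

# Hypothetical variable name: the real language list is defined in the
# PR's preparation script at the line linked above.
languages="en"

# Assumed location of the prepared cut manifests (illustration only).
audio_feats_dir="data/tokenized"

cutsDevList=""
cutsTestList=""

# Only the languages named in the list are visited, so nothing else
# is referenced for download, unpacking, or feature extraction.
for lang in ${languages}; do
  cutsDevList="${cutsDevList}${audio_feats_dir}/commonvoice_cuts_${lang}_dev_subset.jsonl.gz "
  cutsTestList="${cutsTestList}${audio_feats_dir}/commonvoice_cuts_${lang}_test_subset.jsonl.gz "
done

echo "dev cuts:  ${cutsDevList}"
echo "test cuts: ${cutsTestList}"
```

Adding a second code such as "de" to the list would append one more entry to each cut list on the next run.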
@@ -0,0 +1 @@
甚至 出现 交易 几乎 停滞 的 情况 |
update prompts
In order to get reasonable results, we need to design a multi-language symbol set that works together with a language ID.
cutsDevList+="${audio_feats_dir}/commonvoice_cuts_${lang}_dev_subset.jsonl.gz "
cutsTestList+="${audio_feats_dir}/commonvoice_cuts_${lang}_test_subset.jsonl.gz "
done
# echo "${cutsTrainList}" # debug |
clean comments
@RuntimeRacer please update the prompts. Can you share the results here?
@lifeiteng I just started training the NAR model today; I will share results once the first few epochs have completed. I will also update the comments then.
@RuntimeRacer - any updates on the training performance?
@pawel-polyai It's currently training NAR epoch 4 on 6x RTX 3090, after training 10 epochs for AR. Intermediate results are mediocre so far: it is able to synthesize speech (tested only in English and German), but it is still not able to fully maintain speaker identity or accent. For example, after NAR epoch 1 it spoke with a seemingly Slavic accent; after epochs 2 and 3 it changed to French for some reason. So I'm not sure how precise it can get yet, nor whether the accent it speaks with comes from an attempted transfer of speaker identity, from whichever training data it saw last, or from dataset bias. However, the loss is still decreasing for NAR, and I'll keep you updated. Sharing training graphs here as well: Also sharing my (very not-in-depth) examples, tested with only one speaker that I found TTS models to have a hard time replicating in the past, along with the intermediate models for NAR epochs 1-3: https://drive.google.com/drive/folders/1-bCwvXdXd4O2NOBigoXVdArAnZoigvWc?usp=sharing If you want to play around with it yourself, you can perform inference with these commands:
Thank you for the detailed and valuable update.
I am training this model on 20 different languages, so I believe it has issues handling some of them, or the dialects in a certain subset of the data. I believe it is still improving in accuracy, though.
VALL-E (this repo) focuses on a single language. To support multiple languages, we should design some experiments and verify them. If this PR can get reasonable results, I'm OK to approve it. @chenjiasheng we can synthesize some audio to judge the effectiveness of the model & data pipeline.
Yes, it is still in the process of converging; I believe even the 10-epoch AR model was far from being fully converged, so it might be worth the effort to do a follow-up training run on the Stage 1 / AR model. @lifeiteng Do you think I can continue AR training independently later even though the NAR model has already been trained, and after ~10 more AR epochs (20 in total) do another 10 epochs on NAR as fine-tuning? So the current iteration would be 10 AR / 10 NAR and a later one 20 AR / 20 NAR, for example.
@RuntimeRacer NAR needs more epochs than AR. You can switch to training AR.
@lifeiteng I am confused now. I tried to restart AR training from the checkpoint I had already trained NAR on. Used this command: It also said it loaded from the existing But now, after a few hours, I checked the graphs and outputs, and it kinda started completely from scratch: all the checkpoints contain only AR weights according to their size, and training started off from step 0. And the NAR weights seem to be completely erased in the new checkpoints, according to size.
@RuntimeRacer @lifeiteng
@RuntimeRacer @chenjiasheng Yes, we can do better! A PR for this is welcome! I don't have time to work on it hands-on right now.
This PR provides an end-to-end preparation script for Mozilla CommonVoice.
I built it by copying over the scripts from AIShell and combining them with the preparation scripts for CommonVoice found in Icefall, which also uses Lhotse. References:
Some additional info and stats:
max-duration 80 being too high for this dataset (running on RTX 3090 24GB).
Since I have not finished training yet, I cannot provide any sample models, results, or stats at this point.