Training recipes for thorsten dataset #1020

noranraskin · 2021-12-15T16:12:55Z

This basically just updates the recipes from https://github.com/coqui-ai/TTS-recipes to the new Coqpit-based way of writing training scripts.

CLAassistant · 2021-12-15T16:13:00Z

All committers have signed the CLA.

thorstenMueller · 2021-12-15T20:59:34Z

Hi @noranraskin,
thanks for this PR. I'm not sure if @erogol accepts PRs directly to main instead of dev, but we'll see.

Just as a side note:
My new neutral dataset (with a more natural speech flow) is growing and could be released in early 2022.

erogol · 2021-12-20T09:45:14Z

thanks for the ✨PR✨ @noranraskin

erogol · 2021-12-20T09:47:35Z

How about using the new dataset downloader instead of the bash script?

TTS/TTS/utils/downloaders.py

Line 90 in 3780346

def download_thorsten_de(path: str):

stale · 2022-01-19T10:20:10Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

thorstenMueller · 2022-01-20T17:28:18Z

Hi @noranraskin, what do you think about @erogol's suggestion?

erogol · 2022-02-11T12:30:06Z

@noranraskin are there more recipes 😄 on the way or should I just merge away?

noranraskin · 2022-02-13T20:08:25Z

@erogol there are more recipes on the way. But I'm currently getting an error: additional parameter 'ignored_speakers:[] given. I also get this with the earlier tacotron2-DDC script and all other model recipes; the vocoders work fine.
I pulled the latest changes from main a few days ago. Any ideas?

erogol · 2022-03-06T13:27:49Z

Can you try the latest dev? (sorry for the delayed response)

noranraskin · 2022-03-08T17:43:07Z

@erogol **kwargs was missing in the thorsten formatter. I fixed that issue. Now I'm getting

Traceback (most recent call last):
  File "train_tacotron_ddc.py", line 71, in <module>
    train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
  File "/data/nra/tts-finetuning/TTS/TTS/tts/datasets/__init__.py", line 112, in load_tts_samples
    meta_data_train = [{**item, **{"language": language}} for item in meta_data_train]
  File "/data/nra/tts-finetuning/TTS/TTS/tts/datasets/__init__.py", line 112, in <listcomp>
    meta_data_train = [{**item, **{"language": language}} for item in meta_data_train]
TypeError: 'list' object is not a mapping

This happens on the latest main and dev. I'm not getting this error when I try LJSpeech. I think it's caused by a problem with the formatter or the data importer, as this dataset only has a single speaker.

erogol · 2022-03-10T09:38:56Z

New formatters return dictionaries not lists.
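
The traceback above is consistent with a formatter that still returns list items: the expression `{**item, **{"language": language}}` only works when each item is a mapping. The following is a minimal sketch of the difference, not the actual thorsten formatter. The function names, the `name|text` metadata layout, and the key names `text`, `audio_file` and `speaker_name` are assumptions for illustration.

```python
import os


def list_style_formatter(root_path, lines):
    """Old style (hypothetical): each sample is a plain list."""
    samples = []
    for line in lines:
        name, text = line.strip().split("|")[:2]
        wav_file = os.path.join(root_path, "wavs", name + ".wav")
        samples.append([text, wav_file, "thorsten"])
    return samples


def dict_style_formatter(root_path, lines, **kwargs):
    """New style (hypothetical): each sample is a dict.

    **kwargs absorbs extra options such as ignored_speakers, so passing
    them does not fail for a single-speaker dataset.
    """
    samples = []
    for line in lines:
        name, text = line.strip().split("|")[:2]
        wav_file = os.path.join(root_path, "wavs", name + ".wav")
        samples.append(
            {"text": text, "audio_file": wav_file, "speaker_name": "thorsten"}
        )
    return samples


def add_language(samples, language):
    # Same expression as in the traceback: requires every item to be a mapping.
    return [{**item, **{"language": language}} for item in samples]
```

With list items, `add_language` raises `TypeError: 'list' object is not a mapping`; with dict items it returns each sample with an added `language` key.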

a-froghyar · 2022-04-28T09:54:28Z

Hey, just chipping in with a question here: these recipes won't work unless a German text cleaner is implemented first, no?

erogol · 2022-05-07T11:56:23Z

recipes/fokus/.gitignore

@@ -0,0 +1 @@
+arabic-speech-corpus


I don't think we need this in 🐸TTS, as you can simply move it to somewhere else in your system.

noranraskin · 2022-05-10

Oh, sorry, my mistake.

erogol · 2022-05-12T11:09:28Z

@noranraskin looks good to me. I can merge it when you are ready :)

noranraskin · 2022-05-18T00:25:59Z

I was still running into some torch errors. I'm updating and testing all the recipes again to make sure everything is working. Might also add a couple more...

noranraskin · 2022-05-18T14:18:52Z

@erogol so I only got tacotron2_DDC to work.
GlowTTS throws a really long stacktrace with some nvrtc compilation of "default_program" failing.
And all the vocoders throw the same cuFFT error at epoch 0.
This could have something to do with my installation, since I'm running a nightly build of torch, as I couldn't downgrade cuda on my system.
All the recipes are copied from ljspeech so I'm running into the same issues there too. Maybe somebody else can check them and report back (haven't opened an issue yet because of the nature of my installation)

Plus:
I added a dataset check at the beginning of all the recipes, that downloads the dataset if it can't be found
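
A dataset check of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the code in the recipes: `ensure_dataset` is a hypothetical helper, and the downloader is passed in as a callable that receives the parent directory, which is assumed to be where the dataset folder ends up.

```python
import os
from typing import Callable


def ensure_dataset(dataset_path: str, download_fn: Callable[[str], None]) -> bool:
    """Download the dataset only if dataset_path does not exist yet.

    Returns True if a download was triggered, False if the data was found.
    """
    if os.path.isdir(dataset_path):
        return False
    # Assumption: the downloader places the dataset folder inside `parent`.
    parent = os.path.dirname(os.path.abspath(dataset_path))
    os.makedirs(parent, exist_ok=True)
    download_fn(parent)
    return True
```

In a recipe, `download_fn` could be the `download_thorsten_de` function from `TTS/utils/downloaders.py` referenced earlier in this thread; passing it in as an argument keeps the check easy to try out without a real download.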

loganhart02 · 2022-05-19T15:16:48Z

> @erogol so I only got tacotron2_DDC to work. GlowTTS throws a really long stacktrace with some nvrtc compilation of "default_program" failing. And all the vocoders throw the same cuFFT error at epoch 0. This could have something to do with my installation, since I'm running a nightly build of torch, as I couldn't downgrade cuda on my system. All the recipes are copied from ljspeech so I'm running into the same issues there too. Maybe somebody else can check them and report back (haven't opened an issue yet because of the nature of my installation)
>
> Plus: I added a dataset check at the beginning of all the recipes, that downloads the dataset if it can't be found

I was able to train glow-tts and hifigan with the recipes past epoch 0. Which CUDA version are you using, and have you tried running it in a conda environment with a different CUDA version?

noranraskin · 2022-05-20T15:24:16Z

I was running CUDA 11.6 with the compatible PyTorch nightly build. But I realised I was always fetching from 'main', so I was working with pretty old code. I fixed my installation and tested everything again, and now it's working for me too.

noranraskin · 2022-05-20T15:32:46Z

The new recipes for align_tts, vits, wavegrad and wavernn are tested and working.
I'm still getting a RuntimeError for speedy_speech:
Calculated padded input size per channel: (7). Kernel size: (13). Kernel size can't be greater than actual input size.

If you don't know a fix right away I'd just discard this one for now.
Also, I'd be ready to merge now; I don't have anything more to add.
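
For readers who hit the same message: it says that the input length after padding (7) is shorter than the convolution kernel (13). The sketch below reproduces the standard 1-D convolution length arithmetic in plain Python, without torch. The function name is hypothetical, and the sketch only illustrates what the message means; it does not identify which speedy_speech layer or setting is responsible.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution, or ValueError if the kernel does not fit."""
    padded = length + 2 * padding
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input positions.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        raise ValueError(
            f"padded input size ({padded}) is smaller than kernel size ({effective_kernel})"
        )
    return (padded - effective_kernel) // stride + 1
```

With these definitions, `conv1d_output_length(7, 13)` raises, while `conv1d_output_length(7, 13, padding=6)` succeeds, so the error points at inputs that are too short for the kernel or at missing padding.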

thorstenMueller · 2022-05-20T19:50:26Z

Hi @noranraskin ,
first of all thanks for your efforts to add training recipes for my Thorsten dataset 👏.

When the new Thorsten models trained by @domcross and me are released (soon to happen) our next step is the preparation of the dataset release which is the base of the new models. Maybe we can add the recipes to your existing structure then 😊.

Here's a comparison of a model based on the current and the new dataset.
https://www.thorsten-voice.de/2022/03/20/vergleich-thorsten-aktuell-mit-dem-neuen-modell/

loganhart02 · 2022-05-23T11:45:11Z

> The new recipes for align_tts, vits, wavegrad and wavernn are tested and working. I'm still getting a RuntimeError for speedy_speech: Calculated padded input size per channel: (7). Kernel size: (13). Kernel size can't be greater than actual input size.
>
> If you don't know a fix right away I'd just discard this one for now. Also, I'd be ready to merge now; I don't have anything more to add.

I'll see if I can fix the speedy_speech error real quick and then merge :)

erogol changed the base branch from main to dev December 20, 2021 09:44

stale bot added the wontfix label Jan 19, 2022

stale bot removed the wontfix label Jan 20, 2022

noranraskin closed this Mar 8, 2022

noranraskin reopened this Mar 8, 2022

erogol reviewed May 7, 2022

erogol assigned loganhart02 May 19, 2022

loganhart02 and others added 3 commits May 30, 2022 11:24

Fix style

1a3020a

Fix isort

80c325c

Remove tensorboardX from requirements

23f8a44

erogol merged commit a790df4 into coqui-ai:dev May 30, 2022
