Training HiFiGan -- avg loss not decreasing #1003
Comments
Forgot to add: There's no alignment section on the Tensorboards. |
You don't need alignments with HifiGAN. There is no attention or duration prediction. |
Hello! I was running HifiGAN on LJSpeech with the default config and am having the same issue. Could you please clarify how to fix it? |
This might be an actual bug. I've tried training HiFiGAN with both the v1 and v2 configurations. Training goes smoothly for the first couple of thousand steps, but it always flattens out and stops learning. |
@erogol I'm also having the same problem: the model is not fitting. I was running the config from 'recipes/ljspeech/hifigan/train_hifigan.py' on the LJSpeech dataset. What could cause such a problem? |
I'm aware of the problem and will check when I have time for it. |
Can someone upload the TB to https://tensorboard.dev/ and share? |
what is the same issue? |
https://tensorboard.dev/experiment/9pdI9rvbSYO4eMCWLH1Tqg/#scalars |
Your config is the same as mine; the only difference is the file paths. I suspect it is about the system setup. Can you run these two commands and post the output here?
wget https://raw.githubusercontent.com/coqui-ai/TTS/main/TTS/bin/collect_env_details.py
python collect_env_details.py
|
@skol101 did you try with a single GPU? |
I get a "404: Not Found" error after running 'wget ..'. |
try this
|
Is it normal for the model to stop improving after 10k steps? I've tried different hyperparameters, and every time the eval audio and the spectrograms stay the same, as you would expect when the loss doesn't change. I'm just curious whether this is normal behavior. |
|
Here is the resulting wav that I get when using the whole pipeline to generate speech: Tacotron-2 DDC + HiFiGAN. Tacotron-2 DDC was trained for 120k steps and produces good results with Griffin-Lim, so the problem seems to be with the vocoder. (Attachment: result_1.mp4) |
Hello, we also recently trained HiFiGAN on our own dataset for a similar number of steps, paired it with Glow-TTS, and got the same sound. |
There are multiple hifigan models,
|
I realized that the learning rate attenuated too quickly. Maybe this is why the models stop learning. I'll update the recipe in the dev branch. But you can try setting scheduler_after_epoch = True in your own config to try it out yourself. |
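To get a feel for how quickly an exponential schedule decays when it is stepped after every training step rather than after every epoch, here is a minimal sketch. The base learning rate of 1e-4 and the gamma of 0.999 are assumptions for illustration only (neither is stated in this thread); the step count of 244300 and the 352 steps per epoch are taken from the log in the issue description.

```python
# Sketch: exponential LR decay applied after every step vs. after every epoch.
# ASSUMED values (not stated in the thread): base_lr = 1e-4, gamma = 0.999.
# Taken from the issue log: GLOBAL_STEP 244300 and 352 steps per epoch.

def decayed_lr(base_lr: float, gamma: float, num_decays: int) -> float:
    """Closed form of exponential decay: base_lr * gamma ** num_decays."""
    return base_lr * gamma ** num_decays

base_lr = 1e-4
gamma = 0.999
global_step = 244300
steps_per_epoch = 352

# Scheduler stepped once per training step.
lr_per_step = decayed_lr(base_lr, gamma, global_step)

# Scheduler stepped once per completed epoch.
lr_per_epoch = decayed_lr(base_lr, gamma, global_step // steps_per_epoch)

print(f"decay every step:  {lr_per_step:.3e}")
print(f"decay every epoch: {lr_per_epoch:.3e}")
```

Under these assumed values, the per-step schedule lands around 1e-111, the same order of magnitude as the current_lr_0 reported in the issue, while the per-epoch schedule is still around 5e-5 at the same point in training.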
0.5 fixed my issue. I was able to train HiFiGAN just fine, so this can probably be closed. |
Hi, what do you mean by 0.5? Could you clarify? Thanks. |
The newest TTS version. I ran the LJSpeech recipe for HiFiGAN and also trained a VCTK model, and both sounded pretty good. |
Cheers, guys! |
Describe the bug
After running for 240k steps, there is no improvement in avg loss when training HiFiGAN.
To Reproduce
Steps to reproduce the behavior:
--> STEP: 24/352 -- GLOBAL_STEP: 244300
| > G_l1_spec_loss: 0.36788 (0.35471)
| > G_gen_loss: 16.55468 (15.96176)
| > G_adv_loss: 0.00000 (0.00000)
| > loss_0: 16.55468 (15.96176)
| > grad_norm_0: 22.85464 (28.87626)
| > current_lr_0: 7.0524350586068e-111
| > current_lr_1: 0.00010
| > step_time: 0.29070 (0.29190)
| > loader_time: 0.00150 (0.00135)
3. Evaluation:
--> EVAL PERFORMANCE
| > avg_loader_time: 0.00034 (-0.00003)
| > avg_G_l1_spec_loss: 0.35621 (+0.00000)
| > avg_G_gen_loss: 16.02957 (+0.00000)
| > avg_G_adv_loss: 0.00000 (+0.00000)
| > avg_loss_0: 16.02957 (+0.00000)
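One possible reading of the +0.00000 deltas above: with current_lr_0 near 7e-111, a gradient step is far too small to change a float32 weight at all. The sketch below illustrates this using the learning rate from the log. The weight and gradient values are hypothetical, and a plain SGD-style update is assumed for simplicity; the optimizer actually used by the recipe may behave differently in detail.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

lr = 7.0524350586068e-111  # current_lr_0 from the log above
grad = 22.85464            # hypothetical gradient (magnitude borrowed from grad_norm_0)
weight = to_float32(0.5)   # hypothetical float32 weight

# The product is about 1.6e-109, far below the smallest positive
# float32 value (about 1.4e-45), so it underflows to zero.
update = to_float32(lr * grad)

# Assumed plain SGD-style step: w <- w - lr * grad.
new_weight = to_float32(weight - update)

print(update)                # 0.0
print(new_weight == weight)  # True
```

In this sketch the weight never moves, which is consistent with an evaluation loss that is identical from one epoch to the next.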
Expected behavior
Improvement in loss during training.
Environment (please complete the following information):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
PyTorch or TensorFlow version (use command below): pytorch 1.10.0
Python version: 3.8.11
CUDA/cuDNN version: py3.8_cuda11.3_cudnn8.2.0_0
GPU model and memory: 2xRTX 3090
Additional context
Script also generates config.json in the dir where train_hifigan.py resides as well as in the generated run dir.