
Possible bug: may prevent the code from achieving its best performance (background noise, etc.) #131

Closed
dathudeptrai opened this issue Apr 24, 2020 · 9 comments · Fixed by #132
Labels
bug Something isn't working

Comments

@dathudeptrai
Contributor

dathudeptrai commented Apr 24, 2020

Hi @kan-bayashi, I think there is a bug that prevents your code from achieving the best performance for both MelGAN and PWG. After each generator training step, you should recompute y_ and use that recomputed y_ for the discriminator step, but your code does not seem to do that. In my experiments, recomputing y_ is crucial for obtaining the best quality. My TensorFlow MelGAN implementation reaches the same quality as your code in only about 2M steps from scratch (without needing the PWG auxiliary loss to speed up convergence), but if I don't recompute y_, my TF code at 2M steps is not as good as it is at 2M steps with recomputation.
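The training order being reported can be sketched as below. The stub classes and names (`StubGenerator`, `train_one_batch`, etc.) are hypothetical illustrations, not this repository's actual trainer API; the `version` counter merely stands in for network weights so the ordering is visible.

```python
# Minimal sketch of the suggested training order (hypothetical stubs, not the
# repo's real code). The key point: after the generator's parameter update,
# regenerate y_ so the discriminator trains on output of the *updated* generator.

class StubGenerator:
    """Stands in for the MelGAN/PWG generator; `version` mimics its weights."""
    def __init__(self):
        self.version = 0

    def __call__(self, mel):
        # A real generator would synthesize a waveform from the mel spectrogram.
        return ("fake_audio", self.version)

    def step(self):
        # Stands in for optimizer.step() on the generator.
        self.version += 1


class StubDiscriminator:
    """Records which generator output it was trained on."""
    def __init__(self):
        self.trained_on = []

    def train_step(self, y, y_):
        self.trained_on.append(y_)


def train_one_batch(gen, dis, mel, y):
    y_ = gen(mel)            # 1) generator forward
    # ... adversarial + auxiliary losses, backprop ...
    gen.step()               # 2) generator parameter update
    y_ = gen(mel)            # 3) IMPORTANT: recompute y_ with the updated generator
    dis.train_step(y, y_)    # 4) discriminator trains on the fresh fake


gen, dis = StubGenerator(), StubDiscriminator()
train_one_batch(gen, dis, mel="mel_frames", y="real_audio")
print(dis.trained_on[-1])    # ('fake_audio', 1): fake from the updated generator
```

Without step 3, the discriminator would train on `('fake_audio', 0)`, i.e. a sample from the stale, pre-update generator, which is the reported bug.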

@kan-bayashi kan-bayashi added the bug Something isn't working label Apr 24, 2020
@kan-bayashi
Owner

Thank you for your suggestions!
I'm not sure which is standard for GAN training, but it is better to change the update manner since your experiments show better performance.
I will make a PR.

@kan-bayashi
Owner

BTW, how was the final quality? Is it improved, or is the change just related to convergence speed?

@dathudeptrai
Contributor Author

dathudeptrai commented Apr 24, 2020

I think that makes sense: the discriminator should train on the newest y_, which lets it penalize the generator better. You can refer to two implementations on GitHub (https://github.com/seungwonpark/melgan/blob/master/utils/train.py/#L85) and the official code (https://github.com/descriptinc/melgan-neurips/blob/master/scripts/train.py/#L156). Note that the official code trains the discriminator before the generator, but it still recomputes D-fake when training the generator. The quality on my language (Vietnamese) improved significantly. I'm training on LJSpeech now with MelGAN only; I have validation audio at 1.4M steps here. I believe the same improvement can be achieved with PWG once the update manner is corrected.
1460000steps.zip
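The alternative ordering mentioned above (discriminator updated first, as in the official melgan-neurips script) can be sketched the same way. The stubs below are illustrative only, not the actual code at the linked lines; the point is that even with D updated first, the fake score fed into the generator loss is re-queried after D's update.

```python
# Illustrative stubs (not the actual melgan-neurips code) for the
# discriminator-first ordering: D is updated on the current fake, and the
# generator loss then queries the *updated* D (recomputed D-fake).

class StubNet:
    def __init__(self):
        self.version = 0        # stands in for the network's weights

    def step(self):
        self.version += 1       # stands in for optimizer.step()


def train_one_batch(gen, dis, log):
    y_ = ("fake", gen.version)                       # G forward
    log.append(("D trained on G version", y_[1]))
    dis.step()                                       # discriminator update first
    # Generator loss re-queries D after its update (recomputed D-fake):
    log.append(("G loss used D version", dis.version))
    gen.step()                                       # generator update


log = []
train_one_batch(StubNet(), StubNet(), log)
print(log)  # [('D trained on G version', 0), ('G loss used D version', 1)]
```

In both orderings the principle is the same: whichever network is updated second should see outputs or scores produced by its freshly updated opponent, not stale ones.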

@kan-bayashi
Owner

Thank you for sharing samples!
They sound very good :) I will follow your suggested update order!

@dathudeptrai
Contributor Author

dathudeptrai commented Apr 24, 2020

BTW, I'm creating a TF framework for speech, currently focused on TTS :)), just like your ESPnet :D. My TF code now trains faster than PyTorch with the exact same parameters, batch_size, and batch_max_steps, and you already know about the inference speed improvement :D.

@kan-bayashi
Owner

That sounds nice :)
But I love PyTorch, so I want Facebook to make it faster 🚀

@dathudeptrai
Contributor Author

dathudeptrai commented Apr 24, 2020

haha BTW, what are the licenses of this repo and your ESPnet :))? I can't retrain everything; there are many models and datasets. I plan to provide some training scripts and pretrained models in my TF framework, and the rest will be converted from your PyTorch pretrained models and supported for inference only :D. Can I do that?

@kan-bayashi
Owner

That's very cool.
This repository is MIT licensed and ESPnet is Apache 2.0, so you can do it!

@dathudeptrai
Contributor Author

OK, that's nice. I will keep the TF and PyTorch models synchronized and use the same preprocessing as here, so at least users can train with PyTorch and run inference with TF :D. Again, thanks for your hard work :)).
