Adding OverFlow #2183
Conversation
Resolved (now outdated) review threads on:
recipes/ljspeech/tacotron2-DDC/run-November-24-2022_01+19PM-c50d89fc/train_tacotron_ddc.py
recipes/ljspeech/tacotron2-DDC/run-November-24-2022_01+41PM-c50d89fc/train_tacotron_ddc.py
recipes/ljspeech/tacotron2-DDC/run-November-24-2022_02+10PM-c50d89fc/train_tacotron_ddc.py
recipes/ljspeech/tacotron2-DDC/run-November-24-2022_12+57PM-c50d89fc/train_tacotron_ddc.py
Good idea. From the samples I've heard, the model is quite good. What about adding neural HMM, or is it the same thing but just upgraded?
It uses a neural HMM as its core instead of attention. The benefits of neural HMM TTS are that it has almost half the number of parameters and it works very well even in a low-resource setting, i.e. when we don't have enough data to train on. Once we merge this, it would require very little change to add neural HMM TTS to the system, which I plan to do as well.
That's nice. How fast is it on the CPU? I previously suggested improving its speed for screen readers, but I didn't realize how foolish that was until recently. So how fast is it, and is it faster or slower than Tacotron2 with HiFi-GAN, or VITS?
Cool, the PR is ready. I'll first try the LJSpeech recipe and let you know how it goes.
# Process autoregression
h_memory, c_memory = self._process_ar_timestep(t, ar_inputs, h_memory, c_memory)
# Get mean, std and transition vector from the decoder for this timestep
# Note: Gradient checkpointing currently doesn't work with multiple GPUs inside a loop
If this is a blocker for multi-GPU use, we should explain it in the model docstring and in the docs too.
Is this a model-specific issue, or is it rooted in torch?
This is a torch issue: gradient checkpointing inside a loop is currently not supported with DDP. It works fine for multi-GPU if we turn the flag off with use_grad_checkpointing=False, but that significantly increases memory usage while training. This is because, to compute the actual data likelihood (not an approximation using MAS/Viterbi), we must use all the states from the previous time step during the forward pass to decide the probability mass at the current step.
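For illustration, here is a minimal sketch of the pattern being discussed: per-timestep gradient checkpointing with torch.utils.checkpoint inside an autoregressive loop. The decode_step function, tensor shapes, and the standalone use_grad_checkpointing flag are made up for this sketch; the real decoder is more involved.

import torch
from torch.utils.checkpoint import checkpoint

use_grad_checkpointing = False  # would be turned off for DDP / multi-GPU training

def decode_step(h, x):
    # stand-in for the per-timestep decoder computation
    return torch.tanh(h + x)

h = torch.zeros(8, 16, requires_grad=True)
inputs = torch.randn(10, 8, 16)

for t in range(inputs.shape[0]):
    if use_grad_checkpointing:
        # recompute decode_step during the backward pass to save memory;
        # this per-timestep checkpointing is what currently breaks under DDP
        h = checkpoint(decode_step, h, inputs[t])
    else:
        h = decode_step(h, inputs[t])

With the flag off, every timestep's activations stay in memory, which is the memory increase mentioned above.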
I have added some information. Please take a look and see if it needs any more explanation.
Great PR @shivammehta25! Thanks for the contribution :).
Oh my bad! I clicked the request-for-review button one too many times. Sorry for the spam xP.
… and can be dumped as json
@shivammehta25 how do you compute
Inside
OK, thanks. I missed it for some reason. I also trained an LJSpeech model, and it works great. But I think we also need to train a vocoder, which we can do separately. Merging it now 👍
I tried synthesising waveforms with the universal HiFi-GAN vocoder; it works pretty well!
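For reference, a hedged sketch of how pairing the trained model with an external vocoder could look using Coqui's Synthesizer class. All checkpoint and config paths below are hypothetical placeholders, and the vocoder files would come from whichever universal HiFi-GAN release was actually used.

from TTS.utils.synthesizer import Synthesizer

# Hypothetical local paths; substitute the real OverFlow run and vocoder files.
synth = Synthesizer(
    tts_checkpoint="path/to/overflow/best_model.pth",
    tts_config_path="path/to/overflow/config.json",
    vocoder_checkpoint="path/to/universal_hifigan/model.pth",
    vocoder_config="path/to/universal_hifigan/config.json",
    use_cuda=False,
)

wav = synth.tts("Hello from OverFlow.")     # synthesize a waveform
synth.save_wav(wav, "overflow_sample.wav")  # write it to disk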
This is the model from the paper: https://arxiv.org/abs/2211.06892
Audio samples: https://shivammehta25.github.io/OverFlow/