[TTS] Add config and modules for 22khz and 44khz audio codec #10107
@@ -0,0 +1,194 @@
# This config contains the default values for training 44.1kHz audio codec model which encodes mel spectrogram
# instead of raw audio.
Is this a mel spectrogram codec or a raw audio codec?
Changed description to raw audio codec.
@@ -15,7 +15,7 @@
 import torch
 import torch.nn as nn

-__all__ = ['Swish', 'Snake']
+__all__ = ['Swish', 'Snake', 'HalfSnake']
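For context on the activations exported here, a rough sketch follows. Snake is commonly defined as x + sin²(αx)/α. The HalfSnake behaviour shown (Snake on the first half of the channels, leaky ReLU on the rest) is an assumption based on the name and is not spelled out in this thread. The sketch uses a fixed scalar `alpha` and plain Python lists instead of a learnable per-channel parameter and torch tensors, so it only illustrates the math, not the module in the PR.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation for one sample: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU for one sample."""
    return x if x >= 0 else negative_slope * x


def half_snake(frames, alpha=1.0):
    """Apply snake to the first half of the channels, leaky ReLU to the rest.

    `frames` is a channels-first list: one list of samples per channel.
    ASSUMPTION: the half/half channel split is inferred from the name
    "HalfSnake"; check the actual module before relying on it.
    """
    split = len(frames) // 2
    out = []
    for idx, channel in enumerate(frames):
        if idx < split:
            out.append([snake(v, alpha) for v in channel])
        else:
            out.append([leaky_relu(v) for v in channel])
    return out


# Two channels, two samples each: channel 0 goes through snake,
# channel 1 through leaky ReLU.
result = half_snake([[0.0, math.pi / 2], [-1.0, 2.0]])
```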
Wondering if we should move Snake and HalfSnake to the audio collection instead of asr.
Maybe we should move the whole thing to common, so we can import it from anywhere without ending up with circular imports, etc.?
If I add it to common, should it be in the activation_registry? The registry only seems useful if the activation does not require input arguments.
Otherwise, it seems roundabout to have something like:
    if activation in ["snake", "half_snake"]:
        self.activation = activation_registry[activation](channels)
    else:
        self.activation = activation_registry[activation]()
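One possible way around the branching above, sketched under assumptions: a small helper that inspects each registered constructor and forwards only the keyword arguments it accepts. The `build_activation` helper and the stand-in `Swish` and `Snake` classes are hypothetical and exist only for illustration. They are not part of NeMo, and the real classes are `torch.nn.Module` subclasses.

```python
import inspect


class Swish:
    """Stand-in for an activation that takes no constructor arguments."""

    def __init__(self):
        pass


class Snake:
    """Stand-in for an activation that needs the channel count."""

    def __init__(self, channels):
        self.channels = channels


# Hypothetical registry mapping config names to activation classes.
activation_registry = {"swish": Swish, "snake": Snake}


def build_activation(name, **kwargs):
    """Instantiate a registered activation, passing only accepted kwargs."""
    cls = activation_registry[name]
    accepted = inspect.signature(cls).parameters
    filtered = {key: value for key, value in kwargs.items() if key in accepted}
    return cls(**filtered)


# The caller no longer needs to know which activations take `channels`.
snake_act = build_activation("snake", channels=8)
swish_act = build_activation("swish", channels=8)
```

The trade-off is that unused arguments are dropped silently, so a misspelled keyword would go unnoticed. A stricter variant could raise when a keyword matches no registered constructor.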
@@ -322,6 +324,152 @@ def forward(self, audio_real, audio_gen):
        return scores_real, scores_gen, fmaps_real, fmaps_gen


class DiscriminatorSTFT(NeuralModule):
Docs missing
Added
pls add info of arguments. They are missing.
Added
        return scores, fmap


class MultiBandDiscriminatorSTFT(NeuralModule):
Docs missing
Added
pls add information on arguments.
Added
@@ -868,6 +1028,108 @@ def forward(self, inputs, input_len):
        return out


class HiFiGANEncoder(NeuralModule):
    """
    Encoder created by inverting the HiFi-GAN decoder
Add information on arguments
Added
LGTM
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: rlangman <rlangman@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
…10107) * [TTS] Add config and modules for 22khz and 44khz audio codec Signed-off-by: Ryan <rlangman@nvidia.com> * Apply isort and black reformatting Signed-off-by: rlangman <rlangman@users.noreply.github.com> * [TTS] Add argument docstring to new modules Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: rlangman <rlangman@users.noreply.github.com> Co-authored-by: rlangman <rlangman@users.noreply.github.com> Signed-off-by: adityavavre <aditya.vavre@gmail.com>
* [TTS] Add config and modules for 22khz and 44khz audio codec
  Signed-off-by: Ryan <rlangman@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: rlangman <rlangman@users.noreply.github.com>
* [TTS] Add argument docstring to new modules
  Signed-off-by: Ryan <rlangman@nvidia.com>
---------
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: rlangman <rlangman@users.noreply.github.com>
Co-authored-by: rlangman <rlangman@users.noreply.github.com>
…10107) * [TTS] Add config and modules for 22khz and 44khz audio codec Signed-off-by: Ryan <rlangman@nvidia.com> * Apply isort and black reformatting Signed-off-by: rlangman <rlangman@users.noreply.github.com> * [TTS] Add argument docstring to new modules Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: rlangman <rlangman@users.noreply.github.com> Co-authored-by: rlangman <rlangman@users.noreply.github.com> Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
…10107) * [TTS] Add config and modules for 22khz and 44khz audio codec Signed-off-by: Ryan <rlangman@nvidia.com> * Apply isort and black reformatting Signed-off-by: rlangman <rlangman@users.noreply.github.com> * [TTS] Add argument docstring to new modules Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: rlangman <rlangman@users.noreply.github.com> Co-authored-by: rlangman <rlangman@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
What does this PR do?
Add config files and corresponding modules for optimized audio codec training.
Collection: [TTS]
Changelog
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".