Adding OverFlow #2183

Merged 32 commits on Dec 12, 2022
Changes from 30 commits
Commits
405bffe
Adding encoder
shivammehta25 Nov 26, 2022
d607993
currently modifying hmm
shivammehta25 Nov 27, 2022
a324920
Adding hmm
shivammehta25 Nov 28, 2022
8628648
Adding overflow
shivammehta25 Nov 30, 2022
6ec83c4
Adding overflow setting up flat start
shivammehta25 Dec 1, 2022
783a982
Removing runs
shivammehta25 Dec 1, 2022
10f15e0
adding normalization parameters
shivammehta25 Dec 1, 2022
aff8b1f
Fixing models on same device
shivammehta25 Dec 1, 2022
62941d6
Training overflow and plotting evaluations
shivammehta25 Dec 2, 2022
f448ea4
Adding inference
shivammehta25 Dec 3, 2022
ff33837
At the end of epoch the test sentences are coming on cpu instead of gpu
shivammehta25 Dec 4, 2022
3edb0d2
Adding figures from model during training to monitor
shivammehta25 Dec 5, 2022
5fc800c
reverting tacotron2 training recipe
shivammehta25 Dec 5, 2022
427dfe5
fixing inference on gpu for test sentences on config
shivammehta25 Dec 5, 2022
ecc12c6
moving helpers and texts within overflows source code
shivammehta25 Dec 5, 2022
b86f3f8
renaming to overflow
shivammehta25 Dec 5, 2022
995ee93
moving loss to the model file
shivammehta25 Dec 5, 2022
5b0fe46
Fixing the rename
shivammehta25 Dec 5, 2022
5377f87
Model training but not plotting the test config sentences's audios
shivammehta25 Dec 5, 2022
bd5be6c
Formatting logs
shivammehta25 Dec 5, 2022
755aa6f
Changing model name to camelcase
shivammehta25 Dec 5, 2022
1350a4b
Fixing test log
shivammehta25 Dec 5, 2022
3c986fd
Fixing plotting bug
shivammehta25 Dec 6, 2022
4a5b1a0
Adding some tests
shivammehta25 Dec 6, 2022
5b1dabc
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 7, 2022
f43d7e3
Adding more tests to overflow
shivammehta25 Dec 8, 2022
c3d0167
Adding all tests for overflow
shivammehta25 Dec 9, 2022
ddefe34
making changes to camel case in config
shivammehta25 Dec 9, 2022
c2df9f3
Adding information about parameters and docstring
shivammehta25 Dec 10, 2022
9927434
removing compute_mel_statistics moved statistic computation to the mo…
shivammehta25 Dec 10, 2022
340cd0b
Added overflow in readme
shivammehta25 Dec 10, 2022
aca3fe1
Adding more test cases, now it doesn't saves transition_p like tensor…
shivammehta25 Dec 11, 2022
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: 'https://github.com/psf/black'
-    rev: 20.8b1
+    rev: 22.3.0
hooks:
- id: black
language_version: python3
200 changes: 200 additions & 0 deletions TTS/tts/configs/overflow_config.py
@@ -0,0 +1,200 @@
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig


@dataclass
class OverflowConfig(BaseTTSConfig): # The classname has to be camel case
"""
Define parameters for the OverFlow model.

Example:

>>> from TTS.tts.configs.overflow_config import OverflowConfig
>>> config = OverflowConfig()

Args:
model (str):
Model name used to select the right model class to initialize. Defaults to `Overflow`.
run_eval_steps (int):
Run an evaluation epoch every N steps. If None, waits until the training epoch is completed. Defaults to None.
save_step (int):
Save local checkpoint every save_step steps. Defaults to 500.
plot_step (int):
Plot training stats on the logger every plot_step steps. Defaults to 1.
model_param_stats (bool):
Log model parameters stats on the logger dashboard. Defaults to False.
force_generate_statistics (bool):
Force generate mel normalization statistics. Defaults to False.
mel_statistics_parameter_path (str):
Path to the mel normalization statistics. If the model does not find a file there, it will generate statistics.
Defaults to None.
num_chars (int):
Number of characters used by the model. It must be defined before initializing the model. Defaults to None.
state_per_phone (int):
Generates N states per phone. Similar to the `add_blank` parameter in GlowTTS, but in OverFlow it is upsampled by the model's encoder. Defaults to 2.
encoder_in_out_features (int):
Channels of encoder input and character embedding tensors. Defaults to 512.
encoder_n_convolutions (int):
Number of convolution layers in the encoder. Defaults to 3.
out_channels (int):
Channels of the final model output. It must match the spectrogram size. Defaults to 80.
ar_order (int):
Autoregressive order of the model. Defaults to 1. In ablations of Neural HMM it was found that more autoregression, while giving more variation, hurts the naturalness of the synthesised audio.
sampling_temp (float):
Variation added to the sample from the latent space of neural HMM. Defaults to 0.334.
deterministic_transition (bool):
Deterministic duration generation based on duration quantiles, as defined in "S. Ronanki, O. Watts, S. King, and G. E. Henter, 'Median-based generation of synthetic speech durations using a non-parametric approach,' in Proc. SLT, 2016." Defaults to True.
duration_threshold (float):
Threshold for duration quantiles. Defaults to 0.55. Tune this to change the speaking rate of the synthesis: lower values give a slower speaking rate and higher values a faster one.
use_grad_checkpointing (bool):
Use gradient checkpointing to save memory. In a multi-GPU setting, PyTorch currently does not support gradient checkpointing inside a loop, so it has to be turned off there. Adjust depending on whichever setup gives the larger batch size, single GPU or multi-GPU. Defaults to True.
max_sampling_time (int):
Maximum sampling time while synthesising latents from neural HMM. Defaults to 1000.
prenet_type (str):
`original` or `bn`. `original` sets the default Prenet and `bn` uses the Batch Normalization version of the
Prenet. Defaults to `original`.
prenet_dim (int):
Dimension of the Prenet. Defaults to 256.
prenet_n_layers (int):
Number of layers in the Prenet. Defaults to 2.
prenet_dropout (float):
Dropout rate of the Prenet. Defaults to 0.5.
prenet_dropout_at_inference (bool):
Use dropout at inference time. Defaults to False.
memory_rnn_dim (int):
Dimension of the memory LSTM to process the prenet output. Defaults to 1024.
outputnet_size (list[int]):
Size of the output network inside the neural HMM. Defaults to [1024].
flat_start_params (dict):
Parameters for the flat start initialization of the neural HMM. Defaults to `{"mean": 0.0, "std": 1.0, "transition_p": 0.14}`.
It will be recomputed when you pass the dataset.
std_floor (float):
Floor value for the standard deviation of the neural HMM. Prevents the model from cheating by putting a point mass on a single datapoint and getting infinite likelihood. Defaults to 0.01.
It is called `variance flooring` in standard HMM literature.
hidden_channels_dec (int):
Number of base hidden channels used by the decoder WaveNet network. Defaults to 150.
kernel_size_dec (int):
Decoder kernel size. Defaults to 5.
dilation_rate (int):
Rate to increase dilation by each layer in a decoder block. Defaults to 1.
num_flow_blocks_dec (int):
Number of decoder blocks. Defaults to 12.
num_block_layers (int):
Number of layers in each decoder block. Defaults to 4.
dropout_p_dec (float):
Dropout rate of the decoder. Defaults to 0.05.
num_splits (int):
Number of split levels in the invertible conv1x1 operation. Defaults to 4.
num_squeeze (int):
Number of squeeze levels. When squeezing, the number of channels increases and the number of time steps decreases by the factor
`num_squeeze`. Defaults to 2.
sigmoid_scale (bool):
Enable/disable sigmoid scaling in the decoder. Defaults to False.
c_in_channels (int):
Unused parameter from GlowTTS's decoder. Defaults to 0.
optimizer (str):
Optimizer to use for training. Defaults to `adam`.
optimizer_params (dict):
Parameters for the optimizer. Defaults to `{"weight_decay": 1e-6}`.
grad_clip (float):
Gradient clipping threshold. Defaults to 40_000.
lr (float):
Learning rate. Defaults to 1e-3.
lr_scheduler (str):
Learning rate scheduler for the training. Use one of the schedulers from `torch.optim.lr_scheduler` or
`TTS.utils.training`. Defaults to `None`.
min_seq_len (int):
Minimum input sequence length to be used at training.
max_seq_len (int):
Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
"""

model: str = "Overflow"

# Training and Checkpoint configs
run_eval_steps: int = 100
save_step: int = 500
plot_step: int = 1
model_param_stats: bool = False

# data parameters
force_generate_statistics: bool = False
mel_statistics_parameter_path: str = None

# Encoder parameters
num_chars: int = None
state_per_phone: int = 2
encoder_in_out_features: int = 512
encoder_n_convolutions: int = 3

# HMM parameters
out_channels: int = 80
ar_order: int = 1
sampling_temp: float = 0.334
deterministic_transition: bool = True
duration_threshold: float = 0.55
use_grad_checkpointing: bool = True
max_sampling_time: int = 1000

## Prenet parameters
prenet_type: str = "original"
prenet_dim: int = 256
prenet_n_layers: int = 2
prenet_dropout: float = 0.5
prenet_dropout_at_inference: bool = False
memory_rnn_dim: int = 1024

## Outputnet parameters
outputnet_size: List[int] = field(default_factory=lambda: [1024])
flat_start_params: dict = field(default_factory=lambda: {"mean": 0.0, "std": 1.0, "transition_p": 0.14})
std_floor: float = 0.01

# Decoder parameters
hidden_channels_dec: int = 150
kernel_size_dec: int = 5
dilation_rate: int = 1
num_flow_blocks_dec: int = 12
num_block_layers: int = 4
dropout_p_dec: float = 0.05
num_splits: int = 4
num_squeeze: int = 2
sigmoid_scale: bool = False
c_in_channels: int = 0

# optimizer parameters
optimizer: str = "Adam"
optimizer_params: dict = field(default_factory=lambda: {"weight_decay": 1e-6})
grad_clip: float = 40000.0
lr: float = 1e-3
lr_scheduler: str = None

# overrides
min_seq_len: int = 3
max_seq_len: int = 500

# testing
test_sentences: List[str] = field(
default_factory=lambda: [
"Be a voice, not an echo.",
]
)

# Extra needed config
r: int = 1
use_d_vector_file: bool = False
use_speaker_embedding: bool = False

def check_values(self):
"""Validate the hyperparameters.

Raises:
AssertionError: when the parameter network is not defined
AssertionError: when the transition probability is not between 0 and 1
"""
assert self.ar_order > 0, "AR order must be greater than 0; it is an autoregressive model."
assert (
len(self.outputnet_size) >= 1
), f"Parameter Network must have at least one layer. Check the config file for the parameter network. Provided: {self.outputnet_size}"
assert (
0 < self.flat_start_params["transition_p"] < 1
), f"Transition probability must be between 0 and 1. Provided: {self.flat_start_params['transition_p']}"
2 changes: 1 addition & 1 deletion TTS/tts/configs/shared_configs.py
@@ -315,7 +315,7 @@ class BaseTTSConfig(BaseTrainingConfig):
optimizer: str = "radam"
optimizer_params: dict = None
# scheduler
lr_scheduler: str = ""
lr_scheduler: str = None
lr_scheduler_params: dict = field(default_factory=lambda: {})
# testing
test_sentences: List[str] = field(default_factory=lambda: [])
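
With `lr_scheduler` now defaulting to `None` (no scheduler), a scheduler has to be named explicitly when one is wanted. A hedged sketch, assuming a standard `torch.optim.lr_scheduler` class name and illustrative parameters:

from TTS.tts.configs.overflow_config import OverflowConfig

# Sketch only: the scheduler name and its params are assumptions, not PR values.
config = OverflowConfig(
    lr=1e-3,
    lr_scheduler="ExponentialLR",  # None means no scheduler is used
    lr_scheduler_params={"gamma": 0.95},
)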