-
This method is very resource intensive. The reference max_len of 800 needed an 80 GB card. The minimum acceptable max_len of 200 gets you down to somewhere between 13 GB and 16 GB. You could try the DagsHub repo with batch size 1 and sharding, but the model may not function.
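For a smaller card, the knobs that matter most are max_len and batch_size. A minimal sketch of the relevant config fields (illustrative values based on the numbers above, not a tested recipe):

batch_size: 1 # smallest possible batch
max_len: 200 # maximum number of frames; the 200 minimum lands around 13-16 GB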
-
I have an $800 graphics card. Why can't I train? Also, some of my training losses aren't kicking in: I'm only getting mel loss. I have a basic configuration pointing to the correct paths and configs for the models used (ASR, pitch extractor, BERT). Has anyone hit a similar issue and can diagnose what the problem is?
config.yaml:
save_freq: 2
log_interval: 10
device: "cuda"
epochs_1st: 200 # number of epochs for first stage training (pre-training)
epochs_2nd: 100 # number of epochs for second stage training (joint training)
batch_size: 2
max_len: 400 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if do not want to load epoch numbers and optimizer parameters
.......
model_params:
  multispeaker: false
  dim_in: 16
  hidden_dim: 512 # 512
  max_conv_dim: 512
  n_layer: 3
  n_mels: 80
  n_token: 178 # number of phoneme tokens
  max_dur: 50 # maximum duration of a single phoneme
  style_dim: 128 # style vector size
  dropout: 0.2
launch command:
accelerate launch --mixed_precision=fp16 train_first.py --config_path ./Configs/config.yml
console:
............
Epoch [1/200], Step [840/853], Mel Loss: 0.51713, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000
Time elasped: 252.5265998840332
.............
Then training just freezes - not enough memory. Womp womp.
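If the freeze really is the card running out of memory, logging peak VRAM per step makes it visible. A minimal diagnostic sketch using standard PyTorch CUDA utilities (the report_vram helper and where to call it are assumptions, not part of train_first.py):

import torch

def report_vram(tag):
    # Assumed helper, not part of the repo: print peak and free GPU memory so you
    # can see whether batch_size: 2 with max_len: 400 exceeds the card's capacity.
    if not torch.cuda.is_available():
        print(f"[{tag}] CUDA not available")
        return
    free_b, total_b = torch.cuda.mem_get_info()
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"[{tag}] peak allocated: {peak_gb:.2f} GB | free: {free_b / 1e9:.2f} GB of {total_b / 1e9:.2f} GB")

Calling torch.cuda.reset_peak_memory_stats() at the start of an epoch and report_vram("step") at each logged step will show whether the peak climbs toward the card's total as longer batches come through; if it does, lowering max_len or batch_size is the usual fix.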