RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; 33.31 GiB already allocated; 1.06 GiB free; 36.81 GiB reserved in total by PyTorch)
#144
Open
asagar60 opened this issue
Apr 30, 2022
· 3 comments
I am trying to generate images using the pretrained StyleGAN2-SPD-ADA, but I keep getting this error. I initially thought it was due to Colab's 15 GB GPU, but I also tried 24 GB and 40 GB GPUs and still get the same error.
I tried reducing the batch size from 64 to 32 to 16; still the same.
[INFO] 2022-04-30 07:22:26 > Generator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=G-best-weights-step=196000.pth
[INFO] 2022-04-30 07:22:26 > EMA_Generator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=G_ema-best-weights-step=196000.pth
[INFO] 2022-04-30 07:22:26 > Discriminator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=D-best-weights-step=196000.pth
/opt/conda/lib/python3.8/site-packages/torchvision/models/inception.py:44: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
warnings.warn(
wandb: Currently logged in as: asagar60 (use wandb login --relogin to force relogin)
wandb: Tracking run with wandb version 0.12.15
wandb: Run data is saved locally in gen/wandb/run-20220430_072228-1zr43u7c
wandb: Run wandb offline to turn off syncing.
wandb: Resuming run StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19
wandb: ⭐️ View project at https://wandb.ai/asagar60/uncategorized
wandb: 🚀 View run at https://wandb.ai/asagar60/uncategorized/runs/1zr43u7c
[INFO] 2022-04-30 07:22:29 > Start training!
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Traceback (most recent call last):
File "PyTorch-StudioGAN/src/main.py", line 182, in <module>
loader.load_worker(local_rank=rank,
File "/home/PyTorch-StudioGAN/src/loader.py", line 348, in load_worker
gen_acml_loss = worker.train_generator(current_step=step)
File "/home/PyTorch-StudioGAN/src/worker.py", line 564, in train_generator
fake_dict = self.Dis(fake_images_, fake_labels)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 849, in forward
x, img = block(x, img, **block_kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 648, in forward
x = self.conv0(x)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 176, in forward
x = conv2d_resample.conv2d_resample(x=x,
File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_resample.py", line 133, in conv2d_resample
return _conv2d_wrapper(x=x, w=w, padding=[py0,px0], groups=groups, flip_weight=flip_weight)
File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_resample.py", line 41, in _conv2d_wrapper
return op(x, w, stride=stride, padding=padding, groups=groups)
File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_gradfix.py", line 37, in conv2d
return _conv2d_gradfix(transpose=False, weight_shape=weight.shape, stride=stride, padding=padding, output_padding=0, dilation=dilation, groups=groups).apply(input, weight, bias)
File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_gradfix.py", line 127, in forward
return torch.nn.functional.conv2d(input=input, weight=weight, bias=bias, **common_kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; 33.31 GiB already allocated; 1.06 GiB free; 36.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
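To make the numbers in that message easier to compare, here is a minimal sketch that pulls the GiB figures out of the error text and computes the gap between reserved and allocated memory. The helper name `parse_oom` and the `cached_unused` key are hypothetical, chosen for illustration; the regular expressions assume the message wording shown in this traceback, which may differ in other PyTorch versions.

```python
import re


def parse_oom(message: str) -> dict:
    """Extract the GiB figures from a PyTorch CUDA OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{key}' in message")
        stats[key] = float(match.group(1))
    # Memory PyTorch has reserved from the driver but is not using for live tensors.
    stats["cached_unused"] = round(stats["reserved"] - stats["allocated"], 2)
    return stats


msg = (
    "CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; "
    "33.31 GiB already allocated; 1.06 GiB free; 36.81 GiB reserved in total by PyTorch)"
)
stats = parse_oom(msg)
print(stats)
```

Here live tensors already occupy 33.31 of 39.59 GiB, so most of the card is genuinely in use. The 3.5 GiB held in reserve is larger than the 2 GiB request, which is the situation the fragmentation hint at the end of the message refers to, although these figures alone do not show that fragmentation is the cause.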
I think there's a bug in the -v option. For now, instead of saving the output as a canvas (which is what the -v option does), you can save the images one by one in PNG format. To do so, add the -sf -sf_num NUMBER_OF_IMAGES_TO_GENERATE options. If you only plan to generate images, you can omit the -t option and specify -metrics none to avoid unnecessary training and evaluation steps. We'll try to fix the bug ASAP.
+) Since StyleGAN models are trained using mixed precision, I also recommend using -mpc in all cases.
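As a rough sketch of that suggestion, the snippet below takes the arguments from the command in the original post, drops -t and -v, and appends the flags named above. The function name `generation_only` and the image count of 1000 are hypothetical placeholders, and the sketch assumes the remaining options (such as -best) are still appropriate for a generation-only run.

```python
import shlex

# Arguments taken from the command in the original post.
original = [
    "python", "PyTorch-StudioGAN/src/main.py", "-t", "-v",
    "-ckpt", "StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19",
    "-cfg", "PyTorch-StudioGAN/src/configs/AFHQ/StyleGAN2-SPD-ADA.yaml",
    "-save", "gen", "-data", "afhq", "-best",
]


def generation_only(argv, num_images):
    """Drop the training and canvas flags, then add the per-image saving flags."""
    dropped = {"-t", "-v"}
    kept = [arg for arg in argv if arg not in dropped]
    return kept + ["-sf", "-sf_num", str(num_images), "-metrics", "none", "-mpc"]


# 1000 is an arbitrary placeholder for NUMBER_OF_IMAGES_TO_GENERATE.
cmd = generation_only(original, num_images=1000)
print(shlex.join(cmd))
```

In a Colab cell the printed line would be prefixed with `!`, as in the original command.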
@alex4727
Hi,
You mentioned in your comment that StyleGAN models are trained using mixed precision. In the code, however, wherever mixed precision is used there is an additional not is_stylegan condition. I was trying to figure out why mixed-precision training is disabled for StyleGAN, so your recommendation to use -mpc confuses me.
It would be very helpful if you could clarify that. Thanks in advance..!!
@lavish619
Sorry for the late reply,
You are correct: wherever mixed precision is used, an additional not is_stylegan condition is present. That is because StyleGAN handles fp16 data types in the model file itself, so there is no need to wrap the worker code in torch.cuda.amp.autocast().
Thanks!
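For readers following the thread, the gating described above could look roughly like the sketch below. This is not the repository's actual worker code: `amp_context` and its `autocast_factory` parameter are hypothetical names, and the sketch only assumes that the worker chooses between `torch.cuda.amp.autocast()` and no wrapper at all.

```python
from contextlib import nullcontext


def amp_context(mixed_precision, is_stylegan, autocast_factory=None):
    """Pick the context manager a training or generation step would run under."""
    if mixed_precision and not is_stylegan:
        if autocast_factory is None:
            # Imported lazily so the sketch can be read and tested without PyTorch.
            import torch
            autocast_factory = torch.cuda.amp.autocast
        return autocast_factory()
    # StyleGAN handles fp16 inside its own model file, so no wrapper is added here.
    return nullcontext()


# With a StyleGAN model the step runs without an autocast wrapper,
# even when mixed precision is requested.
with amp_context(mixed_precision=True, is_stylegan=True):
    pass
```

Under this reading, -mpc still matters for StyleGAN because the flag is consumed by the model itself, while the worker-side wrapper is skipped.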
Command used in the original post:
!python PyTorch-StudioGAN/src/main.py -t -v -ckpt StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19 -cfg PyTorch-StudioGAN/src/configs/AFHQ/StyleGAN2-SPD-ADA.yaml -save gen -data afhq -best