lowvram and novram OOM at slightly higher resolutions #5

Closed

bazettfraga opened this issue Feb 9, 2023 · 4 comments

Comments

@bazettfraga
Contributor

OS: Arch Linux
Kernel: 6.1.9-arch1-2
GPU: Nvidia GeForce GTX 1060 3GB
Nvidia Driver Version: 525.85.05
CUDA Version: 12.0

When attempting to generate a 768x768 image with either novram or lowvram, CUDA hit an out-of-memory error. I used "Load Default", with the only difference being the step count reduced from 20 to 8 for faster testing. This amount of VRAM should reasonably be able to output higher-resolution images (up to around 1152x768) with low-VRAM optimizations.

```
Traceback (most recent call last):
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 178, in execute
    executed += recursive_execute(prompt, self.outputs, x, extra_data)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 71, in recursive_execute
    executed += recursive_execute(prompt, outputs, input_unique_id, extra_data)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 76, in recursive_execute
    outputs[unique_id] = getattr(obj, obj.FUNCTION)(**input_data_all)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/nodes.py", line 101, in decode
    return (vae.decode(samples), )
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/sd.py", line 311, in decode
    pixel_samples = self.first_stage_model.decode(1. / self.scale_factor * samples)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/models/autoencoder.py", line 94, in decode
    dec = self.decoder(z)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/model.py", line 132, in forward
    h = nonlinearity(h)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/model.py", line 43, in nonlinearity
    return x*torch.sigmoid(x)
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 2.94 GiB total capacity; 1.45 GiB already allocated; 364.56 MiB free; 2.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
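
The error message itself points at allocator fragmentation as one possible factor. A common mitigation worth trying (untested here, and the 128 MiB cap is only a guess) is constraining the allocator's split size via PYTORCH_CUDA_ALLOC_CONF when launching:

```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python main.py --lowvram
```

Smaller caps tend to reduce fragmentation at some cost in allocation speed.
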
@comfyanonymous
Owner

Can you try the latest and see if I fixed it?

@bazettfraga
Contributor Author

It works, but only on the first generation. The first image goes off without a hitch, but every subsequent generation produces an error.

```
salt@saltPC /mnt/2TBDa/SDSoftware/comfyfork/ComfyUI $ python main.py --lowvram
Set vram state to: LOW VRAM
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
No module 'xformers'. Proceeding without it.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading model from /mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/models/checkpoints/SDModels/AbyssOrangeMix2_hard.safetensors
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.20s/it]
got prompt
deleted 3
deleted 8
deleted 9
  0%|                                                                                   | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py", line 152, in _get_attention_scores_no_kv_chunking
    attn_probs = attn_scores.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 2.94 GiB total capacity; 1.38 GiB already allocated; 496.56 MiB free; 1.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 178, in execute
    executed += recursive_execute(prompt, self.outputs, x, extra_data)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 71, in recursive_execute
    executed += recursive_execute(prompt, outputs, input_unique_id, extra_data)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 71, in recursive_execute
    executed += recursive_execute(prompt, outputs, input_unique_id, extra_data)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/main.py", line 76, in recursive_execute
    outputs[unique_id] = getattr(obj, obj.FUNCTION)(**input_data_all)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/nodes.py", line 426, in sample
    return common_ksampler(self.device, model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/nodes.py", line 396, in common_ksampler
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/samplers.py", line 282, in sample
    samples = getattr(k_diffusion_sampling, self.sampler)(self.model_k, noise, sigmas, extra_args={"cond":positive, "uncond":negative, "cond_scale": cfg})
  File "/home/salt/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/k_diffusion/sampling.py", line 128, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/samplers.py", line 144, in forward
    cond, uncond = calc_cond_uncond_batch(cond, uncond, x, sigma, max_total_area)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/samplers.py", line 123, in calc_cond_uncond_batch
    output = self.inner_model(input_x, sigma_, cond=c).chunk(batch_chunks)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/k_diffusion/external.py", line 114, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/k_diffusion/external.py", line 140, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/models/diffusion/ddpm.py", line 862, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/models/diffusion/ddpm.py", line 1334, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 781, in forward
    h = module(h, emb, context)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/salt/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/salt/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/attention.py", line 539, in forward
    x = block(x, context=context[i])
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/salt/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/attention.py", line 474, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 129, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/attention.py", line 477, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "/home/salt/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/salt/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/attention.py", line 220, in forward
    hidden_states = efficient_dot_product_attention(
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py", line 227, in efficient_dot_product_attention
    return compute_query_chunk_attn(
  File "/mnt/2TBDa/SDSoftware/comfyfork/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py", line 154, in _get_attention_scores_no_kv_chunking
    except torch.cuda.OutOfMemoryError:
AttributeError: module 'torch.cuda' has no attribute 'OutOfMemoryError'
```
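
The AttributeError at the bottom is the real clue: torch.cuda.OutOfMemoryError only exists in newer PyTorch releases, so on this older install the except clause in sub_quadratic_attention.py raises while trying to catch the underlying OOM. A version-tolerant sketch of that handler (hypothetical, not necessarily the fix that was committed) could look like:

```python
import torch

# torch.cuda.OutOfMemoryError is missing on older PyTorch builds; fall back
# to RuntimeError, which CUDA OOMs were raised as on those builds (and which
# OutOfMemoryError subclasses on newer ones).
OOMError = getattr(torch.cuda, "OutOfMemoryError", RuntimeError)

def softmax_with_oom_fallback(attn_scores: torch.Tensor) -> torch.Tensor:
    try:
        return attn_scores.softmax(dim=-1)
    except OOMError:
        # Slower in-place softmax that avoids allocating a second
        # attention-sized tensor.
        attn_scores -= attn_scores.max(dim=-1, keepdim=True).values
        torch.exp(attn_scores, out=attn_scores)
        attn_scores /= attn_scores.sum(dim=-1, keepdim=True)
        return attn_scores
```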

@comfyanonymous
Owner

Can you try it now?

@bazettfraga
Contributor Author

It works perfectly now, thank you!
