CUDA Error using CPU Only Mode #2247

Open
CycloneRing opened this issue Nov 10, 2023 · 10 comments
Labels
help wanted (Extra attention is needed) · Nvidia Driver Issue (Issue related to Nvidia driver update)

Comments

@CycloneRing

Hi, I'm launching the latest SD and the latest ControlNet with these arguments to test CPU-only mode:

--use-cpu all --precision full --no-half --skip-torch-cuda-test

ReActor and SD both work fine on the CPU, but when I activate ControlNet it throws the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Am I doing something wrong?
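(For reference, this is PyTorch's generic device-mismatch error: it fires whenever a single operation receives tensors living on different devices. A minimal sketch that reproduces it, assuming a CUDA-enabled PyTorch build and an NVIDIA GPU:

    import torch

    a = torch.randn(2, 2, device="cpu")     # tensor on the CPU
    b = torch.randn(2, 2, device="cuda:0")  # tensor on the GPU

    # Mixing devices in one op raises:
    # RuntimeError: Expected all tensors to be on the same device,
    # but found at least two devices, cuda:0 and cpu!
    c = a + b

So somewhere in the pipeline one component is still being placed on cuda:0 while everything else runs on the CPU.)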

@sdbds
Collaborator

sdbds commented Nov 10, 2023

Did you use the Low VRAM mode in ControlNet?

@CycloneRing
Author

> Did you use the Low VRAM mode in ControlNet?

No, but it's the same with --lowvram and with the Low VRAM checkbox.

@huchenlei
Collaborator

Can you share the full console log? It would be better if you followed the bug template and shared everything asked for there.

@CycloneRing
Author

> Can you share the full console log? It would be better if you followed the bug template and shared everything asked for there.

@huchenlei Sure, here is the log:

2023-11-12 04:18:28,757 - ControlNet - INFO - Loading model: t2i_depth [d0e98e8d]
2023-11-12 04:18:28,907 - ControlNet - INFO - Loaded state_dict from [C:\StableAI\webui\models\ControlNet\t2i_depth.safetensors]
2023-11-12 04:18:28,907 - ControlNet - INFO - t2i_adapter_config
2023-11-12 04:18:34,949 - ControlNet - INFO - ControlNet model t2i_depth [d0e98e8d] loaded.
2023-11-12 04:18:34,973 - ControlNet - INFO - Loading preprocessor: none
2023-11-12 04:18:34,973 - ControlNet - INFO - preprocessor resolution = 512
2023-11-12 04:18:35,181 - ControlNet - INFO - ControlNet Hooked - Time = 6.573274374008179
  0%|                                                                                                                                                                                | 0/4 [00:03<?, ?it/s]
*** Error completing request
*** Arguments: ('task(3l5f581vczqygf2)', 'portrait of a shaved head short hair sci fi soldier young beautiful girl from mass effect <lora:DetailEnhancer:0.7>, 4k, 8k, concept art, high quality, sharp, insanely detailed, good details, environment, fantasy, planet, girl with armor, alien girl, (cute:1.2), ((mass effect 2)), ((vexille)), symmetric, 1girl, serious look, cinematic, amazing face, very beautiful, ((chipset around head))', 'poor drawing, low quality, bad quality, mutated, disfigured, blur, unsharp, jpeg artifacts, mutated fingers, unrealistic, ugly, bad, (watermark), logo, clone, duplicate, dark, bright, credit, asymmetric, adult', [], 5, 'DDIM', 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x000001B69D2D77F0>, 0, False, '', 0.8, 926762877, False, -1, 0, 0, 0, False, False, {'ad_model': 'yolo_face_detector.pt', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001B69D2D7B80>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001B69D2D5B10>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001B69D2D7670>, None, False, '0', '0', 'inswapper_128.onnx', 'CodeFormer', 1, True, 'None', 1, 1, False, True, 1, 0, 0, False, 0.5, True, False, 'CUDA', False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50) {}
    Traceback (most recent call last):
      File "C:\StableAI\core\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\StableAI\core\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\StableAI\core\modules\txt2img.py", line 55, in txt2img
        processed = processing.process_images(p)
      File "C:\StableAI\core\modules\processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "C:\StableAI\core\modules\processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\hook.py", line 451, in process_sample
        return process.sample_before_CN_hack(*args, **kwargs)
      File "C:\StableAI\core\modules\processing.py", line 1140, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "C:\StableAI\core\modules\sd_samplers_timesteps.py", line 158, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\StableAI\core\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "C:\StableAI\core\modules\sd_samplers_timesteps.py", line 158, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\StableAI\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\StableAI\core\modules\sd_samplers_timesteps_impl.py", line 24, in ddim
        e_t = model(x, timesteps[index].item() * s_in, **extra_args)
      File "C:\StableAI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\StableAI\core\modules\sd_samplers_cfg_denoiser.py", line 188, in forward
        x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
      File "C:\StableAI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\StableAI\core\modules\sd_samplers_timesteps.py", line 30, in forward
        return self.inner_model.apply_model(input, timesteps, **kwargs)
      File "C:\StableAI\core\modules\sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "C:\StableAI\core\modules\sd_hijack_utils.py", line 28, in __call__
        return self.__orig_func(*args, **kwargs)
      File "C:\StableAI\system\library\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "C:\StableAI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\StableAI\system\library\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "C:\StableAI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\hook.py", line 858, in forward_webui
        raise e
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\hook.py", line 855, in forward_webui
        return forward(*args, **kwargs)
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\hook.py", line 767, in forward
        h = aligned_adding(h, total_t2i_adapter_embedding.pop(0), require_inpaint_hijack)
      File "C:\StableAI\webui\extensions\builtin\ControlNet\scripts\hook.py", line 259, in aligned_adding
        return base + x
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

---

@huchenlei
Collaborator

Can you share all the relevant setup, i.e. the steps to reproduce the problem?

Does this problem occur for all ControlNet models, or just T2I adapters?

@CCpt5

CCpt5 commented Nov 20, 2023

Edit: (Maybe related to the style pull-down options?)

So I'm getting it when using CN along with AnimateDiff. Oddly, it seems I only get the error if there's a style selected in the style pull-down on the right, under "Generate". If I have a style in there (that I haven't moved into the prompt box as text), I get this tensor error. If I move it into the prompt or clear it, the generation runs. It happens with all of the models I've tried, at least on v1.5.

I'm pretty sure I've gotten it within the past hour without AnimateDiff engaged as well, but it's super late and I'll have to test and update more tomorrow.

In the meantime, if you're attempting to reproduce, try adding a style in that pull-down and leaving it there while using AnimateDiff. Right now I always get the error with that combination.


I'm getting this now also, but it's late (12:30) and I can't do a full report at the moment. I just finally updated from NVIDIA driver 531.68 to the latest (546.17), and that's when I started getting this error.

It's likely related to the GPU/CPU memory-sharing feature they implemented after 531, which caused all sorts of slowdown problems. I have not yet added any exceptions for that in settings (I was hoping I wouldn't need to with a 4090 and 24 GB of VRAM). See: AUTOMATIC1111/stable-diffusion-webui#7980

(Edit 2: If I increase the number of models allowed to be kept in VRAM, I get this error even without a style in the box, so in my opinion it has to be related to that new NVIDIA change with shared VRAM and system RAM.)


Traceback (most recent call last):
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
    processed = processing.process_images(p)
  File "D:\stable-diffusion-webui\modules\processing.py", line 732, in process_images
    res = process_images_inner(p)
  File "D:\stable-diffusion-webui\extensions\sd-webui-animatediff\scripts\animatediff_cn.py", line 118, in hacked_processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "D:\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\hook.py", line 420, in process_sample
    return process.sample_before_CN_hack(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\processing.py", line 1140, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
    return func()
  File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\sd-webui-animatediff\scripts\animatediff_infv2v.py", line 269, in mm_cfg_forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\hook.py", line 827, in forward_webui
    raise e
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\hook.py", line 824, in forward_webui
    return forward(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\hook.py", line 561, in forward
    control = param.control_model(x=x_in, hint=hint, timesteps=timesteps, context=context, y=y)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\cldm.py", line 31, in forward
    return self.control_model(*args, **kwargs)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\3sd-webui-controlnet\scripts\cldm.py", line 300, in forward
    guided_hint = self.input_hint_block(hint, emb, context)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 102, in forward
    x = layer(x)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 444, in network_Conv2d_forward
    return originals.Conv2d_forward(self, input)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)

My system info:

"Platform": "Windows-10-10.0.19045-SP0",
"Python": "3.10.11",
"Version": "v1.6.0-2-g4afaaf8a",
"Commit": "4afaaf8a020c1df457bcf7250cb1c7f609699fa7",
"Script path": "D:\\stable-diffusion-webui",
"Data path": "D:\\stable-diffusion-webui",
"Extensions dir": "D:\\stable-diffusion-webui\\extensions",
"Checksum": "ce1da00f5c1ef478932e56ae1f69970d67c2a620ff67ca1a534e1fd0db315ab5",
"Commandline": [
    "launch.py",
    "--opt-sdp-attention",
    "--no-half-vae",
    "--opt-channelslast",
    "--disable-safe-unpickle",
    "--skip-torch-cuda-test",
    "--disable-nan-check",
    "--skip-version-check",
    "--ckpt-dir",
    "e:\\stable Diffusion Checkpoints"
],
"Torch env info": {
    "torch_version": "2.0.1+cu118",
    "is_debug_build": "False",
    "cuda_compiled_version": "11.8",
    "gcc_version": null,
    "clang_version": null,
    "cmake_version": "version 3.28.0-rc2",
    "os": "Microsoft Windows 10 Pro",
    "libc_version": "N/A",
    "python_version": "3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)",
    "python_platform": "Windows-10-10.0.19045-SP0",
    "is_cuda_available": "True",
    "cuda_runtime_version": "11.8.89\r",
    "cuda_module_loading": "LAZY",
    "nvidia_driver_version": "546.17",
    "nvidia_gpu_models": "GPU 0: NVIDIA GeForce RTX 4090",
    "cudnn_version": "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\bin\\cudnn_ops_train64_8.dll",
    "pip_version": "pip3",
    "pip_packages": [
        "numpy==1.23.5",
        "open-clip-torch==2.20.0",
        "pytorch-lightning==1.9.4",
        "torch==2.0.1+cu118",
        "torchdiffeq==0.2.3",
        "torchmetrics==1.2.0",
        "torchsde==0.2.5",
        "torchvision==0.15.2+cu118"
    ],
    "conda_packages": null,
    "hip_compiled_version": "N/A",
    "hip_runtime_version": "N/A",
    "miopen_runtime_version": "N/A",
    "caching_allocator_config": "",
    "is_xnnpack_available": "True",
    "cpu_info": [
        "Architecture=9",
        "CurrentClockSpeed=3000",
        "DeviceID=CPU0",
        "Family=207",
        "L2CacheSize=16384",
        "L2CacheSpeed=",
        "Manufacturer=GenuineIntel",
        "MaxClockSpeed=3000",
        "Name=13th Gen Intel(R) Core(TM) i9-13900K",
        "ProcessorType=3",
        "Revision="

@huchenlei added the "help wanted" and "Nvidia Driver Issue" labels Nov 20, 2023
@read-0nly

I also get this. How did you generate that sysinfo? I could submit mine too.

I'm reading this error ("RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)") as ControlNet trying to use CUDA, ignoring the "--use-cpu all" argument, and then failing because SD is on the CPU while CN is on the GPU. Is there an argument that would explicitly force CN to run on the CPU? I suspect using it in conjunction with --use-cpu would fix this issue.

I also have --no-half and --skip-torch-cuda-test. --skip-torch-cuda-test was necessary for CPU-only mode to work at all for SD; I'm not sure what --no-half does, so I'll try removing either or both and see if anything changes. I'm seeing other posts on here confirming that --use-cpu does work for some people, so I think the cause will be something that matches between both our setups.

@read-0nly

read-0nly commented Nov 27, 2023

Oh, also, for repro I just load automatic's webui with

    set COMMANDLINE_ARGS= --use-cpu all --skip-torch-cuda-test --enable-insecure-extension-access --api --no-half --no-half-vae --opt-split-attention --always-batch-cond-uncond --no-half-controlnet

then txt2img anything with any ControlNet enabled; OpenPose in particular is the one I'm trying to get working right now. Low VRAM on or off makes no difference. I can't remove --skip-torch-cuda-test or --use-cpu stops working; the same goes for --no-half, and replacing it with --precision full causes errors with SD. Both together don't fix it either.

I found an old trick to force the CPU for SD by modifying some Python files; I'm poking around the code now to see if I can do the same for CN.

@read-0nly

read-0nly commented Nov 27, 2023

OK, I got it to work with a hack, and it seems to confirm that CN is not respecting --use-cpu for some reason. It's not the same hack I had found before; it took some digging to figure out. This fixes Depth, but I'm not sure about OpenPose; testing it now.

THIS IS A WORKAROUND, NOT A FIX. If you do this, it's at your own risk: be ready to delete the whole instance and reinstall from scratch, and updates will break it.

In modules\devices.py, modify get_optimal_device() on line 36 to

    return torch.device("cpu")

instead of

    return torch.device(get_optimal_device_name())

This seems to force everything onto the CPU, and the error about two devices goes away.

EDIT: It works for OpenPose too; it seems to have fixed everything, and my computer no longer turns into a PowerPoint slideshow while I'm generating (even if images take much longer to generate).
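
For reference, the function being patched looks roughly like this in webui v1.6-era code (a sketch from memory; check your own copy before editing):

    def get_optimal_device():
        # Original behavior: pick CUDA (or MPS) when available.
        # return torch.device(get_optimal_device_name())

        # Workaround: force every caller onto the CPU. This is blunt:
        # it also overrides tasks that --use-cpu already handles.
        return torch.device("cpu")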

@read-0nly

read-0nly commented Nov 28, 2023

Better fix found. The issue is with "--use-cpu all": CN looks up the device for the task "controlnet", and modules/devices.py only returns the CPU if the task passed into the function is listed in the argument, and "controlnet" != "all".

The fix without changing any code is to use "--use-cpu all controlnet" (no commas, space-delimited). Then use_cpu is a list containing both "all" and "controlnet", so when CN calls get_device_for("controlnet"), get_device_for finds "controlnet" in the list and returns the CPU.
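
The relevant lookup in modules/devices.py is roughly this (a sketch of the v1.6-era code from memory; the exact body may differ in your checkout):

    # In modules/devices.py: `cpu` is torch.device("cpu") and
    # shared.cmd_opts.use_cpu is the parsed --use-cpu argument list.
    def get_device_for(task):
        # With "--use-cpu all controlnet", use_cpu is ["all", "controlnet"],
        # so the membership test below succeeds. With "--use-cpu all" alone,
        # "controlnet" is not in the list and ControlNet falls through to
        # the optimal (CUDA) device.
        if task in shared.cmd_opts.use_cpu:
            return cpu
        return get_optimal_device()

So a working CPU-only launch line would be, for example:

    set COMMANDLINE_ARGS=--use-cpu all controlnet --skip-torch-cuda-test --no-half --no-half-vae --no-half-controlnet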

I found the same issue reported against automatic's webui (AUTOMATIC1111/stable-diffusion-webui#14097) and added details there, including a possible code fix, but using "--use-cpu all controlnet" is something you can do today without any code changes.

This also seems to confirm that the issue is not with this extension but with webui itself.

I added the "--no-half-controlnet" argument too. My understanding is that "half" refers to half-precision floating point (16-bit), which is effectively a GPU-only feature. If it still fails for you, try adding that flag as well.
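
To illustrate why: many PyTorch CPU kernels have no fp16 implementation, so half-precision weights left on the CPU can fail outright. A sketch (not webui code; the exact behavior depends on the PyTorch version):

    import torch

    x = torch.randn(4, 4, dtype=torch.float16)  # fp16 tensor on the CPU

    # Depending on the build, this either raises something like
    # RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
    # or runs on a slow fallback path. --no-half / --no-half-controlnet
    # keep weights in fp32, which every CPU kernel supports.
    y = x @ x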
