CUDA Error using CPU Only Mode #2247
Did you use the Low VRAM mode in ControlNet?
No, but it's the same with --lowvram and with the Low VRAM checkbox.
Can you share the full console log? It would be better if you follow the bug template and share everything asked there.
@huchenlei Sure, here is the log:
Can you share all the relevant setup, i.e. steps to reproduce the problem? Does this problem occur for all ControlNet models or just the T2I adapters?
Edit (maybe related to the style pull-down options?): I'm getting this when using CN along with AnimateDiff. Oddly, it seems I only get the error if there is a STYLE selected in the style pull-down on the right under "Generate". If I have a style in there (that I haven't moved to the prompt box as text) I get this tensor error; if I move it to the prompt or X it out, the generation will go. It happens with all of the models I've tried, at least on v1.5. I'm pretty sure I've also gotten it within the past hour without AnimateDiff engaged, but it's super late and I'll have to test and update more tomorrow. In the meantime, if you're attempting to reproduce, try adding a style in that pull-down and leaving it there while using AnimateDiff; right now I always get an error with that combination.

I'm getting this now also, but it's late (12:30) and I can't do a full report at the moment. I just updated from NVIDIA driver 531.68 to the latest (546.17), and that's when I started getting this error. It's likely related to the GPU/CPU memory-sharing feature they introduced after 531, which caused all sorts of slowdown problems. I have not yet added any exceptions for it in settings (I was hoping I wouldn't need to with a 4090 and 24 GB of VRAM). See AUTOMATIC1111/stable-diffusion-webui#7980.

Edit 2: If I increase the number of models allowed to be kept in VRAM, I get this error even without a style in the box, so IMHO it has to be related to that new NVIDIA shared VRAM/CPU RAM change.
My system info:
I also get this. How did you generate that sysinfo? I could submit mine too.

I'm reading the error "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)" as CN trying to use CUDA and ignoring the "--use-cpu all" argument, then failing when SD is on the CPU but CN is on the GPU. Is there an argument that would explicitly force CN to run on the CPU? I suspect using it in conjunction with --use-cpu would fix this issue.

I also have --no-half and --skip-torch-cuda-test; --skip-torch-cuda-test was necessary for CPU-only mode to work at all for SD. I'm not sure what --no-half does, so I'll try removing either or both and see if anything changes. I'm seeing other posts here confirming that --use-cpu does work for some people, so I think the cause will be something our two setups have in common.
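The error reading above can be illustrated with a small pure-Python sketch. The `Tensor` class and `addmm` function here are toy stand-ins (the real check lives inside torch's wrapper_CUDA_addmm); they only model the rule that every operand of a CUDA matmul must live on the same device:

```python
class Tensor:
    """Toy stand-in for a torch tensor that remembers its device."""
    def __init__(self, device):
        self.device = device

def addmm(mat1, mat2):
    # Mimics torch's pre-op device check: mixing cuda:0 and cpu
    # operands raises rather than silently copying data across.
    if mat1.device != mat2.device:
        raise RuntimeError(
            "Expected all tensors to be on the same device, but found "
            f"at least two devices, {mat1.device} and {mat2.device}!"
        )
    return Tensor(mat1.device)

# SD weights forced to the CPU by --use-cpu, ControlNet still on cuda:0:
try:
    addmm(Tensor("cuda:0"), Tensor("cpu"))
except RuntimeError as e:
    print(e)
```

This is why forcing only part of the pipeline onto the CPU fails: the first operation that combines an SD tensor with a ControlNet tensor trips the check.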
Oh, also for repro I just load AUTOMATIC1111's webui with

Found an old trick to force the CPU for SD by modifying some Python files; I'm poking around the code now to see if I can find a way to do the same for CN.
OK, I got it to work with a hack, and it seems to confirm that CN is not respecting --use-cpu for some reason. It's not the same hack I had found; it took some digging to figure out. This fixes Depth, but I'm not sure about OpenPose; testing it now.

THIS IS A WORKAROUND, NOT A FIX. If you do this it's at your own risk: be ready to delete the whole instance and reinstall from scratch, and updates will break it.

In modules\devices.py, modify get_optimal_device() on line 36 to

This seems to force everything onto the CPU, and the error about two devices goes away.

EDIT: It works for OpenPose too. It seems to have fixed everything, and my computer no longer turns into a PowerPoint slideshow while generating (even if it takes much longer to generate images).
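For reference, the workaround described above boils down to making get_optimal_device() in modules/devices.py skip device detection and always return the CPU. A minimal sketch of the hacked function, with plain strings standing in for the torch.device objects the real function returns, so the sketch runs without torch installed:

```python
def cuda_available():
    # Stand-in for torch.cuda.is_available(); assume a CUDA build here.
    return True

def get_optimal_device():
    # Original logic (roughly): prefer CUDA when available, e.g.
    #     if cuda_available():
    #         return "cuda:0"
    # Workaround: short-circuit detection and pin everything,
    # including ControlNet, onto the CPU.
    return "cpu"

print(get_optimal_device())  # -> cpu
```

Because every caller that doesn't go through the --use-cpu task list falls back to this function, hard-coding the return value drags ControlNet onto the CPU as a side effect, which is why it works but is a hack rather than a fix.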
Better fix found. The issue is with "--use-cpu all": CN looks up the device for the task "controlnet", and modules/devices.py only returns the CPU if the task passed into the function is listed in the argument, and controlnet != all.

The fix without changing any code is to use "--use-cpu all controlnet" (no commas, space-delimited). Then use_cpu will be an array containing both "all" and "controlnet", so when ControlNet calls get_device_for("controlnet"), get_device_for finds "controlnet" in the array and returns the CPU.

I found the same issue reported against AUTOMATIC1111's webui (AUTOMATIC1111/stable-diffusion-webui#14097) and added details there, including a possible code fix, but "--use-cpu all controlnet" is something you can do today without any code changes. This also seems to confirm that the issue is not with this extension but with webui itself.

I added the "--no-half-controlnet" argument too. My understanding is that "half" refers to half-precision floating point (16-bit), which is a GPU-only feature, so if it still fails for you, try adding that as well.
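The lookup behavior described above can be sketched in a few lines. This is a simplified model of the device selection the commenter describes, not the actual webui source; the fallback device string is a placeholder for whatever get_optimal_device() would pick:

```python
# --use-cpu is parsed as a space-delimited list, so
# "--use-cpu all controlnet" yields use_cpu = ["all", "controlnet"].

def get_device_for(task, use_cpu):
    # The check is literal membership: "all" does NOT act as a
    # wildcard for extension tasks here, which is why "controlnet"
    # must be listed explicitly alongside it.
    if task in use_cpu:
        return "cpu"
    return "cuda:0"  # placeholder for the get_optimal_device() fallback

print(get_device_for("controlnet", ["all"]))                # GPU -> crash
print(get_device_for("controlnet", ["all", "controlnet"]))  # CPU -> works
```

With only ["all"] in the list, ControlNet falls through to the GPU while SD itself runs on the CPU, reproducing the two-device RuntimeError from earlier in the thread.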
Hi, I'm launching the latest SD and the latest ControlNet with these arguments to test on CPU only.
ReActor and SD are all fine on the CPU, but when I activate ControlNet it throws the following error:
Am I doing something wrong?