LoRA error when running train_text_to_image_lora.py: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #5897
Comments
If the current version is still in development, would it also be possible to point to any previous working version?
Also having the same issue, after successfully training last week.
I can reproduce the issue in my case too. Download this and replace ./diffusers/examples/text_to_image/train_text_to_image_lora.py
Same behaviour here; falling back to the version @wellCh4n provided solved the problem.
Can confirm this fixes the error for me; however, at least on Colab with a T4 runtime I then get an "Expected is_sm80 || is_sm90 to be true, but got false." error message when the script tries to backpropagate the loss. Not sure if this is an issue with the new script or some compatibility issue with the CUDA drivers in the Colab, though.
This seems like a setup problem to me, as I am unable to reproduce it, even on a Google Colab:
I got the same error. However, reverting to the previous version, as @wellCh4n suggested, resolved the issue.
Same issue; following for a fix.
I am gonna have to repeat myself here:
@sayakpaul Is there anything we can do to help you reproduce this issue? It seems significant, as multiple people with different setups have encountered the same issue. Otherwise we're forced to keep using this older version indefinitely.
A Colab notebook would be nice because that's the easiest to reproduce. As already indicated here, I was not able to reproduce at all: #5897 (comment). And I am quite sure #5388 will resolve these problems for good. |
Hopefully this is fixed when moving to PEFT. In the meantime, if you don't want to revert to an older version: I had the same issue and fixed it by adding one line
at my line 539, immediately after the LoRA weights are added, and outside the loop:
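A minimal sketch of the kind of one-liner described (the exact line from this comment is not preserved here; `unet`, `accelerator`, and `weight_dtype` are the names the training script already uses):

```python
# Hypothetical reconstruction, not the verbatim line from the comment:
# after the LoRA weights are attached, move the whole UNet (including the
# newly created LoRA parameters, which may still be on the CPU) onto the
# training device so no parameter is left on cpu while the rest is on cuda:0.
unet.to(accelerator.device, dtype=weight_dtype)
```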
Thanks to @IceClear and others who found that some of the unet was on the wrong device.
If you want to open a PR fixing it, more than happy to merge :) |
@sayakpaul Thank you! I've opened #6061; let me know if it needs any modification.
Is this still in progress?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
I tried to experiment with LoRA training following examples/text_to_image/README.md#training-with-lora.
However, I got the error
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
on line 801. The same issue did not occur when I tried the same example (with the implementation at that time) months ago. I noticed there have been several commits since then.
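For context, this is the generic failure mode the message describes (an illustration only, not code from the training script; the tensor names here are made up):

```python
import torch

# A matmul between a CUDA tensor and a CPU tensor raises exactly this error.
hidden = torch.randn(2, 4, device="cuda")  # e.g. activations already on the GPU
lora_weight = torch.randn(4, 3)            # e.g. a weight left behind on the CPU
out = hidden @ lora_weight  # RuntimeError: Expected all tensors to be on the same device ...
```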
I followed the README.md for installing packages and the non-LoRA training works well.
Thank you very much!
Reproduction
Then cd into the folder examples/text_to_image and run the following.
Logs
System Info
diffusers version: 0.24.0.dev0
Who can help?
@sayakpaul @patrickvonplaten