Assign device to unet. Resolves #5897 #6061
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Can you ensure you get expected quality with this change?
I've removed the initial unet.to(device) that's now redundant. Regarding quality: I don't have much experience training these models, but the performance is roughly on par with what I expect. I'm training a LoRA with rank 16 on "stabilityai/stable-diffusion-2" at the usual resolution of 768. On a 3090 I can fit a batch size of 8, and I'm getting 1.65 it/s. That's roughly 4 times faster than a LoRA I trained on the same machine on SDXL using another example script that had no issues. I hope that answers your question, but let me know if you need anything else.
Hi there. Will this change still matter in light of #5388?
I'll let you know when I get a chance to test this. I had a look at the peft code and I'm not sure: the LoRA weights are still added after moving the unet to the device, which is what caused the issue before. But I'm not sure how peft handles this; it might detect the mismatch and put the new weights on the right device.
@sayakpaul In my one test, it looks like peft fixes the issue; it must be checking the device under the hood and making sure the adapter ends up on the same device as the unet.
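For reference, here is a minimal sketch of the kind of check that would confirm this behaviour. It assumes the PEFT-backed path attaches LoRA via `add_adapter` with a `LoraConfig`; the rank, target modules, and checkpoint below are illustrative and not taken from this PR.

```python
# Sketch: add a LoRA adapter *after* the unet has been moved to CUDA, then verify that
# the new adapter parameters landed on the same device as the base weights.
# (Rank, target modules, and the model checkpoint here are illustrative assumptions.)
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="unet"
)
unet.to("cuda")  # base model is moved first, as in the training script

# diffusers models expose add_adapter when peft is installed
unet.add_adapter(
    LoraConfig(r=16, lora_alpha=16, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
)

lora_devices = {p.device for name, p in unet.named_parameters() if "lora" in name}
print(lora_devices)  # expect only cuda devices, i.e. no adapter parameters left on CPU
```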
That is very good to know. Feel free to continue to test and let us know about your findings.
Is this still a problem? Note that we recently added
We recently incorporated
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Resolves issue #5897: training with the example script examples/text_to_image/train_text_to_image_lora.py fails because parameters and data end up on two devices, CPU and CUDA. This is caused by the unet LoRA weights. I fix it by sending the unet to accelerator.device after setting up the LoRA weights.
Fixes #5897 (issue)
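To make the failure mode concrete, here is a minimal, self-contained sketch of the ordering issue; it illustrates the general PyTorch behaviour rather than the literal diff to the training script, and the unet and lora_delta names below are stand-ins.

```python
# Sketch of the bug: parameters created *after* a module has been moved to the
# accelerator device start out on CPU, so the .to(device) call has to happen
# (or be repeated) after they are added.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # stands in for accelerator.device

unet = nn.Linear(8, 8).to(device)   # stand-in for moving the unet to the device first
unet.lora_delta = nn.Linear(8, 8)   # stand-in for attaching LoRA weights afterwards

print(unet.weight.device)            # the accelerator device
print(unet.lora_delta.weight.device) # cpu -> "tensors on two devices" error during training

unet.to(device)                      # the fix: move the model again once the LoRA weights exist
print(unet.lora_delta.weight.device) # now on the accelerator device
```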
Before submitting
See the documentation guidelines and the tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul