
Assign device to unet. Resolves #5897 #6061

Conversation

MohamadZeina

What does this PR do?

Resolves issue #5897 - training with the example script examples/text_to_image/train_text_to_image_lora.py fails because parameters and data end up on two devices, CPU and CUDA. This is caused by the unet LoRA weights, which are created after the unet has already been moved to the device. I fix it by sending the unet to accelerator.device after setting up the LoRA weights.
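
For clarity, here is a minimal sketch of the ordering after this change (paraphrased rather than the exact diff; `unet`, `accelerator`, `weight_dtype` and `args.rank` are the script's existing variables, and the LoRA setup is the attention-processor-based one the script used at the time):

```python
from diffusers.models.attention_processor import LoRAAttnProcessor

# Attach the LoRA attention processors first, as the script already does.
lora_attn_procs = {}
for name in unet.attn_processors.keys():
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, rank=args.rank
    )
unet.set_attn_processor(lora_attn_procs)

# The fix: move the unet (now including the freshly created LoRA parameters)
# to the accelerator device *after* the processors are attached, not before.
unet.to(accelerator.device, dtype=weight_dtype)
```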

Fixes #5897 (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sayakpaul
Member

Can you confirm that you still get the expected quality with this change?

@MohamadZeina
Author

I've removed the initial unet.to(device) that's now redundant. Regarding quality: I don't have much experience training these models, but the results are roughly on par with what I'd expect.

I'm training a LoRA with rank 16 on "stabilityai/stable-diffusion-2" at the usual resolution of 768. On a 3090 I can fit a batch size of 8, and I'm getting 1.65 it/s. That's roughly 4 times faster than a LoRA I trained on the same machine on SDXL using another example script that had no issues.
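
Roughly, my launch settings looked like this (paraphrased, not the exact invocation; dataset and output arguments omitted):

```bash
accelerate launch examples/text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2" \
  --resolution=768 \
  --train_batch_size=8 \
  --rank=16
```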

I hope that answers your question but let me know if you need anything else.

@sayakpaul
Member

Hi there. Will this change still matter in light of #5388?

@MohamadZeina
Author

I'll let you know when I get a chance to test this.

I had a look at the peft code and I'm not sure - the LoRA weights are still added after the unet has been moved to the device, which is what caused the issue before. But I don't know how peft handles this; it may detect the base model's device and put the new weights in the right place.

@MohamadZeina
Author

@sayakpaul in my one test, it looks like peft fixes the issue - it must be checking the device under the hood and making sure the adapter ends up on the same device as the base model.
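
A quick way to sanity-check this (illustrative only, not code from this PR): attach the adapter with the peft-based API after the unet is already on CUDA, and confirm the new LoRA parameters land on the same device.

```python
import torch
from peft import LoraConfig
from diffusers import UNet2DConditionModel

device = torch.device("cuda")  # assumes a CUDA machine
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="unet"
).to(device)

# The adapter is added *after* the move to CUDA, mirroring the situation
# that originally caused the CPU/CUDA mismatch.
unet.add_adapter(
    LoraConfig(r=16, lora_alpha=16, target_modules=["to_k", "to_q", "to_v", "to_out.0"])
)

# If peft matches the base layers' device, this prints only CUDA devices.
lora_devices = {p.device for n, p in unet.named_parameters() if "lora" in n}
print(lora_devices)
```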

@sayakpaul
Member

That is very good to know. Feel free to continue to test and let us know about your findings.

@sayakpaul
Member

Is this still a problem? Note that we recently added peft as a dependency to this script.

@sayakpaul
Member

We recently incorporated peft into our training scripts, and I believe this issue shouldn't exist anymore with that change. Could you please recheck?

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label on Jan 19, 2024
@github-actions bot closed this on Jan 28, 2024