
LoRA cannot train the text encoder in "train_text_to_image_lora_sdxl.py"? #5012

Closed
Kevin7720 opened this issue Sep 13, 2023 · 14 comments
Labels: bug

@Kevin7720

Describe the bug

`--train_text_encoder` works in train_text_to_image_lora.py, but it does not work in train_text_to_image_lora_sdxl.py.

Reproduction

accelerate launch train_dreambooth_lora_sdxl.py --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" --pretrained_vae_model_name_or_path="stabilityai/sdxl-vae" --output_dir="traintextencorder" --resolution=512 --train_batch_size=2 --gradient_accumulation_steps=1 --learning_rate=1e-5 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=3000 --instance_data_dir="dog" --instance_prompt="A photo of a sks dog." --train_text_encoder

Logs

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by 
making sure all `forward` function outputs participate in calculating loss. 
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 1: 88 89 90 91 92 93 94 95
 In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
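
(For reference, the two suggestions in this error message can be tried in an Accelerate-based training script roughly as sketched below. This is a minimal sketch, not code from train_text_to_image_lora_sdxl.py; the placement of the calls is illustrative.)

```python
# Minimal sketch (assumed placement, not the actual script code):
# applying the two suggestions from the DDP error message above.
import os

from accelerate import Accelerator, DistributedDataParallelKwargs

# 1) Ask torch.distributed to report which parameters missed gradients.
#    Must be set before the process group is initialized (here, before
#    the Accelerator is created); exporting it in the shell before
#    `accelerate launch` also works.
os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")

# 2) Let DDP tolerate parameters that take no part in the loss, e.g.
#    text-encoder LoRA layers that receive no gradient on some ranks.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```

Note that `find_unused_parameters=True` only works around the symptom, at the cost of an extra graph traversal per step; the underlying cause is that some text-encoder LoRA parameters do not contribute to the loss on every rank.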

System Info

  • diffusers version: 0.21.0.dev0
  • Platform: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.14.0a0+44dac51 (True)
  • Huggingface_hub version: 0.16.4
  • Transformers version: 4.33.1
  • Accelerate version: 0.22.0
  • xFormers version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@williamberman, @patrickvonplaten, and @sayakpaul

Kevin7720 added the bug label on Sep 13, 2023
@sayakpaul
Member

Are you launching this from a machine with multiple GPUs?

@Kevin7720
Author

@sayakpaul yes, I am.

@Kevin7720
Author

I have 4 GPUs.
[image]
This is my accelerate config setting.
[image]

@sayakpaul
Member

Yeah, I remember a similar issue was reported a while back. I think the limitation for now is that you can only use a machine with a single GPU.
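
As a quick check, a single-GPU run can be forced without touching the accelerate config, e.g. `CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 train_text_to_image_lora_sdxl.py ...` (remaining arguments as in the reproduction above).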

@Kevin7720
Author

Thank you for your help, it is working.

@sayakpaul
Member

Closing then.

@leyangjin

leyangjin commented Nov 7, 2023

@sayakpaul Hi, is this issue solved now? I am still running into the same issue when I use train_text_to_image_lora_sdxl.py with 4 GPUs.
[image]

@sayakpaul
Member

Does this help?

#5355

@Kevin7720
Author

> Yeah, I remember a similar issue was reported a while back. I think the limitation for now is that you can only use a machine with a single GPU.

It works on a single GPU. You can try it.

> @sayakpaul Hi, is this issue solved now? I am still running into the same issue when I use train_text_to_image_lora_sdxl.py with 4 GPUs. [image]

@leyangjin

> Does this help?
>
> #5355

Hi @sayakpaul, thank you for your help! Actually, the version of train_text_to_image_lora_sdxl.py I am using already includes the changes from this PR. However, the problem still exists.

@leyangjin

> > Yeah, I remember a similar issue was reported a while back. I think the limitation for now is that you can only use a machine with a single GPU.
>
> It works on a single GPU. You can try it.
>
> > @sayakpaul Hi, is this issue solved now? I am still running into the same issue when I use train_text_to_image_lora_sdxl.py with 4 GPUs. [image]

Hi, thank you for your help! Unfortunately, since my dataset is quite large, I have to use multiple GPUs.

@sayakpaul
Member

Then I am not sure what could be the problem here.

@leyangjin

@patrickvonplaten Hi, could you please help me with this? I think the bug has been here for several months. Thank you!

@sayakpaul
Member

@leyangjin could you try out some suggestions from this thread too?

Sorry for being a bit haphazard here.
