LoRA error when running train_text_to_image_lora.py: "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" #5897

Closed
MhDang opened this issue Nov 22, 2023 · 17 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

MhDang commented Nov 22, 2023

Describe the bug

I tried to experiment with LoRA training following examples/text_to_image/README.md#training-with-lora.

However, I got the error RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm) on line 801.

The same issue did not occur when I tried the same example (with the implementation at that time) months ago. I noticed there have been several commits since then.

I followed the README.md for installing packages, and the non-LoRA training works well.

Thank you very much!

Reproduction

  1. Install packages following README.md:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then cd into the folder examples/text_to_image and run

pip install -r requirements.txt

  2. In the directory examples/text_to_image, run the following:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora" \
  --validation_prompt="cute dragon creature" --report_to="wandb"

Logs

11/22/2023 08:36:20 - INFO - __main__ - ***** Running training *****
11/22/2023 08:36:20 - INFO - __main__ -   Num examples = 833
11/22/2023 08:36:20 - INFO - __main__ -   Num Epochs = 100
11/22/2023 08:36:20 - INFO - __main__ -   Instantaneous batch size per device = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
11/22/2023 08:36:20 - INFO - __main__ -   Gradient Accumulation steps = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total optimization steps = 83300
Steps:   0%|                                                                                                                                | 0/83300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 975, in <module>
    main()
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 801, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 1075, in forward
    sample, res_samples = downsample_block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 1160, in forward
    hidden_states = attn(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/transformer_2d.py", line 375, in forward
    hidden_states = block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention.py", line 258, in forward
    attn_output = self.attn1(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1211, in __call__
    query = attn.to_q(hidden_states, *args)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 433, in forward
    out = super().forward(hidden_states) + (scale * self.lora_layer(hidden_states))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 220, in forward
    down_hidden_states = self.down(hidden_states.to(dtype))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

System Info

  • diffusers version: 0.24.0.dev0
  • Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.31
  • Python version: 3.9.18
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.35.2
  • Accelerate version: 0.24.1
  • xFormers version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sayakpaul @patrickvonplaten

MhDang added the bug (Something isn't working) label on Nov 22, 2023

MhDang commented Nov 22, 2023

If the current version is still in development, would it also be possible to point to a previous working version?


zanedamico commented Nov 22, 2023

I'm also having the same issue, after training successfully last week.


wellCh4n commented Nov 23, 2023

I can reproduce the issue in my case too.
I looked at the LoRA script's commit history: there was a recent commit with big changes, and using the previous commit runs fine in my case.

Download this and replace ./diffusers/examples/text_to_image/train_text_to_image_lora.py with it.
This is only a temporary workaround; sadly, I'm not familiar enough with this code.


IceClear commented Nov 23, 2023

I think the reason is that the LoRA parameters are added to the UNet after the UNet has been sent to the GPU, so the LoRA layers are actually on the CPU, leading to the error. A simple way to fix this is to first add the LoRA layers to the UNet and then send everything to the GPU together:

(screenshot of the suggested code change omitted)
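
A rough sketch of the reordering @IceClear describes; this is an illustration, not the author's exact change. The variable names (unet, accelerator, weight_dtype) follow the training script, and the LoRA-attachment code is abbreviated because it differs between diffusers versions.

# Sketch only: attach the LoRA layers while the UNet is still on the CPU,
# then move everything to the accelerator device in one call, so the base
# weights and the LoRA weights end up on the same device.

# 1) Attach LoRA layers to the UNet's attention modules here
#    (the script's own LoRA set-up code, omitted for brevity).

# 2) Only afterwards move the UNet, LoRA layers included, to the GPU:
unet.to(accelerator.device, dtype=weight_dtype)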

@abetatos

Same behaviour here; falling back to the version @wellCh4n provided solved the problem.


NEhlen commented Nov 25, 2023

(Quoting @IceClear's suggestion above: add the LoRA layers to the UNet first, then move everything to the GPU together.)

Can confirm this fixes the error for me. However, at least on Colab with a T4 runtime, I then get an "Expected is_sm80 || is_sm90 to be true, but got false." error when the script tries to backpropagate the loss. I'm not sure whether this is an issue with the new script or a compatibility issue with the CUDA drivers in Colab, though.

@sayakpaul

This seems like a setup problem to me as I am unable to reproduce it, even on a Google Colab:
#5004 (comment)

@hanweikung

I got the same error. However, reverting to the previous version, as @wellCh4n suggested, resolved the issue.


maliozer commented Dec 3, 2023

Same issue here; following for a fix.

@sayakpaul

I am gonna have to repeat myself here:

#5897 (comment)

@zanedamico

@sayakpaul Is there anything we can do to help you reproduce this issue? It seems significant, as multiple people with different setups have encountered the same problem. Otherwise we're forced to keep using the older version indefinitely.

@sayakpaul

A Colab notebook would be nice, because that's the easiest way to reproduce. As already indicated here, I was not able to reproduce it at all: #5897 (comment).

And I am quite sure #5388 will resolve these problems for good.

@MohamadZeina

Hopefully this is fixed when moving to PEFT. In the meantime, if you don't want to revert to an older version, I had the same issue and fixed it by adding one line:

unet.to(accelerator.device, dtype=weight_dtype)

at my line 539, immediately after the LoRA weights are added, and outside the loop:

    # Accumulate the LoRA params to optimize.
    unet_lora_parameters.extend(attn_module.to_q.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_k.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_v.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_out[0].lora_layer.parameters())

unet.to(accelerator.device, dtype=weight_dtype)  # added line: moves the UNet, now including the LoRA layers, to the GPU

Thanks to @IceClear and others who found that part of the UNet was on the wrong device.
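
For context, a minimal sketch (not the verbatim script) of why this single extra call helps, assuming, as @IceClear noted, that the base UNet was moved to the device before the LoRA layers were attached; the variable names are the ones from the snippet above.

# At this point the base UNet weights already sit on cuda:0, but the LoRA
# layers created afterwards still hold their weights on the CPU. Calling
# .to() again after attaching them also moves these freshly created LoRA
# parameters, so F.linear no longer sees a cuda:0 input and a cpu weight.
unet.to(accelerator.device, dtype=weight_dtype)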

@sayakpaul

If you want to open a PR fixing it, more than happy to merge :)

MohamadZeina added a commit to MohamadZeina/diffusers that referenced this issue Dec 5, 2023
@MohamadZeina

@sayakpaul Thank you - I've opened #6061, let me know if it needs any modification


maliozer commented Dec 8, 2023

(Quoting @MohamadZeina above: "@sayakpaul Thank you - I've opened #6061, let me know if it needs any modification")

Is this still in progress?


github-actions bot commented Jan 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale (Issues that haven't received updates) label on Jan 3, 2024
10 participants