Add support for fine-tuning with LoRA (text2image example) #2002
Conversation
…add_lora_fine_tuning
With the following:

```shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch --gpu_ids="0," \
  ./train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --save_sample_prompt="cute Sundar Pichai creature" --report_to="wandb"
```

still leading to:

The above experiments were run on a single T4 machine. On a V100 with xformers, it works. Logs will be here: https://wandb.ai/sayakpaul/stable_diffusion_ft_lora/runs/0b88cwxc
When I tried enabling mixed-precision on T4, it led to:
@patil-suraj the training is completed and the results look good: https://wandb.ai/sayakpaul/stable_diffusion_ft_lora/reports/LoRA-fine-tuning-of-text2image--VmlldzozMzUxNjI5

Let me know if it makes sense to continue working on this PR and add LoRA support formally to our text2image fine-tuning script. Happy to take care of it :)

Update: Talked to Suraj offline. I will continue working on this PR and let y'all know (@patrickvonplaten @patil-suraj) when it's ready for reviews.
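For context on what the script is adding: LoRA freezes the pretrained weight matrices and learns only a low-rank correction, which is why the trainable parameter count (and checkpoint size) stays tiny. A minimal NumPy sketch of the idea — all names here are illustrative and not taken from the PR's script:

```python
import numpy as np

rng = np.random.default_rng(42)

d_in, d_out, rank = 64, 64, 4  # rank << d, so very few trainable params
W = rng.normal(size=(d_in, d_out))        # frozen pretrained weight
A = rng.normal(size=(d_in, rank)) * 0.01  # trainable down-projection
B = np.zeros((rank, d_out))               # trainable up-projection, zero-init

scale = 1.0  # plays the role of alpha / rank in the LoRA paper


def lora_forward(x):
    # Base projection plus the low-rank correction through A then B.
    return x @ W + scale * (x @ A @ B)


x = rng.normal(size=(2, d_in))
# With B zero-initialized, the adapted layer starts identical to the base layer.
assert np.allclose(lora_forward(x), x @ W)
print(A.size + B.size, "trainable params vs", W.size, "frozen")
```

In the actual script, updates like this are attached to the attention projections of the UNet while everything else stays frozen.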
Thanks a lot for working on this! Feel free to continue on the PR. We could add the …
Co-authored-by: Simo Ryu <cloneofsimo@gmail.com>
Things seem to be working on both T4 and V100. My command:

```shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch \
  train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --enable_xformers_memory_efficient_attention \
  --validation_prompt="cute Sundar Pichai creature" --report_to="wandb" \
  --output_dir="sd-model-finetuned-lora-v100" \
  --push_to_hub && sudo shutdown now
```

The final weights will be pushed to https://huggingface.co/sayakpaul/sd-model-finetuned-lora-v100/tree/main and an experimentation run is available here: https://wandb.ai/sayakpaul/text2image-fine-tune/runs/782txylu (currently running). Once these are done, I will update the appropriate sections in the README.
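A nice property of checkpoints trained this way is that the low-rank factors can be folded back into the frozen weights once training is done, so inference carries no extra cost. A rough NumPy sketch of the merge (illustrative only; not the diffusers loading API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4
W = rng.normal(size=(d, d))  # frozen base weight
A = rng.normal(size=(d, r))  # learned LoRA factors
B = rng.normal(size=(r, d))
scale = 1.0

# Merge: W' = W + scale * (A @ B) -- a single dense weight at inference time.
W_merged = W + scale * (A @ B)

x = rng.normal(size=(3, d))
# The merged weight reproduces the adapted forward pass exactly.
assert np.allclose(x @ W_merged, x @ W + scale * (x @ A @ B))
```

Alternatively, keeping A and B separate allows swapping adapters on top of one shared base model, which is what makes the pushed LoRA checkpoint so small.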
Closing this PR since the merge conflicts are a little too brutal to resolve. I will create a fresh PR.
Most of it is the same as #1884. I guess the only script that needs reviewing is `examples/text_to_image/train_text_to_image_lora.py`.