Let's easily fine-tune a pre-trained Stable Diffusion XL using dataset-maker and LoRA!
Fashion-Product-Generator is a fine-tuned text-to-image generative model trained on a custom dataset collected from KREAM, one of the best online resale markets in Korea. Have fun creating realistic, high-quality fashion items!
Hugging Face Repository 🤗
- Model: hahminlew/sdxl-kream-model-lora-2.0 | Previous version: hahminlew/sdxl-kream-model-lora
- Dataset: hahminlew/kream-product-blip-captions
*Generate various creative products through prompt engineering!
Prompts
outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest.
top, (W) Moncler x Adidas Slip Hoodie Dress Cream, a photography of a cream dress and a hood on.
bottom, Supreme Animal Print Baggy Jean Washed Indigo - 23FW, a photography of a dark blue jean with an animal printing on.
outer, The North Face x Supreme White Label Nuptse Down Jacket Cream, a photography of a white puffer jacket with a red box logo on the front.
top, The Supreme x Stussy Oversized Cotton Black Hoodie, a photography of a black shirt with a hood on and a logo on the chest.
bottom, IAB Studio x Stussy Tie-Dye Sweat Wooven Shorts, a photography of a dye short pants with a logo.
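The prompts above share one layout: a category, a product name, and a description that starts with "a photography of". As a minimal sketch of that pattern, the helper below assembles a prompt from the three parts. The function name `build_prompt` and its arguments are hypothetical and not part of this repository.

```python
# Hypothetical helper (not part of this repository): composes a prompt in the
# "category, product name, a photography of ..." layout used by the examples.
TEXT_CONDITION = "a photography of"


def build_prompt(category: str, product_name: str, description: str) -> str:
    """Return '<category>, <product name>, a photography of <description>.'"""
    description = description.strip()
    # The example prompts end with a period, so add one if it is missing.
    if not description.endswith("."):
        description += "."
    return f"{category.strip()}, {product_name.strip()}, {TEXT_CONDITION} {description}"


if __name__ == "__main__":
    print(
        build_prompt(
            "outer",
            "The Nike x Balenciaga Down Jacket Black",
            "a black down jacket with a logo on the chest",
        )
    )
```

Only the categories outer, top, and bottom appear in the examples here, so other category words are untested assumptions.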
- python == 3.11
- xFormers
- PyTorch == 2.0.1
- Hugging Face 🤗: diffusers, transformers, datasets
I tested the conda environment on Linux with CUDA version 12.0 and NVIDIA Driver version 525.125.06.
*Please refer to environment.yml for more details.
cd fashion-product-generator
conda env create -f environment.yml
conda activate fpg
pip install git+https://github.com/huggingface/diffusers
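After installation, a quick check can confirm that the packages listed above are visible in the active environment. This is a minimal sketch using only the standard library. The function name `installed_versions` is hypothetical, and the package list mirrors the dependencies named in this README rather than the full contents of environment.yml.

```python
from importlib import metadata

# Distribution names for the dependencies listed in this README.
REQUIRED = ("torch", "xformers", "diffusers", "transformers", "datasets")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'NOT INSTALLED'}")
```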
KREAM Product Dataset Examples Collected by dataset-maker
dataset-maker is an example of a custom data collection tool for fine-tuning Stable Diffusion. It consists of a web crawler and a BLIP image captioning module.
KREAM Product Blip Captions Dataset is now available on Hugging Face 🤗.
from datasets import load_dataset
dataset = load_dataset("hahminlew/kream-product-blip-captions", split="train")
sample = dataset[0]
display(sample["image"].resize((256, 256)))
print(sample["text"])
outer, The North Face 1996 Eco Nuptse Jacket Black, a photography of the north face black down jacket
- Move the dataset.json file into the desired save directory for the KREAM Product Dataset.
mv ./dataset.json [/path/to/save]
cd dataset-maker
- Run download_KREAM.py.
python download_KREAM.py --save_dir [/path/to/save]
- Run BLIP_captioning.py.
CUDA_LAUNCH_BLOCKING=1 python BLIP_captioning.py --dataset_dir [/path/to/dataset] --use_condition --text_condition 'a photography of'
BLIP captioning results will be saved in /path/to/save/dataset_BLIP.json
cd dataset-maker
- Inspect your desired website and slightly modify webCrawler.py.
*Please exercise caution when web crawling. Make sure to adhere to the website's crawling policies, which can be found in its '/robots.txt' file.
- Run the modified webCrawler.py.
python webCrawler.py
- Run BLIP_captioning.py.
CUDA_LAUNCH_BLOCKING=1 python BLIP_captioning.py --dataset_dir [/path/to/dataset] --use_condition --text_condition 'a photography of'
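Before running a crawler such as webCrawler.py, the robots.txt check mentioned above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration and is not taken from webCrawler.py. The helper name `allowed_by_robots`, the example rules, and the example.com URLs are hypothetical.

```python
from urllib import robotparser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if `url` may be fetched under the given robots.txt lines."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)  # parse rules that were already downloaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules, used here so the example runs offline.
    example_rules = ["User-agent: *", "Disallow: /private/"]
    print(allowed_by_robots(example_rules, "https://example.com/products/1"))
    print(allowed_by_robots(example_rules, "https://example.com/private/x"))
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` downloads the rules instead of passing them in as lines.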
I used the Hugging Face Diffusers Text-to-Image Examples to fine-tune a pre-trained Stable Diffusion XL with LoRA on 4 NVIDIA GeForce RTX 3090 GPUs.
- Memory-Usage: approximately 65GB
- Training-Time: approximately 15h for 10 epochs
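To put these figures in context, the short sketch below works out the effective batch size and the approximate time per epoch. It assumes data-parallel training across the 4 GPUs, the `--train_batch_size=1` setting used in this README's training command, and a gradient accumulation of 1, since no other value is passed. The helper names are hypothetical.

```python
def effective_batch_size(per_device_batch, num_gpus, grad_accum_steps=1):
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device_batch * num_gpus * grad_accum_steps


def hours_per_epoch(total_hours, num_epochs):
    """Average wall-clock hours spent on one epoch."""
    return total_hours / num_epochs


if __name__ == "__main__":
    # 1 sample per GPU on 4 GPUs, gradient accumulation assumed to be 1.
    print(effective_batch_size(1, 4))  # 4
    # Approximately 15 hours for 10 epochs.
    print(hours_per_epoch(15, 10))  # 1.5
```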
cd finetuning
accelerate config default
huggingface-cli login
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="hahminlew/kream-product-blip-captions"
CUDA_LAUNCH_BLOCKING=1 accelerate launch train_text_to_image_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--pretrained_vae_model_name_or_path=$VAE_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=1024 --random_flip \
--train_batch_size=1 \
--num_train_epochs=10 --checkpointing_steps=1000 \
--learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
--mixed_precision="fp16" \
--seed=42 \
--output_dir="sdxl-kream-model-lora" \
--validation_prompt="outer, The Nike x Balenciaga down jacket black, a photography of a black down jacket with a logo on the chest" --report_to="wandb" \
--push_to_hub
Or simply run:
sudo chmod +x run.sh
./run.sh
*Make sure you have Hugging Face and wandb accounts. You should create a directory and personal tokens for Hugging Face. Also, please check your personal API key for wandb.
SDXL-KREAM-Model-LoRA-2.0 is now available on Hugging Face 🤗.
python inference.py --prompt 'outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest.' --img_name example.png
Usage
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.load_lora_weights("hahminlew/sdxl-kream-model-lora-2.0")
prompt = "outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest."
image = pipe(prompt, num_inference_steps=45, guidance_scale=7.5).images[0]
image.save("example.png")
Parameter Descriptions
- num_inference_steps: int, number of diffusion steps
- guidance_scale: float, how similar the generated image will be to the prompt, 1 <= guidance_scale <= 50
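To see how guidance_scale changes the output, one option is to generate the same prompt at several values. The sketch below assumes a `pipe` object that is called the same way as in the Usage snippet and returns a result with an `.images` list. The helper name `sweep_guidance` is hypothetical, and the range check simply mirrors the 1 to 50 bound stated above.

```python
def sweep_guidance(pipe, prompt, scales, num_inference_steps=45):
    """Generate one image per guidance_scale value and return {scale: image}."""
    results = {}
    for scale in scales:
        # Enforce the range given in the parameter descriptions.
        if not 1 <= scale <= 50:
            raise ValueError(f"guidance_scale must be within [1, 50], got {scale}")
        results[scale] = pipe(
            prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=scale,
        ).images[0]
    return results
```

With the pipeline from the Usage snippet, `sweep_guidance(pipe, prompt, [5.0, 7.5, 10.0])` would return three images keyed by scale, and each can then be saved with `image.save(...)`.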
- BLIP image captioning and BLIP Hugging Face Demo
- Hugging Face Dataset Creation Tutorial
- LoRA: Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning
- Using LoRA for Efficient Stable Diffusion Fine-Tuning
- Hugging Face Diffusers Text-to-Image Examples
- Finetuning Stable Diffusion from Lambda Labs ML
If you use KREAM Product Dataset, please cite it as:
@misc{lew2023kream,
author = {Lew, Hah Min},
title = {KREAM Product BLIP Captions},
year={2023},
howpublished= {\url{https://huggingface.co/datasets/hahminlew/kream-product-blip-captions/}}
}