Easy demo for finetuning a pre-trained Stable Diffusion XL with LoRA using the collected fashion dataset from scratch.

hahminlew/fashion-product-generator

Fashion-Product-Generator

Let's easily fine-tune a pre-trained Stable Diffusion XL using dataset-maker and LoRA!

Fashion-Product-Generator is a text-to-image generative model fine-tuned on a custom dataset collected from KREAM, one of the best online resale markets in Korea. Have fun creating realistic, high-quality fashion items!

Highlights

Hugging Face Repository 🤗

*Generate various creative products through prompt engineering!

Prompts

  • outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest.
  • top, (W) Moncler x Adidas Slip Hoodie Dress Cream, a photography of a cream dress and a hood on.
  • bottom, Supreme Animal Print Baggy Jean Washed Indigo - 23FW, a photography of a dark blue jean with an animal printing on.
  • outer, The North Face x Supreme White Label Nuptse Down Jacket Cream, a photography of a white puffer jacket with a red box logo on the front.
  • top, The Supreme x Stussy Oversized Cotton Black Hoodie, a photography of a black shirt with a hood on and a logo on the chest.
  • bottom, IAB Studio x Stussy Tie-Dye Sweat Wooven Shorts, a photography of a dye short pants with a logo.
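
The prompts above appear to follow a three-part pattern: a category tag (outer, top, bottom), a product name, and a caption that starts with "a photography of". A minimal sketch of a helper that assembles prompts in that shape follows; `build_prompt` and its argument names are hypothetical and not part of this repository.

```python
def build_prompt(category: str, product_name: str, description: str) -> str:
    """Join the parts in the 'category, product name, caption' pattern."""
    caption = description.strip()
    # Assumption: captions begin with "a photography of", as in the examples above.
    if not caption.lower().startswith("a photography of"):
        caption = f"a photography of {caption}"
    return f"{category.strip()}, {product_name.strip()}, {caption}"


prompt = build_prompt(
    "outer",
    "The Nike x Balenciaga Down Jacket Black",
    "a black down jacket with a logo on the chest.",
)
print(prompt)
```

Keeping the same structure as the training captions is likely to help the LoRA weights respond as intended, although that is an assumption rather than something measured here.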

Dependencies

  • python == 3.11
  • xFormers
  • PyTorch == 2.0.1
  • Hugging Face 🤗: diffusers, transformers, datasets

I tested the conda environment on Linux with CUDA version 12.0 and NVIDIA Driver Version 525.125.06.

*Please refer to environment.yml for more details.

cd fashion-product-generator

conda env create -f environment.yml

conda activate fpg

pip install git+https://github.com/huggingface/diffusers

dataset-maker Instructions

KREAM Product Dataset Examples Collected by dataset-maker

dataset-maker is an example of a custom data collection tool for fine-tuning Stable Diffusion. It consists of a web crawler and a BLIP image captioning module.

KREAM Product Dataset from Hugging Face

The KREAM Product Blip Captions Dataset is now available on Hugging Face 🤗.

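from datasets import load_dataset

# Download the train split from the Hugging Face Hub.
dataset = load_dataset("hahminlew/kream-product-blip-captions", split="train")
sample = dataset[0]
# display() is available in Jupyter/Colab notebooks; in a plain script, use sample["image"].show().
display(sample["image"].resize((256, 256)))
print(sample["text"])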

outer, The North Face 1996 Eco Nuptse Jacket Black, a photography of the north face black down jacket
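
The sample caption above begins with a category tag before the first comma. As a rough sketch, assuming every caption in the dataset follows that layout, the categories can be tallied as shown below; `count_categories` is a hypothetical helper, demonstrated offline on captions quoted in this README.

```python
from collections import Counter
from typing import Iterable


def count_categories(captions: Iterable[str]) -> Counter:
    """Count the leading category tag (text before the first comma) of each caption."""
    counts: Counter = Counter()
    for text in captions:
        category = text.split(",", 1)[0].strip()
        counts[category] += 1
    return counts


# Offline demonstration with captions quoted in this README.
examples = [
    "outer, The North Face 1996 Eco Nuptse Jacket Black, a photography of the north face black down jacket",
    "top, (W) Moncler x Adidas Slip Hoodie Dress Cream, a photography of a cream dress and a hood on.",
]
print(count_categories(examples))
# With the loaded dataset, the same call would be: count_categories(dataset["text"])
```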

Download KREAM Product Dataset from Scratch

  1. Move the dataset.json file into the desired save directory for the KREAM Product Dataset.
mv ./dataset.json [/path/to/save]

cd dataset-maker

  2. Run download_KREAM.py.
python download_KREAM.py --save_dir [/path/to/save]
  3. Run BLIP_captioning.py.
CUDA_LAUNCH_BLOCKING=1 python BLIP_captioning.py --dataset_dir [/path/to/dataset] --use_condition --text_condition 'a photography of'

BLIP captioning results will be saved in /path/to/save/dataset_BLIP.json
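
The `--use_condition` and `--text_condition` flags suggest conditional captioning, where BLIP continues a given text prefix. BLIP_captioning.py itself is not reproduced here; the sketch below only illustrates the idea with the transformers BLIP classes. The checkpoint name and the `caption_image` function are assumptions and may differ from what the script actually uses.

```python
def caption_image(
    image_path: str,
    text_condition: str = "a photography of",
    checkpoint: str = "Salesforce/blip-image-captioning-large",  # assumed checkpoint
) -> str:
    """Generate a caption for one image, conditioned on a text prefix."""
    # Imported inside the function so that defining it needs no heavy dependencies.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Loading the model on every call is simple but slow; cache it for batch use.
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    # Passing text together with the image makes BLIP continue the prefix.
    inputs = processor(image, text_condition, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(caption_image("/path/to/image.png"))
```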

Try Your Own Dataset Creation

cd dataset-maker

  1. Inspect your desired website and slightly modify webCrawler.py.

*Please exercise caution when web crawling. Make sure to adhere to the website's crawling policies, which can be found in its '/robots.txt'.

  2. Run the modified webCrawler.py.
python webCrawler.py
  3. Run BLIP_captioning.py.
CUDA_LAUNCH_BLOCKING=1 python BLIP_captioning.py --dataset_dir [/path/to/dataset] --use_condition --text_condition 'a photography of'
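
The robots.txt caution in the steps above can also be checked programmatically before crawling. Here is a minimal sketch using Python's standard `urllib.robotparser`; the `is_allowed` helper, the user agent name, and the rules and URLs are hypothetical and for illustration only.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
"""
print(is_allowed(rules, "my-crawler", "https://example.com/products/1"))
print(is_allowed(rules, "my-crawler", "https://example.com/private/data"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real robots.txt instead of parsing a string.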

Finetuning Stable Diffusion Instructions

I used the Hugging Face Diffusers Text-to-Image Examples to fine-tune a pre-trained Stable Diffusion XL with LoRA on 4 NVIDIA GeForce RTX 3090 GPUs.

  • Memory-Usage: approximately 65GB
  • Training-Time: approximately 15h for 10 epochs

cd finetuning

accelerate config default

huggingface-cli login

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="hahminlew/kream-product-blip-captions"

CUDA_LAUNCH_BLOCKING=1 accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=1024 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=10 --checkpointing_steps=1000 \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --seed=42 \
  --output_dir="sdxl-kream-model-lora" \
  --validation_prompt="outer, The Nike x Balenciaga down jacket black, a photography of a black down jacket with a logo on the chest" --report_to="wandb" \
  --push_to_hub

Or simply run:

sudo chmod +x run.sh
./run.sh

*Make sure you have both a Hugging Face account and a wandb account. For Hugging Face, create a directory and a personal access token. For wandb, check your personal API key.
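
To relate `--checkpointing_steps=1000` and `--num_train_epochs=10` in the training command above to actual progress, it helps to estimate the number of optimizer steps per epoch. This is a rough sketch: the dataset size used below is a hypothetical placeholder, not the real size of the KREAM dataset, and gradient accumulation is assumed to be 1 because the command does not set it.

```python
import math


def optimizer_steps_per_epoch(
    num_examples: int,
    num_gpus: int,
    per_device_batch_size: int,
    grad_accum_steps: int = 1,
) -> int:
    """Estimate optimizer steps per epoch for data-parallel training."""
    effective_batch = num_gpus * per_device_batch_size * grad_accum_steps
    return math.ceil(num_examples / effective_batch)


# 10_000 is a hypothetical dataset size; 4 GPUs and batch size 1 come from this README.
steps = optimizer_steps_per_epoch(num_examples=10_000, num_gpus=4, per_device_batch_size=1)
print(steps)       # steps per epoch under the assumed dataset size
print(10 * steps)  # total steps for 10 epochs
```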

Inference

SDXL-KREAM-Model-LoRA-2.0 is now available on Hugging Face 🤗.

python inference.py --prompt 'outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest.' --img_name example.png

Usage

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.load_lora_weights("hahminlew/sdxl-kream-model-lora-2.0")

prompt = "outer, The Nike x Balenciaga Down Jacket Black, a photography of a black down jacket with a logo on the chest."

image = pipe(prompt, num_inference_steps=45, guidance_scale=7.5).images[0]
image.save("example.png")

Parameter Descriptions

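  • num_inference_steps: int, number of denoising steps; more steps usually improve image quality at the cost of slower inference
  • guidance_scale: float, how strongly the generated image follows the prompt; higher values adhere to the prompt more closely, 1 <= guidance_scale <= 50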
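
To make the effect of `guidance_scale` concrete, here is a small sketch of the classifier-free guidance combination that diffusion pipelines apply at each denoising step. It uses plain Python lists with toy numbers instead of real noise tensors, and `apply_guidance` is a hypothetical name.

```python
from typing import List


def apply_guidance(
    noise_uncond: List[float], noise_text: List[float], guidance_scale: float
) -> List[float]:
    """Push the unconditional prediction toward the text-conditioned one."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for the two noise predictions.
uncond = [0.0, 1.0]
text = [1.0, 3.0]
print(apply_guidance(uncond, text, 1.0))  # scale 1 reproduces the text-conditioned prediction
print(apply_guidance(uncond, text, 7.5))  # larger scales extrapolate further toward the prompt
```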

Citation

If you use the KREAM Product Dataset, please cite it as:

@misc{lew2023kream,
      author = {Lew, Hah Min},
      title = {KREAM Product BLIP Captions},
      year={2023},
      howpublished= {\url{https://huggingface.co/datasets/hahminlew/kream-product-blip-captions/}}
} 
