# Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance [[PDF](https://arxiv.org/abs/2410.22376)]

by Dongmin Park<sup>1</sup>, Sebin Kim<sup>2</sup>, Taehong Moon<sup>1</sup>, Minkyu Kim<sup>1</sup>, Kangwook Lee<sup>1,3</sup>, Jaewoong Cho<sup>1</sup>.

<sup>1</sup> KRAFTON AI, <sup>2</sup> Seoul National University, <sup>3</sup> University of Wisconsin-Madison

## 🔎Overview

- Rare-to-Frequent (R2F) is a powerful training-free framework that unlocks the compositional generation power of SOTA text-to-image diffusion models (e.g., SDXL, SD3, IterComp, and FLUX) by leveraging SOTA LLMs (e.g., GPT-4o and LLaMA3) as the rare-concept identifier and frequent-concept guide throughout the diffusion sampling steps (see the sketch below this list).
- R2F works with arbitrary combinations of diffusion backbones and LLM architectures.
- R2F can also be seamlessly integrated with region-guided diffusion approaches, yielding more controllable image synthesis.
  - First work to apply cross-attention control on SD3!
- Fast 4-step inference with FLUX-schnell integration!
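
Conceptually, R2F asks an LLM for a frequent surrogate of the rare composition and follows it during the early sampling steps only. The sketch below illustrates that schedule in plain Python; `prompt_schedule`, the surrogate prompt, and the 40% switch point are hypothetical illustrations, not the repository's API (the real LLM entry points are `GPT4_Rare2Frequent` / `LLaMA3_Rare2Frequent` in `gpt/mllm.py`).

```python
# Conceptual sketch of Rare-to-Frequent (R2F) prompt scheduling.
# All names here are hypothetical illustrations of the idea.

def prompt_schedule(rare_prompt: str, frequent_prompt: str,
                    num_steps: int, switch_frac: float = 0.4) -> list[str]:
    """Return one prompt per sampling step: the LLM-proposed frequent
    surrogate for the early steps, the original rare prompt afterwards."""
    switch_step = int(switch_frac * num_steps)  # assumed switch point
    return [frequent_prompt if t < switch_step else rare_prompt
            for t in range(num_steps)]

# An LLM first identifies the rare composition ("furry" + "frog") and
# proposes a relevant yet frequent surrogate, e.g. a furry cat.
print(prompt_schedule("A furry frog warrior", "A furry cat warrior", num_steps=10))
```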

## 🖼Examples

- While SOTA pre-trained T2I models (e.g., SD3 and FLUX) and LLM-grounded T2I approaches (e.g., RPG) struggle to generate images from prompts with rare compositions of concepts (attribute + object), R2F produces clearly superior compositions.
- This can provide a better image-generation experience for creators (e.g., designing a new character with unprecedented attributes).
- More generated images are in the images/ folder.
[Image grid comparing R2F (Ours), FLUX-schnell, SD3, and RPG on the following prompts:]

- Prompt: A furry frog warrior
- Prompt: A mustachioed squirrel is holding an ax-shaped guitar on a stage
- Prompt: A beautiful wigged octopus is juggling three star-shaped apples
- Prompt: A red dragon and a unicorn made of diamond rollerblading through a neon-lit cityscape

## 💡Why does R2F work?

### 1. Theoretical observation

- When a target rare distribution (deep blue) is difficult for a model to estimate, the score-interpolated distribution (sky blue), created by interpolating the estimated distribution (red) with a relevant yet frequent distribution (green), lies much closer to the actual target.
- In other words, the Wasserstein distance from the score-interpolated distribution (sky blue) to the target (deep blue) is smaller than that from the original estimated distribution (red).
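
As a formula (in our own shorthand, not the paper's exact notation): let $p^{\text{rare}}$ be the true rare target, $p_\theta^{\text{rare}}$ the model's estimate of it, and $p_\theta^{\text{freq}}$ the model's estimate of a relevant, frequent distribution. Interpolating the two scores with weight $\alpha$,

```math
\nabla_x \log \tilde{p}(x) \;=\; \alpha \, \nabla_x \log p_\theta^{\text{freq}}(x) \;+\; (1-\alpha)\, \nabla_x \log p_\theta^{\text{rare}}(x), \qquad \alpha \in [0,1],
```

yields a distribution $\tilde{p}$ that is closer to the target in Wasserstein distance, $W(\tilde{p},\, p^{\text{rare}}) < W(p_\theta^{\text{rare}},\, p^{\text{rare}})$, which is exactly the second bullet above.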

### 2. Empirical observation

- When generating a rare composition of two concepts (flower-patterned + animal), SD3's naive inference (red line) becomes less accurate as the composition gets rarer (i.e., as the animal class appears less often in the LAION dataset).
- However, when we guide the early sampling steps with a relatively frequent composition (flower-patterned bear, which is easily generated as a bear doll) and then switch back to the original prompt, generation quality improves significantly (blue line).

Therefore, we can unlock the power of diffusion models on rare concepts, even in the tail of the distribution!
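
In code, this early-steps guidance is just a conditional in the sampling loop. Below is a minimal sketch under assumed names; `denoise_step`, the embeddings, and the 40% switch point are illustrative, not the actual implementation in the R2F pipeline classes.

```python
# Illustrative R2F-style denoising loop (hypothetical helper names).

def r2f_sample(denoise_step, latents, frequent_emb, rare_emb,
               num_steps: int, switch_frac: float = 0.4):
    """Condition early steps on the frequent prompt embedding, then
    switch back to the rare prompt embedding for the remaining steps."""
    switch_step = int(switch_frac * num_steps)  # assumed switch point
    for t in range(num_steps):
        cond = frequent_emb if t < switch_step else rare_emb
        latents = denoise_step(latents, t, cond)
    return latents
```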

## 🧪How to Run

### 1. Playground

```python
from R2F_Diffusion_xl import R2FDiffusionXLPipeline
from R2F_Diffusion_sd3 import R2FDiffusion3Pipeline
from R2F_Diffusion_flux import R2FFluxPipeline

from diffusers import DPMSolverMultistepScheduler

from gpt.mllm import GPT4_Rare2Frequent, LLaMA3_Rare2Frequent
import torch

api_key = "YOUR_API_KEY"

# Choose a diffusion backbone: "sd3", "sdxl", "flux", or "itercomp".
model = "itercomp"
if model == "sd3":
    pipe = R2FDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium", revision="refs/pr/26")
elif model == "sdxl":
    pipe = R2FDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
elif model == "flux":
    # In R2F, we experiment on FLUX.1-schnell, which requires only 4 sampling steps.
    pipe = R2FFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
elif model == "itercomp":
    pipe = R2FDiffusionXLPipeline.from_pretrained("comin/IterComp", torch_dtype=torch.float16, use_safetensors=True)
pipe.to("cuda")

# Demo prompt with a rare concept composition.
prompt = "A hairy frog"

# Get the rare-to-frequent (r2f) prompt sequence from an LLM.
llm = "gpt4o"
if llm == "gpt4o":
    r2f_prompt = GPT4_Rare2Frequent(prompt, key=api_key)
elif llm == "llama3.1":
    r2f_prompt = LLaMA3_Rare2Frequent(prompt, model_id="meta-llama/Llama-3.1-8B-Instruct")
print(r2f_prompt)

# Generate an image with the R2F prompt schedule.
image = pipe(
    r2f_prompts=r2f_prompt,
    seed=42,  # random seed
).images[0]
image.save(f"{prompt}_test.png")
```
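
The pipeline consumes the prompt sequence returned by the LLM helper. The exact output format is defined in `gpt/mllm.py`; the value below is only a hypothetical illustration of the frequent-to-rare ordering, not a recorded model output.

```python
# Hypothetical illustration (not actual LLM output): R2F conditions the
# early steps on a frequent surrogate before returning to the rare prompt.
r2f_prompt_example = [
    "A hairy monkey",  # relevant yet frequent surrogate proposed by the LLM
    "A hairy frog",    # original rare prompt, used for the later steps
]
```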

### 2. Running R2F on Benchmark Datasets

```bash
### Get r2f_prompts from GPT-4o/LLaMA
cd gpt
bash get_r2f_response.sh

### Generate images
cd ../script/
bash inference_r2f.sh
```

### 3. Running R2F+ on Benchmark Datasets

```bash
### Get r2fplus_prompts from GPT-4o/LLaMA
cd gpt
bash get_r2fplus_response.sh

### Generate images
cd ../script/
bash inference_r2fplus.sh
```

## 📊RareBench

## ✔Set Environment

```bash
git clone 
cd Rare-to-Frequent
conda create -n r2f python=3.9
conda activate r2f
pip install -r requirements.txt
```

## 📖Citation

```bibtex
@article{park2024rare,
  title={Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance},
  author={Park, Dongmin and Kim, Sebin and Moon, Taehong and Kim, Minkyu and Lee, Kangwook and Cho, Jaewoong},
  journal={arXiv preprint arXiv:2410.22376},
  year={2024}
}
```

## Acknowledgements

Our R2F is a general LLM-grounded T2I generation framework that builds on several solid prior works. Thanks to RPG, LMD, SAM, and diffusers for their wonderful work and codebases!