Text-Guided Style Transfer

This is my attempt at implementing Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer.

Motivation

I got interested in the possibility of manipulating pretrained text2img models for video editing. I googled CLIP-based style transfer and stumbled upon this paper, which didn't have an open implementation, so I decided to write one myself.

Project setup

Clone the submodules:

git clone https://github.com/openai/CLIP
git clone https://github.com/ouhenio/guided-diffusion.git

Install the submodules' dependencies:

pip install -e ./CLIP && pip install -e ./guided-diffusion

Download the unconditional diffusion model weights (2.06 GB):

wget -O unconditional_diffusion.pt https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
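
For reference, here is a minimal sketch of how this checkpoint can be loaded with the standard guided-diffusion helpers. The model flags are the ones published for the 256x256 unconditional checkpoint; main.py may set things up differently, so treat this as an illustration rather than the repo's actual loading code.

```python
import torch
from guided_diffusion.script_util import (
    model_and_diffusion_defaults,
    create_model_and_diffusion,
)

# Published flags for the 256x256 unconditional checkpoint.
config = model_and_diffusion_defaults()
config.update(
    image_size=256,
    class_cond=False,
    learn_sigma=True,
    num_channels=256,
    num_head_channels=64,
    num_res_blocks=2,
    attention_resolutions="32,16,8",
    resblock_updown=True,
    use_scale_shift_norm=True,
    use_fp16=False,            # set True (and call model.convert_to_fp16()) on fp16-capable GPUs
    diffusion_steps=1000,
    noise_schedule="linear",
    timestep_respacing="100",  # fewer sampling steps for faster previews
)

model, diffusion = create_model_and_diffusion(**config)
model.load_state_dict(torch.load("unconditional_diffusion.pt", map_location="cpu"))
model.eval()
```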

Usage

Sadly, the usage interface is pretty lacking:

python main.py

To try different styles, hyperparameters, and images, edit these lines in main.py (see the sketch after this list for the kind of values involved):

139: guidance prompt
155: initial image
216: loss hyperparameters
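
For orientation, those edit points look roughly like the snippet below. The variable names and the image path are hypothetical placeholders, not copied from main.py.

```python
# Hypothetical names -- check main.py for the actual variables at these lines.
prompt = "cubism"                       # ~line 139: guidance prompt
init_image_path = "input/portrait.png"  # ~line 155: initial image
loss_weights = {                        # ~line 216: loss hyperparameters
    "global": 20000,
    "directional": 15000,
    "feature": 50,
    "mse": 3000,
    "zecon": 10,
}
```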

Example Results

| Image | Prompt | Global Loss | Directional Loss | Feature Loss | MSE Loss | ZeCon Loss |
| --- | --- | --- | --- | --- | --- | --- |
| (image) | portrait | None | None | None | None | None |
| (image) | cubism | 20000 | 15000 | 50 | 3000 | 10 |
| (image) | 3d render in the style of Pixar | 5000 | 5000 | 100 | 10000 | 500 |
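
For intuition about what these numbers mean: as I read the paper, each loss column is a weight on one term of the guidance objective, which is roughly a weighted sum of the individual losses evaluated at each sampling step. A minimal sketch of that combination, assuming the individual loss tensors have already been computed elsewhere:

```python
import torch

def combined_guidance(loss_terms: dict, weights: dict) -> torch.Tensor:
    """Weighted sum of precomputed scalar loss tensors, e.g. keys
    "global", "directional", "feature", "mse", "zecon"."""
    total = torch.zeros(())
    for name, weight in weights.items():
        total = total + weight * loss_terms[name]
    return total

# Weights taken from the "cubism" row of the table above.
cubism_weights = {"global": 20000, "directional": 15000, "feature": 50, "mse": 3000, "zecon": 10}
```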

Final thoughts

I've found that this method works to some extent, but it is very sensitive to hyperparameters, which makes it frustrating to use.

Table 5 of the paper makes me confident that the authors had the same issue.