Min3Flow is a 3-stage text-to-image generation framework. Its structure is modeled after dalle-flow, but it forgoes the client-server architecture in favor of modularity and configurability. The underlying packages have all been stripped down and optimized for inference, taking design inspiration from min-dalle.
At a high level, both frameworks do the same thing in a similar way:
- Generate an image from a text prompt using DALL·E-Mega weights
- Refine the output via diffusion with GLID-3-XL
- Upsample the 256x256 output images to 1024x1024 with SwinIR
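In other words, each stage feeds the next: prompt → DALL·E-Mega (a grid of 256x256 candidates) → GLID-3-XL (diffusion refinement at 256x256) → SwinIR (4x upscale to 1024x1024).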
A few thousand feet lower and you'll note that:
- Min3Flow uses min-dalle instead of dalle-mini for text-to-image generation. This means the pipeline is entirely PyTorch based, i.e. no flax dependency.
- The diffusion library, GLID-3-XL, has been heavily refactored and extended. It now functions as a standalone module rather than just a command line script, and it supports additional ldm-finetune weights (see the sketch after this list).
- Similar to the Glid3XL treatment, SwinIR is no longer command line bound. (Kudos to SwinIR_wrapper for the inspiration.)
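Because all three stages are importable on their own, you can in principle drive each piece directly. The snippet below is a sketch only: the module paths and argument names (`Glid3XL`, `SwinIR`, `guidance_scale`, `init_image`, `upscale`) are assumptions for illustration, not the verified Min3Flow API.

```python
# Illustrative sketch -- import paths and argument names are assumed,
# not taken from the verified Min3Flow API.
from PIL import Image
from min3flow.min_glid3xl import Glid3XL  # hypothetical module path
from min3flow.min_swinir import SwinIR    # hypothetical module path

init = Image.open('dalle_output.png')     # e.g. a stage-1 result saved to disk

g3xl = Glid3XL(guidance_scale=5.0, steps=200)  # argument names assumed
refined = g3xl.sample('Dali painting of a glider in infrared', init_image=init)

upscaled = SwinIR().upscale(refined)      # 256x256 -> 1024x1024
```

In practice, the Min3Flow wrapper ties the three stages together: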
```python
from min3flow import Min3Flow

mflw = Min3Flow(global_seed=42)

# Stage 1: text-to-image with DALL-E Mega weights
prompt = 'Dali painting of a glider in infrared'
grid_size = 3  # create a grid of (3,3) images
image = mflw.generate(prompt, grid_size=grid_size)
mflw.show_grid(image)

# Stage 2: diffusion refinement of selected grid cells
grid_idx = [0, 5, 8]  # or pass them all: grid_idx=None
img_diff = mflw.diffuse(prompt, image[grid_idx])
mflw.show_grid(img_diff)

# Stage 3: upsample 256x256 -> 1024x1024
grid_idx = [1, 8, 12, 14]  # or pass them all: grid_idx=None
img_up = mflw.upscale(img_diff[grid_idx])
mflw.show_grid(img_up, plot_index=False)
```
Not a fan of minimalism? Have a look at the Full Configuration in Colab.
```sh
git clone https://github.com/Rypo/min-3-flow.git && cd min-3-flow
conda env create -f environment.yml
```
```sh
pip install matplotlib jupyter notebook
pip install torch torchvision

# Glid3XL requirements
pip install transformers==4.3.1 einops

# CLIP requirements
pip install ftfy regex
pip install git+https://github.com/openai/CLIP.git

# SwinIR requirements
pip install timm

# ldm requirements
pip install pytorch-lightning omegaconf
```
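As an optional sanity check (not part of the original instructions), confirm that the dependencies import cleanly:

```python
# Verify that the core dependencies installed above resolve.
import torch, torchvision
import transformers, einops
import clip      # installed from the OpenAI CLIP repo
import timm
import pytorch_lightning, omegaconf

print('CUDA available:', torch.cuda.is_available())
```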
```sh
git clone https://github.com/Rypo/min-3-flow.git && cd min-3-flow

git clone https://github.com/CompVis/latent-diffusion.git && cd latent-diffusion
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
# install latent-diffusion
pip install -e .

cd ..
# install min3flow
pip install -e .
```
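Optionally, smoke-test the editable install from the repo root:

```sh
python -c "import min3flow; print('min3flow OK')"
```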
You may need to add the following lines to the top of your notebooks/scripts if you get a No module named 'ldm' error.
```python
import sys
# make latent-diffusion and taming-transformers importable
sys.path.append('latent-diffusion')
sys.path.append('latent-diffusion/src/taming-transformers/')
```
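Equivalently (standard Python path handling, though untested here), you can set PYTHONPATH once instead of editing each script:

```sh
# run from the min-3-flow repo root
export PYTHONPATH="$PWD/latent-diffusion:$PWD/latent-diffusion/src/taming-transformers:$PYTHONPATH"
```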
For each prompt, a batch of 16 images was generated with seven different configurations (A-G below). The same global seed (42) was used across all prompts and configurations.
- A, B, C are images generated with Glid3XL alone (no initial image) and correspond to three different diffusion weights (finetune.pt, inpaint.pt, and ongo.pt).
- D images are generated by creating an initial image with MinDalle (dtype=float32, supercondition factor=32) and passing that image, along with the prompt, to Glid3XL (classifier guidance=5.0, steps=200, skip rate=0.5). A sketch of this configuration follows the legend below.
- E, F, G are images generated with MinDalle alone using float16 + supercondition factor 16, float32 + supercondition factor 16, and float32 + supercondition factor 32, respectively.
Image counts per configuration:

| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Before upsampling (1+ per prompt) | 7 | 4 | 8 | 25 | 16 | 13 | 24 |
| After upsampling (1 per prompt) | 4 | 3 | 5 | 11 | 11 | 4 | 10 |
A: 'glid3xl-cg5-finetune-200step-0.0skip'
B: 'glid3xl-cg5-inpaint-200step-0.0skip'
C: 'glid3xl-cg5-ongo-200step-0.0skip'
D: 'mindalle-f32-sf32 -> glid3xl-cg5-inpaint-200step-0.5skip'
E: 'mindalle-f16-sf16'
F: 'mindalle-f32-sf16'
G: 'mindalle-f32-sf32'
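For concreteness, configuration D maps onto the API shown earlier roughly as follows. This is a sketch: the keyword names (guidance_scale, steps, skip_rate) are inferred from the legend above and may not match the actual Min3Flow signatures, and the inpaint weight selection is omitted.

```python
# Hypothetical reconstruction of configuration D:
#   mindalle-f32-sf32 -> glid3xl-cg5-inpaint-200step-0.5skip
# Keyword names below are assumptions inferred from the legend, not verified API.
from min3flow import Min3Flow

mflw = Min3Flow(global_seed=42)

prompt = 'Dali painting of a glider in infrared'
image = mflw.generate(prompt, grid_size=4)  # batch of 16 initial images

# classifier guidance 5.0, 200 diffusion steps, 0.5 skip rate
img_diff = mflw.diffuse(prompt, image, guidance_scale=5.0, steps=200, skip_rate=0.5)
```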
- Min-Dalle
  - Add optional dependencies for Extended model support
- Glid3XL
  - Further reduce codebase
  - Clean and optimize guided_diffusion or replace its functionality with existing libraries
  - Reintroduce masking and autoedit capabilities
    - Add support for inpaint weights and LAION variants (ongo, erlich, puck)
    - Clean and add mask generation GUI
    - Clean and add autoedit functionality
  - Allow batch sizes greater than 1 in the CLIP guidance function
  - Allow a direct weight path without requiring a models_root
  - Explain generating images from scratch (i.e. without passing DALL·E output as diffusion input)
- SwinIR
  - Further reduce codebase
  - Test whether non-SR tasks are functional and useful; remove them if not
- General
  - Standardize all generation outputs as tensors; convert to Image.Image in the Min3Flow class
  - Update documentation for the new weight path scheme
  - environment.yml and/or requirements.txt
  - Google Colab notebook demo
  - Python 3.7.3 compatibility
  - Add VRAM usage estimates
How to pronounce min-3-flow?
I'm partial to "min-ee-flow", but "min-three-flow" is fair game. My intention with the l337-style "E" was to sound less like some sort of Minecraft auto-clicker (cf. MineFlow).
Why reinvent the wheel?
- I found the client-server paradigm somewhat limiting in terms of parameter tuning; there are many more knobs to turn than DALL·E Flow exposes. In pursuit of this tunability, I ended up adding more functionality than existed in any of the base packages alone, so it became more than just the sum of its parts.
- I couldn't get DocArray to install on my machine. So, why spend an hour debugging when you can spend a month building your own!