The official implementation of "GridShow: Omni Visual Generation".
GRID introduces a novel paradigm that reframes visual generation tasks as grid layout problems. Built upon the FLUX.1 architecture, our framework transforms temporal sequences into grid layouts, enabling image generation models to process visual sequences holistically. This approach achieves strong efficiency and versatility across diverse visual generation tasks.
- Efficient Inference: Up to 35× faster inference than specialized models
- Resource Efficient: Requires less than 1/1000 of the computational resources
- Versatile Applications: Supports Text-to-Video, Image-to-Video, Multi-view Generation, and more
- Preserved Capabilities: Maintains strong image generation performance while expanding functionality
Due to GitHub upload limits, the demo videos below are downscaled from 1024×1024 to 256×256. For the full-resolution versions, please refer to:
vid1 vid2 vid3 vid4 vid5 vid6 vid7
From left to right: the input cat video, followed by the edited fox, tiger, and red panda results.
- Python >= 3.10
- NVIDIA GPU with 24GB+ VRAM
- CUDA 11.6+
- PyTorch >= 1.12
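Before installing, a quick environment sanity check can confirm the GPU and VRAM requirements (a convenience snippet, not part of the repo):

```python
import torch

# Verify that CUDA is visible and the GPU has at least 24 GB of VRAM.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB, CUDA runtime: {torch.version.cuda}")
assert vram_gb >= 24, "GRID expects 24GB+ of VRAM"
```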
git clone https://github.com/[username]/GRID.git
cd GRID
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
source/
├── train/
│   ├── sequence1/
│   │   └── frame_{1..n}.jpg   # Sequential frames
│   └── sequence2/
│       └── frame_{1..n}.jpg
└── val/
    └── ...
python tools/concat.py \
    --input_dir source/train \
    --output_dir vidgrid \
    --grid_rows 4 \
    --grid_cols 6 \
    --frames_per_grid 24
Data Structure:
vidgrid/
├── vid1.jpg # 4x6 grid containing 24 frames
└── vid2.jpg # Each .jpg is a complete sequence
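Conceptually, tools/concat.py tiles consecutive frames of a sequence into a single grid image. A minimal sketch of the idea (hypothetical code, not the actual script; the real tool may sample frames and size cells differently):

```python
from pathlib import Path
from PIL import Image

def frames_to_grid(frame_dir, rows=4, cols=6, cell=256):
    """Tile the first rows*cols frames of a sequence into one grid image (hypothetical helper)."""
    frames = sorted(Path(frame_dir).glob("frame_*.jpg"))[: rows * cols]
    grid = Image.new("RGB", (cols * cell, rows * cell))
    for i, path in enumerate(frames):
        img = Image.open(path).convert("RGB").resize((cell, cell))
        grid.paste(img, ((i % cols) * cell, (i // cols) * cell))  # row-major order
    return grid

frames_to_grid("source/train/sequence1").save("vidgrid/vid1.jpg")
```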
mkdir -p models
# Download GLM-4V-9B weights
# Option 1: From ModelScope
wget https://modelscope.cn/models/ZhipuAI/glm-4v-9b/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin
# Option 2: From Hugging Face
wget https://huggingface.co/THUDM/glm-4v-9b/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin
# Option 3: From WiseModel
wget https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin
python tools/caption_glm.py
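tools/caption_glm.py generates one caption per grid image. If you prefer to load GLM-4V-9B directly from the Hugging Face Hub instead of a local weight file, a minimal per-image captioning loop along the lines of the model's standard Transformers usage looks roughly like this (a sketch; the prompt wording and generation settings are assumptions, not the script's actual configuration):

```python
import torch
from pathlib import Path
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b", torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True, trust_remote_code=True,
).to(device).eval()

prompt = "Describe this image grid as a short video caption."  # assumed prompt
for grid_path in sorted(Path("vidgrid").glob("*.jpg")):
    image = Image.open(grid_path).convert("RGB")
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "image": image, "content": prompt}],
        add_generation_prompt=True, tokenize=True,
        return_tensors="pt", return_dict=True,
    ).to(device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    caption = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    grid_path.with_suffix(".txt").write_text(caption.strip())
```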
Final Training Data Structure:
vidgrid/
├── vid1.jpg # Grid image
├── vid1.txt # Corresponding caption
├── vid2.jpg
└── vid2.txt
GRID uses the FLUX.1 architecture for training. You'll need:
- GPU with minimum 24GB VRAM
- FLUX.1-dev model access and license
Accept the model license at black-forest-labs/FLUX.1-dev, then follow the official setup guide in the black-forest-labs/flux repository for deployment and for downloading the model weights.
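If you just want to confirm that your Hugging Face account has accepted the license and can pull the gated weights, a quick check with diffusers works as well (an alternative loading path for verification only, not the one the training code uses):

```python
import torch
from diffusers import FluxPipeline

# Requires `huggingface-cli login` with an account that has accepted the
# FLUX.1-dev license; downloads the gated weights on first run.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps fit within 24GB of VRAM
image = pipe("a photo of a red panda", height=512, width=512).images[0]
image.save("flux_smoke_test.png")
```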
- Copy example config:
cp config/train_lora_4d.yaml config/your_config.yaml
- Edit configuration parameters
- Start training:
python run.py config/your_config.yaml
Training can be interrupted safely (except during checkpoint saving) and will resume from the last checkpoint.
- Text-to-Video Generation
- Image-to-Video Synthesis
- Multi-view Image Generation
- Video Style Transfer
- Video Editing
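Because the model emits a single grid image per sequence, turning a generated result back into individual frames is the inverse of the data-preparation step. A minimal sketch, again assuming a 4×6 grid in row-major order (hypothetical helper and paths):

```python
from PIL import Image

def grid_to_frames(grid_path, rows=4, cols=6):
    """Split a generated grid image back into its frames (hypothetical helper)."""
    grid = Image.open(grid_path).convert("RGB")
    cell_w, cell_h = grid.width // cols, grid.height // rows
    frames = []
    for r in range(rows):
        for c in range(cols):
            box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
            frames.append(grid.crop(box))
    return frames

for i, frame in enumerate(grid_to_frames("output/grid.jpg"), start=1):
    frame.save(f"output/frame_{i:03d}.jpg")
# Frames can then be assembled into a clip, e.g.:
#   ffmpeg -framerate 8 -i output/frame_%03d.jpg output/clip.mp4
```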
- Release the paper
- Release the training codes and demo
- Update the project page
- Release the model weights
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.