The official implementation of 'GRID: Visual Layout Generation.'


Should-AI-Lab/GRID

 
 


GridShow: Omni Visual Generation

The official implementation of the work "GridShow: Omni Visual Generation".


Overview

GRID introduces a novel paradigm that reframes visual generation tasks as grid layout problems. Built upon the FLUX.1 architecture, our framework transforms temporal sequences into grid layouts, enabling image generation models to process visual sequences holistically. This approach delivers high efficiency and versatility across diverse visual generation tasks.
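The core idea of laying a temporal sequence out as a single grid image can be illustrated with NumPy. This is a minimal sketch, assuming frames are stacked as a `(T, H, W, C)` array and placed in row-major order; `frames_to_grid` is a hypothetical helper for illustration, not part of this repository.

```python
import numpy as np


def frames_to_grid(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile a (T, H, W, C) frame stack into one (rows*H, cols*W, C) image.

    Hypothetical helper: frame i lands in grid row i // cols, column i % cols.
    """
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {t}")
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = frames.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)
```

Because the whole sequence becomes one image, an image generation model sees every frame at once rather than one frame at a time.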


Key Features

  • Efficient Inference: Up to 35× faster inference than specialized models
  • Resource Efficient: Requires less than 1/1000 of the computational resources of specialized models
  • Versatile Applications: Supports Text-to-Video, Image-to-Video, Multi-view Generation, and more
  • Preserved Capabilities: Maintains strong image generation performance while expanding functionality

Framework


Results


Due to GitHub's upload limits, the results shown here are compressed from 1024×1024 to 256×256. For the full-size versions, see:

vid1 vid2 vid3 vid4 vid5 vid6 vid7


From left to right: the input cat video, followed by the edited results transforming the cat into a fox, a tiger, and a red panda.

Installation

Requirements

  • Python >= 3.10
  • NVIDIA GPU with 24GB+ VRAM
  • CUDA 11.6+
  • PyTorch >= 1.12

git clone https://github.com/[username]/GRID.git
cd GRID
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Data Preparation Steps:

1. Initial Directory Structure:

source/
├── train/
│   ├── sequence1/
│   │   └── frame_{1..n}.jpg  # Sequential frames 
│   └── sequence2/
│       └── frame_{1..n}.jpg
└── val/
    └── ...
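Frames named `frame_1.jpg` ... `frame_n.jpg` must be ordered by their numeric index, since a plain string sort puts `frame_10.jpg` before `frame_2.jpg`. The sketch below lists each sequence folder and sorts its frames numerically. It assumes the `frame_<n>.jpg` naming shown in the tree above; `list_sequences` is a hypothetical helper, not a function from `tools/`.

```python
import re
from pathlib import Path

# Assumed naming convention, taken from the directory tree: frame_<n>.jpg
_FRAME_RE = re.compile(r"frame_(\d+)\.jpg")


def list_sequences(split_dir: str) -> dict[str, list[Path]]:
    """Map each sequence folder under split_dir to its frames in numeric order."""
    sequences = {}
    for seq_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        frames = [f for f in seq_dir.iterdir() if _FRAME_RE.fullmatch(f.name)]
        # Sort by the integer index so frame_10 comes after frame_9, not after frame_1.
        frames.sort(key=lambda f: int(_FRAME_RE.fullmatch(f.name).group(1)))
        sequences[seq_dir.name] = frames
    return sequences
```

Files that do not match the pattern (for example stray notes or thumbnails) are skipped rather than treated as frames.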

2. Grid Layout Generation:

python tools/concat.py \
    --input_dir source/train \
    --output_dir vidgrid \
    --grid_rows 4 \
    --grid_cols 6 \
    --frames_per_grid 24
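With these flags, `--grid_rows 4` and `--grid_cols 6` give 24 cells, matching `--frames_per_grid 24`. How `tools/concat.py` picks frames when a sequence is longer than that is not documented here. One plausible policy, assumed purely for illustration, is uniform temporal sampling from the first to the last frame; `sample_frame_indices` is hypothetical.

```python
import numpy as np


def sample_frame_indices(num_frames: int, rows: int, cols: int) -> list[int]:
    """Pick rows*cols frame indices spread evenly across a sequence.

    Hypothetical sampling policy; the actual behavior of tools/concat.py may differ.
    """
    k = rows * cols  # one frame per grid cell
    if num_frames < k:
        raise ValueError(f"need at least {k} frames, sequence has {num_frames}")
    # Evenly spaced positions from the first to the last frame, rounded down.
    return np.linspace(0, num_frames - 1, k).astype(int).tolist()
```

For a 24-frame sequence this keeps every frame; for a 48-frame sequence it keeps roughly every second frame, always including the first and last.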

Data Structure:

vidgrid/
├── vid1.jpg  # 4x6 grid containing 24 frames
└── vid2.jpg  # Each .jpg is a complete sequence

3. Caption Generation:

mkdir -p models

# Download GLM-4V-9B weights
# Option 1: From ModelScope
wget https://modelscope.cn/models/ZhipuAI/glm-4v-9b/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin

# Option 2: From Hugging Face
wget https://huggingface.co/THUDM/glm-4v-9b/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin

# Option 3: From WiseModel
wget https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B/resolve/main/pytorch_model.bin -O models/glm-4v-9b.bin

python tools/caption_glm.py
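The internals of `tools/caption_glm.py` are not described here, but its output convention is: one `.txt` caption next to each grid `.jpg`, sharing the same file stem. The sketch below shows that convention only. `write_captions` is hypothetical, and `caption_fn` is a stand-in for whatever call actually queries GLM-4V-9B.

```python
from pathlib import Path
from typing import Callable


def write_captions(grid_dir: str, caption_fn: Callable[[Path], str]) -> list[Path]:
    """Write <name>.txt next to every <name>.jpg that does not have a caption yet.

    caption_fn is a placeholder for the real captioning model call.
    """
    written = []
    for image in sorted(Path(grid_dir).glob("*.jpg")):
        caption_path = image.with_suffix(".txt")
        if caption_path.exists():
            continue  # keep existing captions so reruns only fill gaps
        caption_path.write_text(caption_fn(image).strip() + "\n", encoding="utf-8")
        written.append(caption_path)
    return written
```

Skipping images that already have captions makes an interrupted captioning run cheap to resume.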

Final Training Data Structure:

vidgrid/
├── vid1.jpg  # Grid image
├── vid1.txt  # Corresponding caption
├── vid2.jpg
└── vid2.txt
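Before training, it can help to confirm that every grid image has a caption and vice versa. A minimal sketch, assuming pairs are matched by file stem as in the layout above; `find_unpaired` is a hypothetical helper.

```python
from pathlib import Path


def find_unpaired(grid_dir: str) -> tuple[list[str], list[str]]:
    """Return (images without captions, captions without images), matched by stem."""
    root = Path(grid_dir)
    images = {p.stem for p in root.glob("*.jpg")}
    captions = {p.stem for p in root.glob("*.txt")}
    return sorted(images - captions), sorted(captions - images)
```

Both returned lists should be empty for a clean training set.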

Training

FLUX.1-based Training Setup

GRID is trained on top of the FLUX.1 architecture. You'll need:

  • A GPU with at least 24GB VRAM
  • Access to the FLUX.1-dev model (requires accepting its license)

Setup Steps:

Accept the model license at black-forest-labs/FLUX.1-dev, then follow the official setup guide in the black-forest-labs/flux repository to deploy the model and download its weights.

Training Configuration

  1. Copy example config:

cp config/train_lora_4d.yaml config/your_config.yaml

  2. Edit the configuration parameters.
  3. Start training:

python run.py config/your_config.yaml

Training can be interrupted safely (except during checkpoint saving) and will resume from the last checkpoint.
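Resuming comes down to finding the most recent checkpoint. The checkpoint naming used by `run.py` is not documented here; the sketch below assumes, hypothetically, files named `<run_name>_<step>.safetensors` and selects the one with the highest step. `latest_checkpoint` is not part of this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming pattern: <run_name>_<step>.safetensors
_STEP_RE = re.compile(r"_(\d+)\.safetensors$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if none match."""
    best_path, best_step = None, -1
    for path in Path(output_dir).glob("*.safetensors"):
        match = _STEP_RE.search(path.name)
        if match is None:
            continue  # e.g. a final export without a step suffix
        step = int(match.group(1))
        if step > best_step:
            best_path, best_step = path, step
    return best_path
```

Comparing parsed integers rather than file names keeps the choice correct even when step numbers are not zero-padded.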

Inference

Applications

  • Text-to-Video Generation
  • Image-to-Video Synthesis
  • Multi-view Image Generation
  • Video Style Transfer
  • Editing
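For the video tasks above, a generated grid image has to be split back into individual frames. This is a minimal sketch, assuming the output uses equal-sized cells read in row-major order, the inverse of laying frames out as a grid; `grid_to_frames` is a hypothetical helper.

```python
import numpy as np


def grid_to_frames(grid: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Split a (rows*H, cols*W, C) grid image into a (rows*cols, H, W, C) frame stack.

    Frame i is read from grid row i // cols, column i % cols.
    """
    gh, gw, c = grid.shape
    if gh % rows or gw % cols:
        raise ValueError("grid height/width must be divisible by rows/cols")
    h, w = gh // rows, gw // cols
    # (rows*H, cols*W, C) -> (rows, H, cols, W, C) -> (rows, cols, H, W, C)
    frames = grid.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return frames.reshape(rows * cols, h, w, c)
```

The resulting frame stack can then be written out as a video with any encoder.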

Benchmarks



TODO

  • Release the paper
  • Release the training codes and demo
  • Update the project page
  • Release the model weights

Change Log

2024-12-17

  • Release the paper

Citation

Contact

wancong@stu.xjtu.edu.cn

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
