A collection of papers and code for customized, personalized, and editable generative models in 2D and 3D domains.

Artificial Intelligence Generated Content (AIGC) has become ubiquitous, demonstrating the power to generate mesmerizing results such as random portraits. However, users are generally more interested in personalized content (for example, the faces of familiar people or celebrities) than in generic faces. This appetite for customization has drawn attention to customized, personalized, and editable generative AI.

This repo mainly focuses on visual generative models (leaving out LLMs), including 2D image-to-image, 2D text-to-image, and text-guided 3D generation/manipulation, and collects customized, personalized, and editable works in these domains. To suggest additions from other 2D/3D AIGC domains or to report bugs, please open an issue, submit a pull request, or e-mail me at normanzheng6606@gmail.com.

Frequently updated, so please stay tuned!
- FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition (CVPR 2024) {Paper} {Code} {Webpage}
- Face2Diffusion for Fast and Editable Face Personalization (CVPR 2024) {Paper} {Code} {Webpage}
- InstantID: Zero-shot Identity-Preserving Generation in Seconds (arXiv) {Paper} {Code} {Webpage}
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model (CVPR 2024) {Paper} {Code} {Webpage}
- MagiCapture: High-Resolution Multi-Concept Portrait Customization (AAAI 2024) {Paper} {Code} {Webpage}
- Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On (CVPR 2024) {Paper} {Code}
- Orthogonal Adaptation for Modular Customization of Diffusion Models (CVPR 2024) {Paper} {Webpage}
- High-fidelity Person-centric Subject-to-Image Synthesis (CVPR 2024) {Paper} {Code}
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis (CVPR 2024) {Paper} {Code}
- Non-confusing Generation of Customized Concepts in Diffusion Models (ICML 2024) {Paper} {Code} {Webpage}
- MC2: Multi-concept Guidance for Customized Multi-concept Generation (arXiv) {Paper} {Code}
- ToonCrafter: Generative Cartoon Interpolation (arXiv) {Paper} {Code} {Webpage}
- PCM: Phased Consistency Model (arXiv) {Paper} {Code} {Webpage}
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models (arXiv) {Paper} {Code} {Webpage}
- ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet (arXiv) {Paper} {Code} {Webpage}
- When StyleGAN Meets Stable Diffusion: a 𝒲+ Adapter for Personalized Image Generation (arXiv) {Paper} {Code}
- Real-World Image Variation by Aligning Diffusion Inversion Chain (NeurIPS 2023) {Paper} {Code} {Webpage}
- MyStyle++: A Controllable Personalized Generative Prior (SIGGRAPH Asia 2023) {Paper} {Code} {Webpage}
- Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models (NeurIPS 2023) {Paper} {Code} {Webpage}
- SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing (arXiv) {Paper} {Code} {Webpage}
- Cones 2: Customizable Image Synthesis with Multiple Subjects (NeurIPS 2023) {Paper} {Code} {Webpage}
- LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On (ACM MM 2023) {Paper} {Code}
- DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations (CVPR 2024) {Paper} {Code} {Webpage}
- Customizing Text-to-Image Models with a Single Image Pair (arXiv) {Paper}
- DemoCaricature: Democratising Caricature Generation with a Rough Sketch (arXiv) {Paper} {Webpage}
- ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation (arXiv) {Paper} {Code} {Webpage}
- MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis (CVPR 2024) {Paper} {Code} {Webpage}
- LocInv: Localization-aware Inversion for Text-Guided Image Editing (arXiv) {Paper}
- IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild (arXiv) {Paper} {Code} {Webpage}
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model (arXiv) {Paper} {Code} {Webpage}
- DragText: Rethinking Text Embedding in Point-based Image Editing (arXiv) {Paper}
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (SIGGRAPH 2023) {Paper} {Code} {Webpage}
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing (CVPR 2024) {Paper} {Code} {Webpage}
- Expressive Text-to-Image Generation with Rich Text (ICCV 2023) {Paper} {Code} {Webpage}
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation (arXiv) {Paper} {Code} {Webpage}
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization (arXiv) {Paper}
- StableIdentity: Inserting Anybody into Anywhere at First Sight (arXiv) {Paper} {Code} {Webpage}
- Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation (arXiv) {Paper}
- DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation (ICLR 2024) {Paper} {Code} {Webpage}
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization (CVPR 2024) {Paper} {Code} {Webpage}
- Customization Assistant for Text-to-image Generation (arXiv) {Paper} {Code}
- FaceStudio: Put Your Face Everywhere in Seconds (arXiv) {Paper} {Code} {Webpage}
- CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models (arXiv) {Paper} {Code} {Webpage}
- Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models (arXiv) {Paper} {Code}
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models (arXiv) {Paper} {Webpage}
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding (arXiv) {Paper} {Code} {Webpage}
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (CVPR 2023) {Paper} {Code} {Webpage}
- CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization (arXiv) {Paper} {Code} {Webpage}
- Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning (arXiv) {Paper} {Code} {Webpage}
- HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models (arXiv) {Paper} {Code} {Webpage}
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023) {Paper} {Code}
- BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models (arXiv) {Paper}
- Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks (CVPR 2024) {Paper} {Code} {Webpage}
- CustomText: Customized Textual Image Generation using Diffusion Models (arXiv) {Paper}
- EmoEdit: Evoking Emotions through Image Manipulation (arXiv) {Paper}
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion (arXiv) {Paper}
- DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models (arXiv) {Paper}
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models (arXiv) {Paper} {Code} {HuggingFace}
- Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing (NeurIPS 2023) {Paper} {Code}
- TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts (arXiv) {Paper} {Webpage}
- StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On (CVPR 2024) {Paper} {Code} {Webpage}
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training (CVPR 2024) {Paper} {Code} {Webpage}
- SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting (CVPR 2024) {Paper} {Code} {Webpage}
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models (CVPR 2024) {Paper} {Code} {Webpage}
- LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model (CVPR 2024) {Paper} {Code} {Webpage}
- Diffusion Time-step Curriculum for One Image to 3D Generation (CVPR 2024) {Paper} {Code}
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior (ICLR 2024) {Paper} {Code} {Webpage}
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting (arXiv) {Paper} {Code} {Webpage}
- EG4D: Explicit Generation of 4D Object without Score Distillation (arXiv) {Paper} {Code}
- En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data (arXiv) {Paper} {Code} {Webpage}
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion (arXiv) {Paper} {Code} {Webpage}
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models (arXiv) {Paper} {Code} {Webpage}
- GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning (CVPR 2023) {Paper} {Code}
- TryOnDiffusion: A Tale of Two UNets (CVPR 2023) {Paper} {Code} {Webpage}
- Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation (NeurIPS 2023) {Paper} {Code} {Webpage}
- GaussianEditor (S-Lab, NTU, etc.): Swift and Controllable 3D Editing with Gaussian Splatting (CVPR 2024) {Paper} {Code} {Webpage}
- GenN2N: Generative NeRF2NeRF Translation for 3D Shape Manipulation (CVPR 2024) {Paper} {Code} {Webpage}
- 3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation (CVPR 2024) {Paper} {Code} {Webpage}
- Posterior Distillation Sampling (CVPR 2024) {Paper} {Code} {Webpage}
- Learning Continuous 3D Words for Text-to-Image Generation (CVPR 2024) {Paper} {Code} {Webpage}
- Control4D: Efficient 4D Portrait Editing with Text (CVPR 2024) {Paper} {Webpage}
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (ICCV 2023) {Paper} {Code}
- DreamBooth3D: Subject-Driven Text-to-3D Generation (ICCV 2023) {Paper} {Webpage}
- GaussianEditor (Huawei): Editing 3D Gaussians Delicately with Text Instructions (arXiv) {Paper} {Webpage}
- ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields (NeurIPS 2023) {Paper} {Code} {Webpage}
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control (arXiv) {Paper} {Code} {Webpage}
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion (arXiv) {Paper} {Code} {Webpage}
- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning (arXiv) {Paper} {Code} {Webpage}
- Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation (arXiv) {Paper} {Code} {Webpage}
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation (arXiv) {Paper} {Code} {Webpage}
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation (arXiv) {Paper} {Code} {Webpage}
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance (arXiv) {Paper} {Code} {Webpage}
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture (arXiv) {Paper} {Code} {Webpage}
- Training-free Composite Scene Generation for Layout-to-Image Synthesis (arXiv) {Paper} {Code}
- MagicFight: Personalized Martial Arts Combat Video Generation (OpenReview) {Paper}
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (ECCV 2024) {Paper} {Code} {Webpage}
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization (ICML 2024 Oral) {Paper} {Code} {Webpage}
- Text-Animator: Controllable Visual Text Video Generation (arXiv) {Paper} {Code} {Webpage}
- MotionBooth: Motion-Aware Customized Text-to-Video Generation (arXiv) {Paper} {Code} {Webpage}
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control (arXiv) {Paper} {Code} {Webpage}
- Multi-sentence Video Grounding for Long Video Generation (arXiv) {Paper}
- Video Editing via Factorized Diffusion Distillation (ECCV 2024) {Paper} {Webpage}
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (arXiv) {Paper} {Code} {Webpage}
- TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation (text-to-video benchmark) {Paper} {Code} {Webpage}
- VBench: Comprehensive Benchmark Suite for Video Generative Models (CVPR 2024) {Paper} {Code} {Webpage}
- Flickr-Faces-HQ Dataset (FFHQ) {Paper} {Code} {Download}
- CelebAMask-HQ {Paper} {Code} {Download}
- Multi-Modal-CelebA-HQ (CVPR 2021) {Paper} {Code} {Download}
- Dress Code: High-Resolution Multi-Category Virtual Try-On (ECCV 2022) {Paper} {Code} {Download}
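
These face datasets typically unpack to a flat folder of aligned images. Below is a minimal loading sketch in PyTorch; the `FaceFolder` name and the folder layout are illustrative assumptions, not part of any dataset's official tooling, so adjust `root` and the file extensions to match your local download.

```python
# Hypothetical helper: iterate a downloaded face dataset (e.g. FFHQ or a
# CelebA-HQ variant) as fixed-size image tensors.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T


class FaceFolder(Dataset):
    def __init__(self, root: str, size: int = 512):
        # FFHQ ships PNGs; some CelebA-HQ releases ship JPGs, hence both suffixes.
        self.paths = sorted(
            p for p in Path(root).rglob("*") if p.suffix.lower() in {".png", ".jpg"}
        )
        self.tf = T.Compose([T.Resize(size), T.CenterCrop(size), T.ToTensor()])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return self.tf(Image.open(self.paths[i]).convert("RGB"))
```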
- SD-v1-4 (`CompVis/stable-diffusion-v1-4`) {Paper} {Code} {HuggingFace} {Blog} {Download}
- SD-1-5 (`runwayml/stable-diffusion-v1-5`) {Paper} {Code} {HuggingFace} {Blog} {Download}
- SD-2-1-base (`stabilityai/stable-diffusion-2-1-base`) {Paper} {Code} {HuggingFace} {Download}
- SD-XL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (`stabilityai/stable-diffusion-xl-base-1.0`) {Paper} {Code} {HuggingFace} {Download}
- sdxl-turbo (Adversarial Diffusion Distillation) (`stabilityai/sdxl-turbo`) {Paper} {Code} {HuggingFace} {Download} {Demo}
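
As a quick-start sketch for the checkpoints above, using the Hugging Face `diffusers` library (assumed installed via `pip install diffusers transformers accelerate`; the prompts and output file names are illustrative):

```python
# Text-to-image with the checkpoints listed above via Hugging Face diffusers.
import torch
from diffusers import AutoPipelineForText2Image

# SD-1-5: standard multi-step sampling with classifier-free guidance.
pipe = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a portrait photo of an astronaut", num_inference_steps=50).images[0]
image.save("sd15_sample.png")

# sdxl-turbo: distilled for few-step sampling, so one step with guidance disabled.
turbo = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = turbo(
    "a portrait photo of an astronaut", num_inference_steps=1, guidance_scale=0.0
).images[0]
image.save("sdxl_turbo_sample.png")
```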