A pytorch implementation of CVPR24 paper "D4M: Dataset Distillation via Disentangled Diffusion Model"

💾 D4M: Dataset Distillation via Disentangled Diffusion Model

💥 Stellar Features

🎯 Distills datasets in an Optimization-Free manner.
🎯 The distillation process is Architecture-Free, overcoming the cross-architecture problem.
🎯 Distills large-scale datasets (e.g., ImageNet-1K) efficiently.
🎯 The distilled datasets are high-quality and versatile.

📚 Introduction

Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. We advocate designing an economical dataset distillation framework that is independent of the matching architectures. Based on empirical observations, we argue that constraining the consistency of the real and synthetic image spaces enhances cross-architecture generalization. Motivated by this, we introduce Dataset Distillation via Disentangled Diffusion Model (D4M), an efficient framework for dataset distillation. Compared to architecture-dependent methods, D4M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes. The distilled datasets are versatile, eliminating the need to regenerate distinct datasets for various architectures. Through comprehensive experiments, D4M demonstrates superior performance and robust generalization, surpassing SOTA methods across most aspects.


Overview of D4M. For more details, please see our paper.

🔧 Quick Start

Create environment

  • Python >= 3.9
  • PyTorch >= 1.12.1
  • Torchvision >= 0.13.1

Install Diffusers Library

You can install or upgrade to the latest version of the Diffusers library according to this page.

Modify Diffusers Library

Step 1: Copy the pipeline scripts (the latent-generation pipeline and the image-synthesis pipeline) into the Diffusers library path: diffusers/src/diffusers/pipelines/stable_diffusion.

Step 2: Modify Diffusers source code according to scripts/README.md.
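Step 1 amounts to copying the two pipeline files into a local Diffusers source checkout. A minimal sketch of that copy, assuming you run it from the repo root; the script filenames in the commented example are illustrative placeholders, not the repo's actual names (use the files referenced in scripts/README.md):

```python
import pathlib
import shutil

def install_pipelines(pipeline_dir, diffusers_src, names):
    """Copy custom pipeline scripts into a local Diffusers source checkout."""
    dest = (pathlib.Path(diffusers_src) / "src" / "diffusers"
            / "pipelines" / "stable_diffusion")
    dest.mkdir(parents=True, exist_ok=True)
    for name in names:
        shutil.copy(pathlib.Path(pipeline_dir) / name, dest / name)
    return dest

# Hypothetical usage -- substitute the repo's real pipeline script names:
# install_pipelines("scripts", "path/to/diffusers",
#                   ["pipeline_generate_latents.py", "pipeline_synthesis_images.py"])
```

After copying, Step 2's source modifications register the new pipelines so Diffusers can import them.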

Generate Prototypes

cd distillation
sh gen_prototype_imgnt.sh

Synthesize Images

cd distillation
sh gen_syn_image_imgnt.sh

If you don't need the JSON prototype files for further exploration, you can combine the generation and synthesis processes into one, skipping the intermediate I/O.
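Conceptually, the prototype stage clusters the diffusion latents of the real images within each class and keeps the cluster centers as per-class prototypes. The pure-Python k-means below is only a sketch of that clustering idea on toy 2-D "latents" (the actual scripts operate on Stable Diffusion VAE latents; all names here are ours):

```python
def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_prototypes(latents, k, iters=20):
    """Cluster latent vectors and return k cluster centers ("prototypes").

    Pure-Python Lloyd's algorithm with deterministic farthest-point
    initialization -- a conceptual stand-in for the per-class clustering
    used to build prototypes from image latents.
    """
    centers = [latents[0]]
    while len(centers) < k:  # seed each new center far from existing ones
        centers.append(max(latents, key=lambda z: min(_dist2(z, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for z in latents:  # assign each latent to its nearest center
            j = min(range(k), key=lambda c: _dist2(z, centers[c]))
            clusters[j].append(z)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(d) / len(members) for d in zip(*members)]
    return centers

# toy "latents": two well-separated groups -> two prototypes near the group means
protos = kmeans_prototypes([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]], k=2)
```

In D4M the prototypes (plus label text) then condition the diffusion model to synthesize the distilled images.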

Training-Time Matching (TTM)

cd matching
sh matching.sh

Validate

cd validate
sh train_FKD.sh
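The FKD-style validation trains a student on stored teacher soft labels rather than hard ones. As an illustration of the loss involved, here is a generic soft-label cross-entropy in pure Python; this is a sketch of the idea, not the repo's exact implementation:

```python
import math

def soft_ce(student_logits, teacher_probs, T=1.0):
    """Soft-label cross-entropy (knowledge-distillation style).

    Computes -sum_i p_i * log softmax(z/T)_i, where p is the teacher's
    probability distribution and z the student's logits.
    """
    scaled = [z / T for z in student_logits]
    m = max(scaled)  # subtract the max for numerical stability
    total = sum(math.exp(z - m) for z in scaled)
    log_probs = [z - m - math.log(total) for z in scaled]
    return -sum(p * lp for p, lp in zip(teacher_probs, log_probs))
```

With a uniform teacher and uniform student logits the loss is ln 2 for two classes; a confident, correct student drives it toward zero.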

✨ Qualitative results

Compare to others

ImageNet-1K Results (Top: D4M, Bottom: SRe2L)
Tiny-ImageNet Results (Top: D4M, Bottom: SRe2L)
CIFAR-10 Results (Top: D4M, Bottom: MTT)
CIFAR-100 Results (Top: D4M, Bottom: MTT)

Semantic Information

Distilled data within one class (Top: D4M, Bottom: SRe2L)

For more qualitative results, please see the supplementary in our paper.

📊 Quantitative results

Results on Large-Scale datasets

👍🏻 Acknowledgments

Our code is developed based on the following codebases; thanks for sharing!

📖 Citation

If you find this work helpful, please cite:

@InProceedings{Su_2024_CVPR,
    author    = {Su, Duo and Hou, Junjie and Gao, Weizhi and Tian, Yingjie and Tang, Bowen},
    title     = {D{\textasciicircum}4M: Dataset Distillation via Disentangled Diffusion Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {5809-5818}
}
