Codes for the NeurIPS 2024 Spotlight Paper "Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment"


🌟 Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment

🔥 NeurIPS 2024 Spotlight 🔥

Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment.
Jiawei Du, Xin Zhang, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou
A*STAR, Xidian University, National University of Singapore, Hubei University

📖 Introduction

The sharp increase in data-related expenses has motivated research into condensing datasets while retaining the most informative features. Dataset distillation has thus recently come to the fore. This paradigm generates synthetic datasets that are representative enough to replace the original dataset in training a neural network. To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach. Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance. Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset. Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet, and ImageNet-1K, demonstrate the superior performance of our method, highlighting its effectiveness in producing diverse and representative synthetic datasets with minimal computational expense.
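To make the idea above concrete, here is a conceptual sketch (our own simplification, not the paper's exact procedure) of weight perturbation for diversity: before synthesizing each batch, the frozen teacher's weights are shifted along a random direction whose total magnitude is controlled by a radius `rho`. The function name and the use of `rho` as the perturbation norm are illustrative assumptions; see the paper and the `--rho` flag below for the actual mechanism.

```python
# Conceptual sketch only -- NOT the paper's exact update rule. Each synthesis
# batch is matched against a slightly different, weight-perturbed teacher,
# which discourages redundant synthetic images across batches.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def perturb_weights(model: nn.Module, rho: float, seed: int) -> nn.Module:
    """Copy `model` and shift its parameters along a random direction
    scaled so the total L2 norm of the shift equals `rho`."""
    g = torch.Generator().manual_seed(seed)
    perturbed = copy.deepcopy(model)
    noise = [torch.randn(p.shape, generator=g) for p in perturbed.parameters()]
    total = torch.sqrt(sum(n.pow(2).sum() for n in noise)).item()
    for p, n in zip(perturbed.parameters(), noise):
        p.add_(n, alpha=rho / total)  # in-place shift, norm rho overall
    return perturbed

# Each batch would then target a differently perturbed copy of the teacher:
teacher = nn.Linear(4, 2)
variants = [perturb_weights(teacher, rho=15e-3, seed=s) for s in range(3)]
```

Normalizing the noise before scaling keeps the perturbation radius identical across batches, so diversity comes from the direction of the shift rather than its size.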


⚙️ Installation

To get started, follow these instructions to set up the environment and install dependencies.

  1. Clone this repository:

    git clone https://github.com/AngusDujw/Diversity-Driven-Synthesis.git
    cd Diversity-Driven-Synthesis
  2. Install required packages: no new environment is needed; simply ensure that compatible versions of CUDA and PyTorch are installed.


🚀 Usage

Here’s how to use this code for distillation and evaluation:

  • Preparation: For ImageNet-1K, we use the pre-trained weights available in torchvision. For the CIFAR datasets, we provide trained weights at this link. Alternatively, you can generate the pre-trained weights yourself with the following command.

    bash squeeze.sh
  • Distillation: Before performing distillation, first prepare the initialization images by randomly sampling from the original dataset and saving them as tensors. We provide tensor-formatted initialization images at this [link](https://drive.google.com/drive/folders/1ueAnTXOUGiQ_E9iIssNYmEBX4vlVQEDZ?usp=sharing).

    CIFAR:

    python distillation/distillation_cifar.py \
        --iteration 1000 --r-bn 0.01 --batch-size 100 --lr 0.25 \
        --exp-name distillation-c100-ipc50 \
        --store-best-images \
        --syn-data-path ./syn_data/ \
        --init_path ./distillation/init_images/cifar100 \
        --steps 12 --rho 15e-3 --ipc-start 0 --ipc-end 50 --r-var 11 \
        --dataset cifar100

    ImageNet-1K:

    python distillation/distillation_imgnet.py \
        --exp-name distillation-imgnet-ipc50 \
        --syn-data-path ./syn_data/ \
        --init-path ./distillation/init_images/imgnet/ \
        --arch-name resnet18 \
        --batch-size 100 --lr 0.25 --iteration 2000 --r-bn 0.01 \
        --r-var 2 --steps 15 --rho 15e-3 \
        --store-best-images \
        --ipc-start 0 --ipc-end 50
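If you prefer to generate the initialization tensors yourself rather than download them, a minimal sketch could look like the following. The per-class file layout and the `class_{c}.pt` naming are our own assumptions for illustration; check the provided archive for the exact format the distillation scripts expect.

```python
# Hypothetical sketch: randomly sample `ipc` images per class from the
# original dataset and save them as tensors with torch.save. File names and
# directory layout are assumptions, not the repository's exact convention.
import os
import random
import torch

def save_init_tensors(images: torch.Tensor, labels: torch.Tensor,
                      out_dir: str, ipc: int, num_classes: int,
                      seed: int = 0) -> None:
    """Sample `ipc` images per class and save one tensor file per class."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    for c in range(num_classes):
        idx = [i for i in range(len(labels)) if labels[i].item() == c]
        chosen = rng.sample(idx, ipc)
        torch.save(images[chosen], os.path.join(out_dir, f"class_{c}.pt"))

# toy stand-in for a real dataset (e.g. CIFAR-100 via torchvision.datasets)
imgs = torch.rand(100, 3, 32, 32)
lbls = torch.arange(100) % 10
save_init_tensors(imgs, lbls, "./init_images_demo", ipc=5, num_classes=10)
```

Replacing the toy tensors with images loaded through `torchvision.datasets` gives real per-class initialization files.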
  • Evaluation:

    CIFAR:

    python validation/validation_cifar.py \
          --epochs 400 --batch-size 128 --ipc 10 \
          --syn-data-path ./syn_data/distillation-c100-ipc50 \
          --output-dir ./syn_data/validation-c100-ipc50 \
          --networks resnet18 --dataset cifar100

    ImageNet-1K:

    python validation/validation_imgnet.py \
        --epochs 300 --batch-size 128 --ipc 50 \
        --mix-type cutmix \
        --cos -T 20 -j 4 \
        --train-dir ./syn_data/distillation-imgnet-ipc50 \
        --output-dir ./syn_data/validation-imgnet-ipc50 \
        --val-dir ./data/Imagenet-1k/val \
        --teacher-model resnet18 \
        --model resnet18

We also provide equivalent .sh scripts in the scripts directory.


📊 Results

Our experiments demonstrate the effectiveness of the proposed approach across various benchmarks.


For detailed experimental results and further analysis, please refer to the full paper.


📑 Citation

If you find this code useful in your research, please consider citing our work:

@inproceedings{dwa2024neurips,
    title={Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment},
    author={Du, Jiawei and Zhang, Xin and Hu, Juncheng and Huang, Wenxin and Zhou, Joey Tianyi},
    booktitle={Adv. Neural Inf. Process. Syst. (NeurIPS)},
    year={2024}
}

🎉 Reference

Our code builds on the following prior work:

Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective
