Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment.
Jiawei Du, Xin Zhang, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou
A*STAR, Xidian University, National University of Singapore, Hubei University
The sharp increase in data-related expenses has motivated research into condensing datasets while retaining the most informative features. Dataset distillation has thus recently come to the fore. This paradigm generates synthetic datasets that are representative enough to replace the original dataset in training a neural network. To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach. Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance. Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset. Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet, and ImageNet-1K, demonstrate the superior performance of our method, highlighting its effectiveness in producing diverse and representative synthetic datasets with minimal computational expense.
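For intuition, below is a rough, heavily simplified Python sketch of one possible reading of "directed weight adjustment". It is not the released implementation: the perturbation rule, the loss (a plain cross-entropy stand-in for the actual synthesis objective), and all helper names are illustrative placeholders; only the general notion of adjusting the teacher's weights before synthesizing each batch is taken from the description above.

```python
# Conceptual sketch only -- NOT the released implementation. The perturbation rule,
# the loss, and the helper names are placeholders chosen purely for illustration.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18


def directed_perturbation(model, rho=15e-3):
    """Nudge the teacher's weights along their current gradient direction by a step
    of size rho, so successive synthetic batches are matched against slightly
    different teachers and do not collapse onto the same features."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(rho * p.grad / (p.grad.norm() + 1e-12))


def synthesize_batch(teacher, images, labels, steps=12, lr=0.25):
    """Optimize one batch of synthetic images against the (perturbed) teacher."""
    images = images.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(teacher(images), labels)  # stand-in objective
        loss.backward()
        optimizer.step()
    return images.detach()


if __name__ == "__main__":
    teacher = resnet18(num_classes=100).eval()
    init = torch.randn(8, 3, 224, 224)      # randomly initialized synthetic images
    labels = torch.arange(8) % 100
    # Give the perturbation a direction by back-propagating once through the teacher.
    F.cross_entropy(teacher(init), labels).backward()
    directed_perturbation(teacher, rho=15e-3)
    synthetic = synthesize_batch(teacher, init, labels)
    print(synthetic.shape)
```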
To get started, follow these instructions to set up the environment and install dependencies.
- Clone this repository:

  ```bash
  git clone https://github.com/AngusDujw/Diversity-Driven-Synthesis.git
  cd Diversity-Driven-Synthesis
  ```
- Install required packages: you don’t need to create a new environment; simply ensure that you have compatible versions of CUDA and PyTorch installed.
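As a quick, illustrative sanity check (nothing repository-specific), you can confirm that your PyTorch build sees CUDA before running the scripts:

```python
# Illustrative environment check: print the installed PyTorch version and CUDA visibility.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version this build targets (None for CPU-only builds)
print(torch.cuda.is_available())  # True if a usable GPU is detected
```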
Here’s how to use this code for distillation and evaluation:
- Preparation: for ImageNet-1K, we use the pre-trained weights available in torchvision. For the CIFAR datasets, we provide the trained weights at this link. Alternatively, you can generate the pre-trained weights yourself with the following command:

  ```bash
  bash squeeze.sh
  ```
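For reference, the ImageNet-1K teacher corresponds to the standard torchvision checkpoint. Loading it looks roughly like the snippet below (assuming torchvision >= 0.13); the distillation scripts presumably handle this internally, so this is only to show where the weights come from:

```python
# Illustrative only: the ImageNet-1K pre-trained ResNet-18 available in torchvision,
# i.e. the kind of teacher the ImageNet-1K distillation relies on.
from torchvision.models import resnet18, ResNet18_Weights

teacher = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
teacher.eval()  # used for inference-style queries during synthesis
```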
- Distillation: before running distillation, first prepare the initialization images by randomly sampling from the original dataset and saving them as tensors (a sketch of this step follows the commands below). We provide tensor-formatted initialization images at this [link](https://drive.google.com/drive/folders/1ueAnTXOUGiQ_E9iIssNYmEBX4vlVQEDZ?usp=sharing).

  CIFAR:

  ```bash
  python distillation/distillation_cifar.py --iteration 1000 --r-bn 0.01 --batch-size 100 --lr 0.25 --exp-name distillation-c100-ipc50 --store-best-images --syn-data-path ./syn_data/ --init_path ./distillation/init_images/cifar100 --steps 12 --rho 15e-3 --ipc-start 0 --ipc-end 50 --r-var 11 --dataset cifar100
  ```

  ImageNet-1K:

  ```bash
  python distillation/distillation_imgnet.py --exp-name distillation-imgnet-ipc50 --syn-data-path ./syn_data/ --init-path ./distillation/init_images/imgnet/ --arch-name resnet18 --batch-size 100 --lr 0.25 --iteration 2000 --r-bn 0.01 --r-var 2 --steps 15 --rho 15e-3 --store-best-images --ipc-start 0 --ipc-end 50
  ```
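If you prefer to build the initialization tensors yourself instead of downloading them, a minimal sketch for CIFAR-100 might look like the following. The per-class file layout and naming here are our assumptions, not the repository's documented format, so adapt them to match the provided files:

```python
# Minimal sketch: randomly sample ipc images per class from CIFAR-100 and save them
# as tensors to serve as initialization for the synthesis stage.
# NOTE: the output layout/naming below is an assumption, not the repository's format.
import os
import torch
from torchvision import datasets, transforms

ipc = 50                                        # images per class to sample
out_dir = "./distillation/init_images/cifar100"
os.makedirs(out_dir, exist_ok=True)

dataset = datasets.CIFAR100(root="./data", train=True, download=True,
                            transform=transforms.ToTensor())
labels = torch.tensor(dataset.targets)

for c in range(100):
    idx = torch.nonzero(labels == c).squeeze(1)
    picked = idx[torch.randperm(len(idx))[:ipc]]
    images = torch.stack([dataset[int(i)][0] for i in picked])   # [ipc, 3, 32, 32]
    torch.save(images, os.path.join(out_dir, f"class_{c:03d}.pt"))
```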
- Evaluation:

  CIFAR:

  ```bash
  python validation/validation_cifar.py --epochs 400 --batch-size 128 --ipc 10 --syn-data-path ./syn_data/distillation-c100-ipc50 --output-dir ./syn_data/validation-c100-ipc50 --networks resnet18 --dataset cifar100
  ```

  ImageNet-1K:

  ```bash
  python validation/validation_imgnet.py --epochs 300 --batch-size 128 --ipc 50 --mix-type cutmix --cos -T 20 -j 4 --train-dir ./syn_data/distillation-imgnet-ipc50 --output-dir ./syn_data/validation-imgnet-ipc50 --val-dir ./data/Imagenet-1k/val --teacher-model resnet18 --model resnet18
  ```
We also provide the corresponding `.sh` scripts in the `scripts` directory.
Our experiments demonstrate the effectiveness of the proposed approach across various benchmarks.
For detailed experimental results and further analysis, please refer to the full paper.
If you find this code useful in your research, please consider citing our work:
```bibtex
@inproceedings{dwa2024neurips,
  title={Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment},
  author={Du, Jiawei and Zhang, Xin and Hu, Juncheng and Huang, Wenxin and Zhou, Joey Tianyi},
  booktitle={Adv. Neural Inf. Process. Syst. (NeurIPS)},
  year={2024}
}
```
Our code builds on the following prior work:
Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective