algebraicdianuj/DC_U


Code release for the paper "Dataset Condensation Driven Machine Unlearning".

Setup

Platform: Ubuntu 22+

git clone https://github.com/algebraicdianuj/DC_U.git && cd DC_U
conda create -n DCU python=3.8.19
conda activate DCU
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
python -m pip install -U scikit-learn
conda install scikit-image
pip install adversarial-robustness-toolbox
conda install -c conda-forge opacus
pip install tensorflow-privacy
pip install timm
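
Optionally, run a quick sanity check that the CUDA build of PyTorch installed correctly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"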

Experiments

Base Comparison with SOTA Methods (see "SOTA Unlearning Implementation References" below)

VGG-16 | CIFAR-10 (downsampled to 100 images per class)

I had to downsample the dataset so that the NTK-based scrubbing method could run within 16 GB RAM and 8 GB VRAM.

cd sota_method_performance/vgg16_cifar10
chmod +x runner.sh
./runner.sh

CNN | CIFAR-10 (downsampled to 100 images per class)

cd sota_method_performance/cnn_cifar10
chmod +x runner.sh
./runner.sh

Weak CNN | CIFAR-10 (downsampled to 100 images per class)

cd sota_method_performance/weaker_cnn_cifar10
chmod +x runner.sh
./runner.sh

Random Forgetting (10 Percent) (Table-1)

MLP | CIFAR10

cd cifar10_exps/mlp
chmod +x runner.sh
./runner.sh

CNN | CIFAR10

cd cifar10_exps/cnn
chmod +x runner.sh
./runner.sh

RESNET18 | CIFAR10

cd cifar10_exps/resnet18
chmod +x runner.sh
./runner.sh

VGG16 | CIFAR10

cd cifar10_exps/vgg16
chmod +x runner.sh
./runner.sh

MLP | SVHN

cd svhn_exps/mlp
chmod +x runner.sh
./runner.sh

CNN | SVHN

cd svhn_exps/cnn
chmod +x runner.sh
./runner.sh

RESNET18 | SVHN

cd svhn_exps/resnet18
chmod +x runner.sh
./runner.sh

VGG16 | SVHN

cd svhn_exps/vgg16
chmod +x runner.sh
./runner.sh

Class-wise Forgetting

CIFAR-10

cd classforget_cifar10_exps
chmod +x runner.sh
./runner.sh
chmod +x runner_mlp_cnn_resnet_vgg.sh
./runner_mlp_cnn_resnet_vgg.sh

SVHN

cd classforget_svhn_exps
chmod +x runner.sh
./runner.sh
chmod +x runner_cnn_resnet_vgg.sh
./runner_cnn_resnet_vgg.sh

Effect of the Size of the Remembrance Sample Dataset (Images Per Class, IPC) on the Performance of Unlearning

cd ipc_exp_cifar10

# K=1
cd ipc1
chmod +x runner.sh
./runner.sh

# K=10
cd ipc10
chmod +x runner.sh
./runner.sh

# K=50
cd ipc50
chmod +x runner.sh
./runner.sh

Effect of Different K Values over Different Forget Set Sizes (Dataset: CIFAR-10, Model: VGG16)

VGG16 Experiments

cd K_evaluation

# K = 45, Forgetting percentage (out of total training dataset)=1 percent
cd cifar10_vgg16_randomforget1perc_MIcondensation_K45
chmod +x runner.sh
./runner.sh

# K = 450, Forgetting percentage = 1 percent
cd cifar10_vgg16_randomforget1perc_MIcondensation_K450
chmod +x runner.sh
./runner.sh

# K = 45, Forgetting percentage = 10 percent
cd cifar10_vgg16_randomforget10perc_MIcondensation_K45
chmod +x runner.sh
./runner.sh

# K=450, Forgetting percentage = 10 percent
cd cifar10_vgg16_randomforget10perc_MIcondensation_K450
chmod +x runner.sh
./runner.sh

# K= 450, Forgetting percentage = 50 percent
cd cifar10_vgg16_randomforget50perc_MIcondensation_K450
chmod +x runner.sh
./runner.sh

MLP Experiments

Using Fast Distribution Matching-based Dataset Condensation (Proposed) as the Base

cd K_evaluation_FDMcondensation/mlp
chmod +x runny.sh
./runny.sh

cd ..
# copy the csv files whose names contain 'arbitrary_uniform' from mlp/result to random_case_plotting
# copy the csv files whose names contain 'classwise' from mlp/result to classwise_plotting
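
For example, assuming the result csv files keep 'arbitrary_uniform' and 'classwise' in their file names and that random_case_plotting and classwise_plotting sit next to mlp:

cp mlp/result/*arbitrary_uniform*.csv random_case_plotting/
cp mlp/result/*classwise*.csv classwise_plotting/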

Using Model Inversion-based Dataset Condensation (Proposed) as the Base

cd K_evaluation_MIcondensation/mlp
chmod +x runny.sh
./runny.sh

cd ..
# copy the csv files whose names contain 'arbitrary_uniform' from mlp/result to random_case_plotting
# copy the csv files whose names contain 'classwise' from mlp/result to classwise_plotting
# (same cp commands as in the FDM case above)

Why One Epoch is Sufficient for Intermediate Training

cd whysingleepoch_intermediate_cifar10_cnn
python pre_procedure.py
python post_preprocedure.py
python overture_to_proposed.py
jupyter nbconvert --to notebook --execute layer_wise_gradient.ipynb --output layer_wise_gradient.ipynb

Progression of Unlearning over Multiple Rounds

cd svhn_cnn_UnlearningCycles
chmod +x runner.sh
./runner.sh
cd ..
cd "plotting/unlearning cycles"
jupyter nbconvert --to notebook --execute radar_plt.ipynb --output radar_plt.ipynb

Unlearning over Condensed Model

cd cifar10_vgg16_Condensed_retraining
chmod +x runner.sh
./runner.sh
cd ..
mv cifar10_vgg16_Condensed_retraining/result/modular_unlearning.csv plotting/Unlearning_in_Condensation
mv cifar10_vgg16_Condensed_retraining/result/recondensation_training.csv plotting/Unlearning_in_Condensation
mv cifar10_vgg16_Condensed_retraining/result/retraining.csv plotting/Unlearning_in_Condensation
mv cifar10_vgg16_Condensed_retraining/result/fisher_forgetting_stats.csv plotting/Unlearning_in_Condensation
mv cifar10_vgg16_Condensed_retraining/result/ntk_scrubbing_stats.csv plotting/Unlearning_in_Condensation
cd plotting/Unlearning_in_Condensation
jupyter nbconvert --to notebook --execute plotter.ipynb --output plotter.ipynb

Unlearning as Alternative to Differential Privacy

The repository https://github.com/awslabs/fast-differential-privacy was used to provide the differentially private optimizer for model training.

cd cifar10_vgg16_DPCompetitor_v2
chmod +x runner.sh
./runner.sh
cd ..
cd plotting/DP_competitor
jupyter nbconvert --to notebook --execute radar_plt.ipynb --output radar_plt.ipynb
jupyter nbconvert --to notebook --execute scatter.ipynb --output scatter.ipynb

Unlearning and Overfitting Metrics

cd cifar10_vit_tiny_patch16_224_UnlearningAndOverfitting_v2
chmod +x runner.sh
./runner.sh
chmod +x exp.sh
./exp.sh
cd ..
# move the csv files from cifar10_vit_tiny_patch16_224_UnlearningAndOverfitting_v2/result to plotting/unlearning_metric_and_overfitting
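
Assuming every csv file in the result folder is needed, this can be done with:

mv cifar10_vit_tiny_patch16_224_UnlearningAndOverfitting_v2/result/*.csv plotting/unlearning_metric_and_overfitting/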
cd plotting/unlearning_metric_and_overfitting
jupyter nbconvert --to notebook --execute plotty.ipynb --output plotty.ipynb

Dataset Condensation Benchmarking (SOTA methods listed under "SOTA Dataset Condensation References" below)

cd ds_condensation_benchmarking
# DS condensation via Distribution Matching
python distribution_matching.py

# DS condensation via Gradient Matching
python gradient_matching.py

# DS condensation via Fast Distribution Matching
python fast_dataset_condensation_proposed.py

# DS condensation via Model Inversion
python mi_dataset_condensation_proposed.py

# DS distillation via minimax-diffusion
git clone https://github.com/algebraicdianuj/MinimaxDiffusion.git
cd MinimaxDiffusion
chmod +x initiator.sh
chmod +x commands.sh
./initiator.sh
./commands.sh

# DS condensation via improved distribution matching
python idm_my_version.py

# DS condensation via matching training trajectories
python tdc_my_version.py

# DS condensation via efficient synthetic data parameterization (without augmentation)
python edc_my_version.py

# DS condensation via efficient synthetic data parameterization (with augmentation)
python edc_my_version.py --do_aug yes


cd ..
# copy the csv files from ds_condensation_benchmarking to plotting/dataset_condensation
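
Assuming the benchmarking scripts write their csv files directly into ds_condensation_benchmarking, the copy can be done with:

cp ds_condensation_benchmarking/*.csv plotting/dataset_condensation/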
cd plotting/dataset_condensation
jupyter nbconvert --to notebook --execute plotter.ipynb --output plotter.ipynb

SOTA Unlearning Implementation References

SOTA Dataset Condensation References
