This is the repository for our ICML 2024 paper, *MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective*.
If you use the code from this repo, please cite our work. Thanks!
```bibtex
@inproceedings{
chen2024mass,
title={Ma{SS}: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective},
author={Yizhuo Chen and Chun-Fu Chen and Hsiang Hsu and Shaohan Hu and Marco Pistoia and Tarek F. Abdelzaher},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=61RlaY9EIn}
}
```
Set up the environment and install the dependencies:

```bash
conda create -y -n mass python=3.6
conda activate mass
pip install -r requirements.txt
```
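To confirm the environment is active, here is a quick interpreter check (our own sanity check, not part of the repo):

```python
import sys

# The repo targets Python 3.6; verify the active interpreter matches.
print(sys.executable)
print(sys.version)
```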
Download the AudioMNIST dataset from the source repository, and then run `tools/audiomnist_preprocess.py` to preprocess it:

```bash
git clone https://github.com/soerenab/AudioMNIST.git
python3 tools/audiomnist_preprocess.py --audio_path AudioMNIST/
```

The created features are stored in `./data/audiomnist/audiomnist_train_fp16.pkl` and `./data/audiomnist/audiomnist_val_fp16.pkl`.
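As a quick sanity check, you can load a generated feature file and inspect its top-level structure (the exact layout depends on the preprocessing script, so this is only a hypothetical probe):

```python
import pickle

# Load one of the generated feature files and report its structure.
with open("./data/audiomnist/audiomnist_train_fp16.pkl", "rb") as f:
    data = pickle.load(f)
print(type(data))
# If it is a dict, list its keys; otherwise report its length.
print(list(data.keys()) if isinstance(data, dict) else len(data))
```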
Download the MotionSense dataset from the source repository, unzip `data/A_DeviceMotion_data.zip` inside the cloned repo, and then run `tools/motionsense_preprocess.py` to preprocess the data:

```bash
git clone https://github.com/mmalekzadeh/motion-sense.git
cd ./motion-sense/data/
unzip A_DeviceMotion_data.zip
cd ../../
python3 tools/motionsense_preprocess.py --data_path ./motion-sense/ --output_path ./data/motionsense/
```
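To verify that preprocessing completed, you can list the generated outputs (the exact file names depend on the preprocessing script; this is just a convenience check):

```python
from pathlib import Path

# List the preprocessed MotionSense outputs and their sizes.
for p in sorted(Path("./data/motionsense/").glob("*")):
    print(f"{p.name}: {p.stat().st_size / 1e6:.1f} MB")
```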
Running MaSS is similar for each dataset and involves the following steps:
- Train the attribute classifiers for each attribute (U and S in the paper).
- Train an unannotated-attribute model with InfoNCE (F in the paper); see the sketch after this list.
- Train MaSS with all of the above pretrained models.
- Train another set of attribute classifiers for the utility attributes with different settings (e.g., a different random seed); these are used to fine-tune the utility models on the transformed data (we embed this step into MaSS training).
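For reference, below is a minimal PyTorch sketch of the InfoNCE objective used to train the unannotated attribute model F. This is our own illustration, not the repo's implementation; `queries` and `keys` are assumed to be embeddings of two views of the same batch of samples.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE: the positive key for query i is keys[i]; all other
    keys in the batch serve as negatives."""
    # Normalize so the dot products below are cosine similarities.
    queries = F.normalize(queries, dim=1)
    keys = F.normalize(keys, dim=1)
    # (B, B) similarity matrix; entry [i, j] compares query i and key j.
    logits = queries @ keys.t() / temperature
    # Positive pairs lie on the diagonal, so row i's target class is i.
    targets = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, targets)


# Example: embeddings for a batch of 8 paired samples.
q, k = torch.randn(8, 32), torch.randn(8, 32)
print(info_nce_loss(q, k))
```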
Training scripts for the above steps are provided in the `scripts` folder for both the audiomnist and motionsense datasets, with comments. The scripts contain the training commands for the settings of Tables 2-4 in the paper.
For example, to run the AudioMNIST experiments:

```bash
bash scripts/audiomnist.sh
```

The results are stored in the `icml_results` folder.
As described in the paper, users can control the mutual information over the sensitive and utility attributes via the `loss-m` and `loss-n` settings; MaSS will stop the program if the settings violate the constraints.
For each attribute classifier, the accuracy is reported at the end of the `log.txt` file in the `$OUTPUT_DIR` folder; for MaSS, the performance on the sensitive attribute(s) and preserved attribute(s) is recorded in `mass_log.txt` in the output folder.
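Here is a small helper to print the tail of these logs (our own convenience snippet; adjust `output_dir` to your actual output folder):

```python
from pathlib import Path

# Print the last few lines of each log file; classifier accuracy appears
# at the end of log.txt, and MaSS results are recorded in mass_log.txt.
output_dir = Path("icml_results")  # adjust to your $OUTPUT_DIR
for name in ("log.txt", "mass_log.txt"):
    log = output_dir / name
    if log.exists():
        print(f"--- {name} ---")
        print("\n".join(log.read_text().splitlines()[-5:]))
```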
If you have any questions, please feel free to contact us through email (richard.cf.chen@jpmchase.com).