forked from inspire-group/DP-RandP

Commit de2444d (parent 7fc9ad4), committed by Xinyu Tang on Jun 12, 2023.
Showing 34 changed files with 6,446 additions and 1 deletion.

`README.md`
# DP-RandP

> Differentially Private Image Classification by Learning Priors from Random Processes [[arXiv](https://arxiv.org/abs/2306.06076)]
>
> Xinyu Tang*, Ashwinee Panda*, Vikash Sehwag, Prateek Mittal (*: equal contribution)
## Requirements

This version of the code has been tested with Python 3.9.16 and PyTorch 1.12.1.

Set up the environment via pip and Anaconda:
```
conda create -n "dprandp" python=3.9
conda activate dprandp
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r ./requirements.txt
```
Note that we made small edits in [opacus/data_loader.py](https://github.com/pytorch/opacus/blob/3a7e8f82a8d02cc1ed227f2ef287865d904eff8d/opacus/data_loader.py#L198) so that the expected batch size of Poisson sampling matches the batch-size hyperparameter in the code, rather than being approximated by 1 / len(data_loader), which keeps it consistent with the privacy accounting:
```
sample_rate=data_loader.batch_size/len(data_loader.dataset), #instead of 1 / len(data_loader)
```
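
To see the difference, here is a minimal, self-contained sketch (our own illustration with made-up numbers, not project code) comparing the two sample rates:
```
import math

# Illustrative numbers (hypothetical): a CIFAR-10-sized dataset with a large
# DP-SGD batch. With drop_last=False, len(data_loader) == ceil(N / B).
dataset_size = 50000
batch_size = 4096

num_batches = math.ceil(dataset_size / batch_size)   # 13
approx_rate = 1 / num_batches                        # Opacus default
exact_rate = batch_size / dataset_size               # the edit above

# Expected Poisson batch size = sample_rate * dataset_size
print(f"approx: {approx_rate * dataset_size:.1f}")   # ~3846.2
print(f"exact:  {exact_rate * dataset_size:.1f}")    # 4096.0
```
The accountant assumes each example is included with probability batch_size / dataset_size, so the loader must sample at exactly that rate.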

We also provide the corresponding data_loader.py we used in `./opacus_utils` for reference.
```
cp ./opacus_utils/data_loader.py $YOUR_opacus_lib_path/data_loader.py
```

We also make the corresponding change in `./tan/src/opacus_augmented/privacy_engine_augmented.py` (lines 386-387).

Please adjust `--max_physical_batch_size` to fit your GPU memory.
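
For reference, `--max_physical_batch_size` is the knob exposed by Opacus's `BatchMemoryManager`, which splits each logical (privacy-accounted) batch into smaller physical chunks. A minimal sketch of the usual wiring (illustrative variable names; `train_loader` and `optimizer` come from the DP training setup, and the exact usage in this repo may differ):
```
# Sketch only: BatchMemoryManager caps the per-step physical batch size
# without changing the logical batch size used for privacy accounting.
from opacus.utils.batch_memory_manager import BatchMemoryManager

with BatchMemoryManager(
    data_loader=train_loader,          # Poisson-sampled DP data loader
    max_physical_batch_size=128,       # tune to your GPU memory
    optimizer=optimizer,               # optimizer wrapped by PrivacyEngine
) as memory_safe_loader:
    for images, labels in memory_safe_loader:
        ...                            # usual DP-SGD training step
```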

## Experiment

For Phase I, please refer to the folder `./learning_with_noise`.

For Phase II and Phase III, please refer to the folder `./tan`.

For Table 4 in the paper (DP-RandP w/o Phase III), please refer to the folder `./linear_prob`.

## Citation
```
@article{tang2023dprandp,
  title={Differentially Private Image Classification by Learning Priors from Random Processes},
  author={Xinyu Tang and Ashwinee Panda and Vikash Sehwag and Prateek Mittal},
  journal={arXiv preprint arXiv:2306.06076},
  year={2023}
}
```

## Credits

This code has been built upon the code accompanying the papers:

"[Learning to See by Looking at Noise](https://arxiv.org/abs/2106.05963)" [[code](https://github.com/mbaradad/learning_with_noise)].

"[TAN Without a Burn: Scaling Laws of DP-SGD](https://arxiv.org/abs/2210.03403)" [[code](https://github.com/facebookresearch/tan)].

The hyperparameter settings of the code mostly follow the papers:

"[Unlocking High-Accuracy Differentially Private Image Classification through Scale](https://arxiv.org/abs/2204.13650)" [[code](https://github.com/deepmind/jax_privacy)].

"[A New Linear Scaling Rule for Differentially Private Hyperparameter Optimization](https://arxiv.org/abs/2212.04486)" [[code](https://github.com/kiddyboots216/dp-custom)].

## Questions

If anything is unclear, please open an issue or contact Xinyu Tang (xinyut@princeton.edu).

`learning_with_noise/README.md`
This repo is built on the [source repo](https://github.com/mbaradad/learning_with_noise) that contains the code for the paper:

> **Learning to See by Looking at Noise** (NeurIPS 2021, Spotlight)
>
> [[Project page](https://mbaradad.github.io/learning_with_noise/)] [[Paper](https://arxiv.org/pdf/2106.05963.pdf)] [[arXiv](https://arxiv.org/abs/2106.05963)]
>
> Manel Baradad, Jonas Wulff, Tongzhou Wang, Phillip Isola, and Antonio Torralba.

For more details, please check the source repo.

## Prepare Synthetic Data

We mainly use the `stylegan-oriented` images (other datasets are also available) provided by the source repo. You can also refer to [Data Generation](https://github.com/mbaradad/learning_with_noise#data-generation) in the source repo to generate synthetic images.

```
DATAPATH="/data/xinyut/randp/data"
wget -O $DATAPATH/stylegan-oriented.zip http://data.csail.mit.edu/noiselearning/zipped_data/small_scale/stylegan-oriented.zip
unzip $DATAPATH/stylegan-oriented.zip -d $DATAPATH/stylegan-oriented
```

## Training the model

You can then launch the contrastive training of Phase I with:
```
DATAPATH="/data/xinyut/randp/data"
ckpt_path="/data/xinyut/randp/ckpt"
python align_uniform/main.py --imagefolder $DATAPATH/stylegan-oriented --result $ckpt_path/stylegan-oriented --gpus $GPU_ID0 $GPU_ID1
```
Note that the contrastive training requires that $GPU_ID0 and $GPU_ID1 each provide at least 4 GB of memory for WRN16-4. You can use the same GPU for $GPU_ID0 and $GPU_ID1 if a single GPU has enough memory.
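
Phase I saves the encoder weights to `encoder.pth` under the save folder (e.g. `wrn164` for the default WRN16-4). Below is a hypothetical snippet for restoring it, mirroring the constructor call in `align_uniform/main.py`; the actual Phase II pipeline in `./tan` may load it differently:
```
# Illustrative only: restore the Phase I encoder checkpoint.
import torch
from align_uniform.wrn import WideResNet

# Arguments mirror main.py: WideResNet(depth, feat_dim, width, 16, 0, 0, 0)
encoder = WideResNet(16, 128, 4, 16, 0, 0, 0)
state = torch.load("/data/xinyut/randp/ckpt/stylegan-oriented/wrn164/encoder.pth",
                   map_location="cpu")
encoder.load_state_dict(state)
encoder.eval()
```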

## Citation
```
@inproceedings{baradad2021learning,
  title={Learning to See by Looking at Noise},
  author={Manel Baradad and Jonas Wulff and Tongzhou Wang and Phillip Isola and Antonio Torralba},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021}
}
```

`align_uniform/__init__.py`
import torch


def align_loss(x, y, alpha=2):
    # Alignment: mean distance between embeddings of positive pairs.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()


def uniform_loss(x, t=2):
    # Uniformity: log of the mean Gaussian potential over all embedding pairs.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()


__all__ = ['align_loss', 'uniform_loss']
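
A minimal usage sketch for these losses (illustrative only; it assumes `x` and `y` are L2-normalized embeddings of two augmented views of the same batch, as in the alignment/uniformity setup):
```
# Illustrative only: random unit-norm vectors stand in for encoder outputs.
import torch
import torch.nn.functional as F

from align_uniform import align_loss, uniform_loss

x = F.normalize(torch.randn(256, 128), dim=1)   # view 1 embeddings
y = F.normalize(torch.randn(256, 128), dim=1)   # view 2 embeddings

loss = align_loss(x, y, alpha=2) + uniform_loss(x, t=2)
print(loss.item())
```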

`align_uniform/main.py`
import os
import sys
import time
import argparse
import logging

import torchvision
import torch
import torch.nn as nn

sys.path.append("./")
from align_uniform.util import AverageMeter
from align_uniform.wrn import WideResNet
from align_uniform import align_loss, uniform_loss


def parse_option():
    parser = argparse.ArgumentParser('STL-10 Representation Learning with Alignment and Uniformity Losses')

    parser.add_argument('--temperature', type=float, default=0.5, help='Temperature')
    parser.add_argument('--align_w', type=float, default=1, help='Alignment loss weight')
    parser.add_argument('--unif_w', type=float, default=1, help='Uniformity loss weight')
    parser.add_argument('--align_alpha', type=float, default=2, help='alpha in alignment loss')
    parser.add_argument('--unif_t', type=float, default=2, help='t in uniformity loss')

    parser.add_argument('--batch_size', type=int, default=256, help='Batch size')
    parser.add_argument('--epochs', type=int, default=200, help='Number of training epochs')
    parser.add_argument('--lr', type=float, default=None,
                        help='Learning rate. Default is linear scaling 0.12 per 256 batch size')
    parser.add_argument('--lr_decay_rate', type=float, default=0.1, help='Learning rate decay rate')
    parser.add_argument('--lr_decay_epochs', default=[155, 170, 185], nargs='*', type=int,
                        help='When to decay learning rate')
    parser.add_argument('--momentum', type=float, default=0.9, help='SGD momentum')
    parser.add_argument('--weight_decay', type=float, default=1e-4, help='L2 weight decay')
    parser.add_argument('--feat_dim', type=int, default=128, help='Feature dimensionality')

    parser.add_argument('--width', type=int, default=4, help="width factor for WideResNet")
    parser.add_argument('--depth', type=int, default=16, help="depth factor for WideResNet")

    parser.add_argument('--num_workers', type=int, default=2, help='Number of data loader workers to use')
    parser.add_argument('--log_interval', type=int, default=1, help='Number of iterations between logs')
    parser.add_argument('--gpus', default=[0], nargs='*', type=int,
                        help='List of GPU indices to use, e.g., --gpus 0 1 2 3')

    parser.add_argument('--result_folder', type=str, default=None, help='Base directory to save model')
    parser.add_argument('--arch', type=str, default='wrn', help='Model architecture')

    parser.add_argument('--imagefolder', type=str, default=None, help='Path to imagefolder. If not used, use STL-10.')
    parser.add_argument('--resize_image', action='store_true', help='Resize image to 32x32 first')
    parser.add_argument('--optimizer', type=str, default='sgd', help='Which optimizer to use (SGD or Adam)')

    opt = parser.parse_args()

    if opt.lr is None:
        opt.lr = 0.12 * (opt.batch_size / 256)

    opt.gpus = list(map(lambda x: torch.device('cuda', x), opt.gpus))

    opt.save_folder = os.path.join(opt.result_folder, f"{opt.arch}{opt.depth}{opt.width}")
    os.makedirs(opt.save_folder, exist_ok=True)

    return opt


class TwoCropsTransform:
    """Take two random crops of one image as the query and key."""

    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        q = self.base_transform(x)
        k = self.base_transform(x)
        return [q, k]


def get_data_loader(opt):
    transform_array = []
    if opt.resize_image:
        transform_array.append(
            torchvision.transforms.Resize((32, 32))
        )

    transform_array += [
        torchvision.transforms.RandomResizedCrop(32, scale=(0.08, 1)),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
        torchvision.transforms.RandomGrayscale(p=0.2),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(
            (0.44087801806139126, 0.42790631331699347, 0.3867879370752931),
            (0.26826768628079806, 0.2610450402318512, 0.26866836876860795),
        ),
    ]

    transform = torchvision.transforms.Compose(transform_array)
    transform = TwoCropsTransform(transform)

    train_path = os.path.join(opt.imagefolder, 'train')
    print(f'Loading data from {opt.imagefolder} as imagefolder')
    dataset = torchvision.datasets.ImageFolder(train_path, transform=transform)
    small_scale_samples = 105000
    assert len(dataset) == small_scale_samples, \
        "Small scale experiment should have 105000 samples, and found {}".format(len(dataset))
    loader = torch.utils.data.DataLoader(dataset, batch_size=opt.batch_size, num_workers=opt.num_workers,
                                         shuffle=True, pin_memory=True)

    return loader


def main():
    opt = parse_option()

    torch.cuda.set_device(opt.gpus[0])
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True

    encoder = nn.DataParallel(WideResNet(opt.depth, opt.feat_dim, opt.width, 16, 0, 0, 0).to(opt.gpus[0]), opt.gpus)

    if opt.optimizer == 'sgd':
        optim = torch.optim.SGD(encoder.parameters(), lr=opt.lr,
                                momentum=opt.momentum, weight_decay=opt.weight_decay)
    elif opt.optimizer == 'adam':
        optim = torch.optim.Adam(encoder.parameters(), lr=opt.lr, weight_decay=opt.weight_decay)

    scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, gamma=opt.lr_decay_rate,
                                                     milestones=opt.lr_decay_epochs)
    print(opt.result_folder)
    print(opt.save_folder)
    outdir = opt.save_folder
    logfile = os.path.join(outdir, 'log_main.txt')
    # Initialize python logger
    logging.basicConfig(filename=logfile, level=logging.INFO)

    loader = get_data_loader(opt)

    loss_meter = AverageMeter('total_loss')
    it_time_meter = AverageMeter('iter_time')

    # Keep two iterators to avoid waiting when starting a new epoch, as the
    # dataset is small and it matters for fast GPUs.
    next_iter = loader.__iter__()
    t0 = time.time()
    for epoch in range(opt.epochs):
        actual_iter = next_iter
        next_iter = loader.__iter__()
        loss_meter.reset()
        it_time_meter.reset()
        for ii, ((im_x, im_y), _) in enumerate(actual_iter):
            optim.zero_grad()
            # Encode both augmented views in one forward pass, then split.
            x, y = encoder(torch.cat([im_x.to(opt.gpus[0]), im_y.to(opt.gpus[0])])).chunk(2)

            align_loss_val = align_loss(x, y, alpha=opt.align_alpha)
            unif_loss_val = (uniform_loss(x, t=opt.unif_t) + uniform_loss(y, t=opt.unif_t)) / 2
            loss = align_loss_val * opt.align_w + unif_loss_val * opt.unif_w
            loss_meter.update(loss, x.shape[0])
            loss.backward()
            optim.step()

            it_time_meter.update(time.time() - t0)
            t0 = time.time()
            if ii % opt.log_interval == 0:
                logging_string = f"Epoch {epoch}/{opt.epochs}\tIt {ii}/{len(loader)}\t{loss_meter}\t{it_time_meter}"
                logging.info(logging_string)
                print(logging_string)

        scheduler.step()

    ckpt_file = os.path.join(opt.save_folder, 'encoder.pth')
    torch.save(encoder.module.state_dict(), ckpt_file)
    print(f'Saved to {ckpt_file}')


if __name__ == '__main__':
    main()

`align_uniform/util.py`
import torch


class AverageMeter(object):
    r"""
    Computes and stores the average and current value.
    Adapted from
    https://github.com/pytorch/examples/blob/ec10eee2d55379f0b9c87f4b36fcf8d0723f45fc/imagenet/main.py#L359-L380
    """
    def __init__(self, name=None, fmt='.6f'):
        fmtstr = f'{{val:{fmt}}} ({{avg:{fmt}}})'
        if name is not None:
            fmtstr = name + ' ' + fmtstr
        self.fmtstr = fmtstr
        self.reset()

    def reset(self):
        self.val = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        avg = self.sum / self.count
        if isinstance(avg, torch.Tensor):
            avg = avg.item()
        return avg

    def __str__(self):
        val = self.val
        if isinstance(val, torch.Tensor):
            val = val.item()
        return self.fmtstr.format(val=val, avg=self.avg)


class TwoAugUnsupervisedDataset(torch.utils.data.Dataset):
    r"""Returns two augmentations and no labels."""
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __getitem__(self, index):
        image, _ = self.dataset[index]
        return self.transform(image), self.transform(image)

    def __len__(self):
        return len(self.dataset)
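
A small usage example for `AverageMeter` (illustrative only, verified against the class above):
```
# Illustrative only: track a running average of per-batch losses.
meter = AverageMeter('loss')
for batch_loss in [0.9, 0.7, 0.5]:
    meter.update(batch_loss, n=256)   # n = number of samples in the batch
print(meter)       # "loss 0.500000 (0.700000)"  -> current value (running average)
print(meter.avg)   # 0.7
```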