Initial commit
Xinyu Tang committed Jun 12, 2023
1 parent 7fc9ad4 commit de2444d
Showing 34 changed files with 6,446 additions and 1 deletion.
69 changes: 68 additions & 1 deletion README.md
@@ -1 +1,68 @@
# DP-RandP

> Differentially Private Image Classification by Learning Priors from Random Processes [[arxiv](https://arxiv.org/abs/2306.06076)]
>
> Xinyu Tang*, Ashwinee Panda*, Vikash Sehwag, Prateek Mittal (*: equal contribution)
>
>

## Requirements
This version of the code has been tested with Python 3.9.16 and PyTorch 1.12.1.

Set up the environment with pip and Anaconda:
```
conda create -n "dprandp" python=3.9
conda activate dprandp
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r ./requirements.txt
```

Note that we made a small edit to [opacus/data_loader.py](https://github.com/pytorch/opacus/blob/3a7e8f82a8d02cc1ed227f2ef287865d904eff8d/opacus/data_loader.py#L198) so that the expected batch size under Poisson sampling equals the batch size set in the code's hyperparameters, instead of being approximated by 1 / len(data_loader). This keeps the sampling rate consistent with privacy accounting.
```
sample_rate=data_loader.batch_size/len(data_loader.dataset), #instead of 1 / len(data_loader)
```
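
To illustrate why the two expressions can differ, here is a minimal sketch in plain Python. The dataset size and batch size are illustrative assumptions, not values taken from the paper, and the sketch assumes the loader keeps the last partial batch, so that `len(data_loader)` equals `ceil(N / B)`. The helper name `sample_rates` is hypothetical.

```python
import math


def sample_rates(dataset_size, batch_size):
    """Return (exact, approximate) Poisson sampling rates.

    exact:       batch_size / dataset_size, as in the edited data_loader.py
    approximate: 1 / len(data_loader), assuming the loader keeps the last
                 partial batch so that len(data_loader) == ceil(N / B)
    """
    num_batches = math.ceil(dataset_size / batch_size)
    exact = batch_size / dataset_size
    approx = 1 / num_batches
    return exact, approx


# Illustrative numbers (assumed): 50,000 examples, batch size 4,096.
n, b = 50_000, 4_096
exact, approx = sample_rates(n, b)
print(f"exact rate  = {exact:.6f}, expected batch size = {exact * n:.1f}")
print(f"approx rate = {approx:.6f}, expected batch size = {approx * n:.1f}")
```

When the batch size does not divide the dataset size, the approximate rate gives an expected batch size smaller than the configured one; the two rates coincide only when the division is exact.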

For reference, we also provide the modified data_loader.py in `./opacus_utils`. You can copy it over the one in your Opacus installation:
```
cp ./opacus_utils/data_loader.py $YOUR_opacus_lib_path/data_loader.py
```

We also made the corresponding change in ./tan/src/opacus_augmented/privacy_engine_augmented.py (lines 386-387).

Please adjust `--max_physical_batch_size` to fit your GPU memory.

## Experiment
For Phase I, please refer to folder `./learning_with_noise`.

For Phase II and Phase III, please refer to folder `./tan`.

For Table 4 in the paper (DP-RandP without Phase III), please refer to folder `./linear_prob`.

## Citation
```
@article{tang2023dprandp,
title={Differentially Private Image Classification by Learning Priors from Random Processes},
author={Xinyu Tang and Ashwinee Panda and Vikash Sehwag and Prateek Mittal},
journal={arXiv preprint arXiv:2306.06076},
year={2023}
}
```

## Credits
This code has been built upon the code accompanying the papers

"[Learning to See by Looking at Noise](https://arxiv.org/abs/2106.05963)" [[code](https://github.com/mbaradad/learning_with_noise)].

"[TAN Without a Burn: Scaling Laws of DP-SGD](https://arxiv.org/abs/2210.03403)" [[code](https://github.com/facebookresearch/tan)].

The hyperparameter setting of the code mostly follows the papers

"[Unlocking High-Accuracy Differentially Private Image Classification through Scale](https://arxiv.org/abs/2204.13650)" [[code](https://github.com/deepmind/jax_privacy)].

"[A New Linear Scaling Rule for Differentially Private Hyperparameter Optimization](https://arxiv.org/abs/2212.04486)" [[code](https://github.com/kiddyboots216/dp-custom)].



## Questions
If anything is unclear, please open an issue or contact Xinyu Tang (xinyut@princeton.edu).
38 changes: 38 additions & 0 deletions learning_with_noise/README.md
@@ -0,0 +1,38 @@
This repo is built on the [source repo](https://github.com/mbaradad/learning_with_noise), which contains the code for the paper:

> **Learning to See by Looking at Noise** (NeurIPS 2021, Spotlight)
>
> [[Project page](https://mbaradad.github.io/learning_with_noise/)] [[Paper](https://arxiv.org/pdf/2106.05963.pdf)] [[arXiv](https://arxiv.org/abs/2106.05963)]
>
> Manel Baradad and Jonas Wulff and Tongzhou Wang and Phillip Isola and Antonio Torralba.

For more details, please check the source repo.

## Prepare Synthetic Data
We mainly use the `stylegan-oriented` images provided by the source repo (other datasets are also available). You can also follow [Data Generation](https://github.com/mbaradad/learning_with_noise#data-generation) in the source repo to generate synthetic images yourself.

```
DATAPATH="/data/xinyut/randp/data"
wget -O $DATAPATH/stylegan-oriented.zip http://data.csail.mit.edu/noiselearning/zipped_data/small_scale/stylegan-oriented.zip
unzip $DATAPATH/stylegan-oriented.zip -d $DATAPATH/stylegan-oriented
```
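
Before training, it can help to confirm that the unpacked folder has the size the training script expects: `align_uniform/main.py` reads from the `train` subfolder of `--imagefolder` and asserts that it contains 105000 samples. The following is a minimal sketch of such a check. The helper names and the extension list are assumptions (the list is meant to mirror the extensions torchvision's `ImageFolder` accepts), and the sketch counts every image file under `train`, which is slightly looser than `ImageFolder`'s own scan.

```python
import os

# Assumed to mirror the extensions accepted by torchvision's ImageFolder.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                    ".pgm", ".tif", ".tiff", ".webp")
EXPECTED_SAMPLES = 105000  # value asserted in align_uniform/main.py


def count_images(train_dir):
    """Count image files anywhere under train_dir (hypothetical helper)."""
    total = 0
    for _root, _dirs, files in os.walk(train_dir):
        total += sum(name.lower().endswith(IMAGE_EXTENSIONS) for name in files)
    return total


def check_dataset(imagefolder):
    """Return (matches_expected, found) for <imagefolder>/train."""
    found = count_images(os.path.join(imagefolder, "train"))
    return found == EXPECTED_SAMPLES, found
```

For example, `check_dataset(os.path.join(DATAPATH, "stylegan-oriented"))` returns a pair whose first element says whether the count matches. If the archive unpacks with an extra top-level directory, pass that inner directory instead.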

## Training the model

Then you can launch the contrastive training in our Phase I with:
```
DATAPATH="/data/xinyut/randp/data"
ckpt_path="/data/xinyut/randp/ckpt"
python align_uniform/main.py --imagefolder $DATAPATH/stylegan-oriented --result_folder $ckpt_path/stylegan-oriented --gpus $GPU_ID0 $GPU_ID1
```
Note that the contrastive training requires GPU_ID0 and GPU_ID1 to each provide at least 4GB of memory for WRN16-4. You can use the same GPU for $GPU_ID0 and $GPU_ID1 if a single GPU has enough memory.
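
The objective optimized by `align_uniform/main.py` is a weighted sum of an alignment loss and a uniformity loss, both defined in `align_uniform/__init__.py`. As a reading aid, here is a dependency-free restatement of those two functions in plain Python. It is a sketch that takes lists of equal-length feature tuples instead of tensors and is not the code used for training; it assumes `torch.pdist` enumerates each unordered pair of rows once.

```python
import math


def align_loss(x, y, alpha=2):
    """Mean over positive pairs of ||x_i - y_i||_2 ** alpha."""
    dists = [math.dist(a, b) for a, b in zip(x, y)]
    return sum(d ** alpha for d in dists) / len(dists)


def uniform_loss(x, t=2):
    """Log of the mean over pairs i < j of exp(-t * ||x_i - x_j||_2 ** 2)."""
    vals = [
        math.exp(-t * math.dist(x[i], x[j]) ** 2)
        for i in range(len(x))
        for j in range(i + 1, len(x))
    ]
    return math.log(sum(vals) / len(vals))


# Toy features on the unit circle (illustrative values only).
view_a = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
view_b = [(0.0, 1.0), (-1.0, 0.0), (1.0, 0.0)]
total = align_loss(view_a, view_b) + (uniform_loss(view_a) + uniform_loss(view_b)) / 2
print(f"total loss with align_w = unif_w = 1: {total:.4f}")
```

The alignment term is zero when the two views of every image map to the same feature, and the uniformity term becomes more negative as features spread apart, so minimizing the sum pulls positive pairs together while spreading the features out.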

## Citation
```
@inproceedings{baradad2021learning,
title={Learning to See by Looking at Noise},
author={Manel Baradad and Jonas Wulff and Tongzhou Wang and Phillip Isola and Antonio Torralba},
booktitle={Advances in Neural Information Processing Systems},
year={2021}
}
```
12 changes: 12 additions & 0 deletions learning_with_noise/align_uniform/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
import torch


def align_loss(x, y, alpha=2):
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()


def uniform_loss(x, t=2):
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()


__all__ = ['align_loss', 'uniform_loss']
176 changes: 176 additions & 0 deletions learning_with_noise/align_uniform/main.py
@@ -0,0 +1,176 @@
import os
import time
import argparse

import torchvision
import torch
import torch.nn as nn
import sys
sys.path.append("./")
from align_uniform.util import AverageMeter
from align_uniform.wrn import WideResNet
from align_uniform import align_loss, uniform_loss

import logging

def parse_option():
    parser = argparse.ArgumentParser('STL-10 Representation Learning with Alignment and Uniformity Losses')

    parser.add_argument('--temperature', type=float, default=0.5, help='Temperature')
    parser.add_argument('--align_w', type=float, default=1, help='Alignment loss weight')
    parser.add_argument('--unif_w', type=float, default=1, help='Uniformity loss weight')
    parser.add_argument('--align_alpha', type=float, default=2, help='alpha in alignment loss')
    parser.add_argument('--unif_t', type=float, default=2, help='t in uniformity loss')

    parser.add_argument('--batch_size', type=int, default=256, help='Batch size')
    parser.add_argument('--epochs', type=int, default=200, help='Number of training epochs')
    parser.add_argument('--lr', type=float, default=None,
                        help='Learning rate. Default is linear scaling 0.12 per 256 batch size')
    parser.add_argument('--lr_decay_rate', type=float, default=0.1, help='Learning rate decay rate')
    parser.add_argument('--lr_decay_epochs', default=[155, 170, 185], nargs='*', type=int,
                        help='When to decay learning rate')
    parser.add_argument('--momentum', type=float, default=0.9, help='SGD momentum')
    parser.add_argument('--weight_decay', type=float, default=1e-4, help='L2 weight decay')
    parser.add_argument('--feat_dim', type=int, default=128, help='Feature dimensionality')

    parser.add_argument('--width', type=int, default=4, help="width factor for WideResNet")
    parser.add_argument('--depth', type=int, default=16, help="depth factor for WideResNet")

    parser.add_argument('--num_workers', type=int, default=2, help='Number of data loader workers to use')
    parser.add_argument('--log_interval', type=int, default=1, help='Number of iterations between logs')
    parser.add_argument('--gpus', default=[0], nargs='*', type=int,
                        help='List of GPU indices to use, e.g., --gpus 0 1 2 3')

    parser.add_argument('--result_folder', type=str, default=None, help='Base directory to save model')
    parser.add_argument('--arch', type=str, default='wrn', help='Architecture name, used as the prefix of the save folder')

    parser.add_argument('--imagefolder', type=str, default=None, help='Path to imagefolder. If not used, use STL-10.')
    parser.add_argument('--resize_image', action='store_true', help='Resize image to 32x32 first')
    parser.add_argument('--optimizer', type=str, default='sgd', help='Which optimizer to use (SGD or Adam)')

    opt = parser.parse_args()

    # Linear scaling rule: 0.12 per 256 samples in the batch.
    if opt.lr is None:
        opt.lr = 0.12 * (opt.batch_size / 256)

    opt.gpus = list(map(lambda x: torch.device('cuda', x), opt.gpus))

    # Checkpoints and logs go to <result_folder>/<arch><depth><width>.
    opt.save_folder = os.path.join(opt.result_folder, f"{opt.arch}{opt.depth}{opt.width}")
    os.makedirs(opt.save_folder, exist_ok=True)

    return opt


class TwoCropsTransform:
    """Take two random crops of one image as the query and key."""

    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        q = self.base_transform(x)
        k = self.base_transform(x)
        return [q, k]



def get_data_loader(opt):
    transform_array = []
    if opt.resize_image:
        transform_array.append(
            torchvision.transforms.Resize((32, 32))
        )

    transform_array += [
        torchvision.transforms.RandomResizedCrop(32, scale=(0.08, 1)),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
        torchvision.transforms.RandomGrayscale(p=0.2),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(
            (0.44087801806139126, 0.42790631331699347, 0.3867879370752931),
            (0.26826768628079806, 0.2610450402318512, 0.26866836876860795),
        ),
    ]

    transform = torchvision.transforms.Compose(transform_array)

    transform = TwoCropsTransform(transform)

    train_path = os.path.join(opt.imagefolder, 'train')
    print(f'Loading data from {opt.imagefolder} as imagefolder')
    dataset = torchvision.datasets.ImageFolder(
        train_path,
        transform=transform)
    small_scale_samples = 105000
    assert len(dataset) == small_scale_samples, "Small scale experiment should have 105000 samples, and found {}".format(len(dataset))
    loader = torch.utils.data.DataLoader(dataset, batch_size=opt.batch_size, num_workers=opt.num_workers,
                                         shuffle=True, pin_memory=True)

    return loader


def main():
    opt = parse_option()

    torch.cuda.set_device(opt.gpus[0])
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True

    encoder = nn.DataParallel(WideResNet(opt.depth, opt.feat_dim, opt.width, 16, 0, 0, 0).to(opt.gpus[0]), opt.gpus)

    if opt.optimizer == 'sgd':
        optim = torch.optim.SGD(encoder.parameters(), lr=opt.lr,
                                momentum=opt.momentum, weight_decay=opt.weight_decay)
    elif opt.optimizer == 'adam':
        optim = torch.optim.Adam(encoder.parameters(), lr=opt.lr, weight_decay=opt.weight_decay)

    scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, gamma=opt.lr_decay_rate,
                                                     milestones=opt.lr_decay_epochs)
    print(opt.result_folder)
    print(opt.save_folder)
    outdir = opt.save_folder  # opt.result_folder
    logfile = os.path.join(outdir, f'log_main.txt')
    # Initialize python logger
    logging.basicConfig(filename=logfile, level=logging.INFO)

    loader = get_data_loader(opt)

    loss_meter = AverageMeter('total_loss')
    it_time_meter = AverageMeter('iter_time')

    # keep two iterators to avoid waiting when starting new epoch, as the dataset is small and it matters for fast GPU's
    next_iter = loader.__iter__()
    t0 = time.time()
    for epoch in range(opt.epochs):
        actual_iter = next_iter
        next_iter = loader.__iter__()
        loss_meter.reset()
        it_time_meter.reset()
        for ii, ((im_x, im_y), _) in enumerate(actual_iter):
            optim.zero_grad()
            x, y = encoder(torch.cat([im_x.to(opt.gpus[0]), im_y.to(opt.gpus[0])])).chunk(2)

            align_loss_val = align_loss(x, y, alpha=opt.align_alpha)
            unif_loss_val = (uniform_loss(x, t=opt.unif_t) + uniform_loss(y, t=opt.unif_t)) / 2
            loss = align_loss_val * opt.align_w + unif_loss_val * opt.unif_w
            loss_meter.update(loss, x.shape[0])
            loss.backward()
            optim.step()

            it_time_meter.update(time.time() - t0)
            t0 = time.time()
            if ii % opt.log_interval == 0:
                logging_string = f"Epoch {epoch}/{opt.epochs}\tIt {ii}/{len(loader)}\t" + f"{loss_meter}\t{it_time_meter}"
                logging.info(logging_string)
                print(logging_string)

        scheduler.step()

    ckpt_file = os.path.join(opt.save_folder, 'encoder.pth')
    torch.save(encoder.module.state_dict(), ckpt_file)
    print(f'Saved to {ckpt_file}')


if __name__ == '__main__':
    main()
52 changes: 52 additions & 0 deletions learning_with_noise/align_uniform/util.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
import torch


class AverageMeter(object):
    r"""
    Computes and stores the average and current value.
    Adapted from
    https://github.com/pytorch/examples/blob/ec10eee2d55379f0b9c87f4b36fcf8d0723f45fc/imagenet/main.py#L359-L380
    """
    def __init__(self, name=None, fmt='.6f'):
        fmtstr = f'{{val:{fmt}}} ({{avg:{fmt}}})'
        if name is not None:
            fmtstr = name + ' ' + fmtstr
        self.fmtstr = fmtstr
        self.reset()

    def reset(self):
        self.val = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        avg = self.sum / self.count
        if isinstance(avg, torch.Tensor):
            avg = avg.item()
        return avg

    def __str__(self):
        val = self.val
        if isinstance(val, torch.Tensor):
            val = val.item()
        return self.fmtstr.format(val=val, avg=self.avg)


class TwoAugUnsupervisedDataset(torch.utils.data.Dataset):
    r"""Returns two augmentation and no labels."""
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __getitem__(self, index):
        image, _ = self.dataset[index]
        return self.transform(image), self.transform(image)

    def __len__(self):
        return len(self.dataset)