Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Insertion

This is the PyTorch implementation for the paper "Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Insertion". The vast amount of training data available on the Internet has been a key factor in the success of deep learning models. However, it also raises concerns about unauthorized exploitation of datasets, e.g., for commercial purposes forbidden by the dataset license. In this paper, we introduce a backdoor-based watermarking approach that can be used as a general framework to protect publicly available data.

Environment

  • pytorch==1.6.0
  • torchvision==0.7.0
  • python==3.6
  • numpy==1.18.1

Pipeline

The watermarking process is as follows. The defender first chooses a target class C and collects a fraction of the data from class C as the watermarking set Dwm. The defender then applies an adversarial transformation to all samples in Dwm. Finally, a preset trigger pattern t is added to Dwm. Models trained on the protected dataset will assign a significantly higher prediction probability to the target class C whenever the trigger pattern appears.
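
Below is a minimal sketch of these three steps for an image dataset, assuming a pretrained surrogate classifier `model` and image tensors scaled to [0, 1]. The function names, PGD settings, and trigger masking are illustrative only and are not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def select_watermark_set(labels, target_class, fraction=0.1):
    """Collect indices of a fraction of the target-class samples to form D_wm."""
    idx = (labels == target_class).nonzero(as_tuple=True)[0]
    n_wm = int(fraction * len(idx))
    return idx[torch.randperm(len(idx))[:n_wm]]

def adversarial_transform(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted PGD that perturbs x away from its true class y, so that the
    trigger, rather than natural class features, becomes the shortcut the
    trained model associates with class C."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def add_trigger(x, trigger, mask):
    """Stamp the preset trigger pattern t onto the images; `mask` is 1 on the
    pixels occupied by the trigger and 0 elsewhere."""
    return x * (1 - mask) + trigger * mask
```

At verification time, the dataset owner can then test whether a suspect model's prediction probability for class C rises significantly once the trigger is stamped onto held-out inputs.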

Image Data

We show the code for the CIFAR-10 and Caltech-256 datasets in Code/Image.

Text Data

We show the code for the SST-2, IMDB, and NLI datasets in Code/NLP.

Audio Data

We show the code for the AudioMNIST dataset in Code/Audio.

Outlier Detection

We investigate the stealthiness of the watermarking samples. For image data, we adopt two commonly used outlier detection methods: autoencoder-based [code] and confidence-based [code]. For text data, we identify outliers by measuring the increase rate of grammatical errors [link] in watermarking samples.
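
For illustration, here is a minimal sketch of the confidence-based screening idea, assuming a clean reference classifier; the scoring and threshold below are simplified stand-ins, not the exact detection methods linked above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def confidence_scores(model, loader, device="cpu"):
    """Score each sample by the reference model's maximum softmax probability;
    unusually low-confidence samples are treated as candidate outliers."""
    model.eval().to(device)
    scores = []
    for x, _ in loader:
        probs = F.softmax(model(x.to(device)), dim=1)
        scores.append(probs.max(dim=1).values.cpu())
    return torch.cat(scores)

def flag_outliers(scores, threshold=0.5):
    """Return indices of samples whose confidence falls below the threshold."""
    return (scores < threshold).nonzero(as_tuple=True)[0]
```

A stealthy watermarking set should be flagged at roughly the same rate as clean samples under such screening.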
