PDF-Embedding

[NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"

Dataset

We release the dataset at HuggingFace. Please follow the instructions here to download the D-Rep dataset in our paper.

Demonstration

Installation

conda create -n pdfembedding python=3.9
conda activate pdfembedding
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install timm==0.4.12
git clone https://github.com/WangWenhao0716/PDF-Embedding.git
cd PDF-Embedding

Feature Extraction

import requests
import torch
from PIL import Image

from pdf_embedding_extractor import preprocessor, create_model

model_name = 'vit_base_query'
weight_name = 'vit_exp_563.pth.tar'#'vit_gauss_760.pth.tar'
model = create_model(model_name, weight_name).cuda()

url_real = "https://huggingface.co/datasets/WenhaoWang/D-Rep/resolve/main/Irises_real.jpg"
image_real = Image.open(requests.get(url_real, stream=True).raw)
x_real = preprocessor(image_real).unsqueeze(0).cuda()
pdf_features_real = model.forward_features(x_real)  # => torch.Size([1, 6, 768])

url_gen = "https://huggingface.co/datasets/WenhaoWang/D-Rep/resolve/main/Irises_gen.jpg"
image_gen = Image.open(requests.get(url_gen, stream=True).raw)
x_gen = preprocessor(image_gen).unsqueeze(0).cuda()
pdf_features_gen = model.forward_features(x_gen)  # => torch.Size([1, 6, 768])

Matching

# Assume we have two pdf_features: pdf_features_real and pdf_features_gen  # torch.Size([1, 6, 768])
from torch.nn import functional as F
cosine_similarity = F.cosine_similarity(pdf_features_real, pdf_features_gen, dim=2) # => torch.Size([1, 6])
_, indices = torch.max(cosine_similarity, dim=1)
similarity = (5-indices)/5 # => 0.8

Known issues

The model described herein may yield false positive or negative predictions. Consequently, the contents of this paper should not be construed as legal advice.
The currently released models are only trained on 36,000 image pairs and are NOT ready for commercial use. If you plan to use them commercially, please contact wangwenhao0716@gmail.com.
We do not release the training code currently because of its potential commercial value.

Citation

@article{wang2024icdiff,
  title={Image Copy Detection for Diffusion Models},
  author={Wang, Wenhao and Sun, Yifan and Tan, Zhentao and Yang, Yi},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=gvlOQC6oP1}
}

License

We release the code and trained models under the CC-BY-NC-4.0 license.

Contact

If you have any questions, feel free to contact Wenhao Wang (wangwenhao0716@gmail.com).

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
models		models
LICENSE		LICENSE
PDF-Embedding.jpg		PDF-Embedding.jpg
README.md		README.md
match.jpg		match.jpg
pdf_embedding_extractor.py		pdf_embedding_extractor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDF-Embedding

Dataset

Demonstration

Installation

Feature Extraction

Matching

Known issues

Citation

License

Contact

About

Releases

Packages

Languages

License

WangWenhao0716/PDF-Embedding

Folders and files

Latest commit

History

Repository files navigation

PDF-Embedding

Dataset

Demonstration

Installation

Feature Extraction

Matching

Known issues

Citation

License

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages