Existing deepfake datasets like DeepfakeDetection and FaceForensics++ have advanced detection research but are limited by constrained real videos featuring a few actors and fake videos generated using popular software. As a result, detectors trained on these datasets often struggle with the diversity of real-world deepfakes found online.
To address this, we introduce WildDeepfake, a dataset of 7,314 face sequences from 707 deepfake videos sourced entirely from the internet. Despite its small size, WildDeepfake better represents the challenges of real-world detection, where baseline detectors show significantly reduced performance.
To enhance detection, we also propose Attention-based Deepfake Detection Networks (ADDNets), utilizing 2D and 3D attention mechanisms to improve focus on real/fake facial features.
- A comparison with previous datasets (before our work)
Dataset name | Download | Generation method | Deepfake videos | Actors |
---|---|---|---|---|
Deepfake-TIMIT low | download | Deepfake | 320 | 32 |
Deepfake-TIMIT high | download | Deepfake | 320 | 32 |
Faceforensics | - | Deepfake | 1000 | 977 |
Faceforensics++ | download | Deepfake | 1000 | 977 |
Deepfake detection | download | Deepfake | over 3000 | 28 |
Celeb-deepfakeforensics v1 | download | Deepfake | 795 | 13 |
Celeb-deepfakeforensics v2 | download | Deepfake | 590 | 59 |
DFDC | download | Deepfake | - | - |
WildDeepfake | download | Internet | 707 | Unknown |
- File Structure:
deepfake_in_the_wild
|--real train
|  |--0.tar.gz
|  |--1.tar.gz
|  |--2.tar.gz
|  ...
|--real test
|  |--0.tar.gz
|  |--1.tar.gz
|  |--2.tar.gz
|  ...
|--fake train
|  |--0.tar.gz
|  |--1.tar.gz
|  |--2.tar.gz
|  ...
|--fake test
|  |--0.tar.gz
|  |--1.tar.gz
|  |--2.tar.gz
|  ...
Each tar.gz file contains several folders of face images, and the images in one folder form one face sequence. Each image's file name is the frame number at which that face appears in the original video.
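
The layout described above can be read with a short script. The following is a minimal sketch, not an official loader: it assumes the images are stored as `.png`, `.jpg`, or `.jpeg` files (the format is not stated here) and that the frame number is the first run of digits in each file name. The helper names `frame_number` and `group_face_sequences` are hypothetical and chosen only for illustration.

```python
import re
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: the image format is not stated in this README, so common suffixes are accepted.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def frame_number(member_name):
    """Return the first run of digits in the file name as an int, or None if absent."""
    match = re.search(r"\d+", PurePosixPath(member_name).stem)
    return int(match.group()) if match else None


def group_face_sequences(archive_path):
    """Map each folder inside one .tar.gz archive to its image paths in frame order."""
    sequences = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            path = PurePosixPath(member.name)
            # Skip directories and anything that is not an image file.
            if not member.isfile() or path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            number = frame_number(member.name)
            if number is None:
                continue
            # One folder corresponds to one face sequence.
            sequences[str(path.parent)].append((number, member.name))
    # Sort numerically so that frame 10 comes after frame 2.
    return {
        folder: [name for _, name in sorted(frames)]
        for folder, frames in sequences.items()
    }
```

Calling `group_face_sequences` on one downloaded archive returns a dictionary keyed by folder path, which can then be passed to whatever image-loading code a detector uses. Sorting by the parsed integer rather than by the raw file name avoids the lexicographic ordering problem where `10` would sort before `2`.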
You will need to fill out an agreement form to use the dataset, which is now available on Hugging Face (click to download).
If you use this dataset in your research, please cite it as follows:
@inproceedings{zi2020wilddeepfake,
title={Wilddeepfake: A challenging real-world dataset for deepfake detection},
author={Zi, Bojia and Chang, Minghao and Chen, Jingjing and Ma, Xingjun and Jiang, Yu-Gang},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2382--2390},
year={2020}
}
To ensure the privacy of individuals featured in the dataset, we have implemented the following measures:
- Restricted Use: The dataset is strictly for research purposes, and only face sequences are released, not full videos.
- Privacy Protection in Publications: Key facial features are obscured in all visual materials, including papers and presentations. Additionally, strict access controls are in place.
- Applicant Verification: Access is granted only after verifying the applicant’s academic email address, personal electronic signature, and other necessary credentials.
- Usage Agreement: Applicants are required to sign a comprehensive agreement to ensure the dataset is used exclusively for research purposes.
- Right to Removal: If any part of the dataset impacts you, please contact us to request its removal.
We are committed to safeguarding privacy while enabling research advancements.