# Solution to Deepfake Detection Challenge
All the dependencies are listed in `requirements.txt`, which was generated by pipreqs.

```bash
# please check requirements.txt
pip install -r requirements.txt
```
We use Slurm to manage computing resources, and all training scripts use `srun` to spawn multiple processes for synchronized SGD.
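Below is a minimal sketch of how each `srun`-spawned process could join a process group through a shared file, assuming PyTorch's `torch.distributed` with the `file://` initialization method; the file path is a placeholder standing in for the `<distributed file path>` argument passed to `train.sh` later, and the actual training scripts may set this up differently.

```python
import os

import torch.distributed as dist

# Slurm exports these variables for every task launched by srun.
rank = int(os.environ["SLURM_PROCID"])        # global rank of this process
world_size = int(os.environ["SLURM_NTASKS"])  # total number of processes

# Join the process group via a shared file visible to all nodes
# (hypothetical path; corresponds to <distributed file path> below).
dist.init_process_group(
    backend="nccl",
    init_method="file:///path/to/shared/dist_file",
    rank=rank,
    world_size=world_size,
)
```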
All training videos should be put in the folder `DFDC-Kaggle`. There should be 50 sub-folders named `dfdc_train_part_0`, ..., `dfdc_train_part_49`, each of which contains videos from one part of the DFDC dataset. We extract frames from all videos with the command below; the extracted frames will be stored in `DFDC-Kaggle_image`.

```bash
python extract_frames.py
```

Note that these frames, which have the same resolution as the original videos, take up a significant amount of disk space.
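For reference, a minimal frame-extraction sketch with OpenCV might look like the following; the sampling rate, frame naming, and video extension here are assumptions, and `extract_frames.py` may differ.

```python
import os

import cv2

def extract_frames(video_path: str, out_dir: str, step: int = 10) -> None:
    """Save every `step`-th frame of a video as a full-resolution PNG."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame{idx}.png"), frame)
        idx += 1
    cap.release()

extract_frames(
    "DFDC-Kaggle/dfdc_train_part_0/aaqaifqrwn.mp4",
    "DFDC-Kaggle_image/dfdc_train_part_0/aaqaifqrwn",
)
```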
After frame extraction, we use the open-source face detector RetinaFace to detect faces in all the frames. Note that this is the same detector used in our inference pipeline (see the `inference` folder). The detection results should be saved in the folder `DFDC-Kaggle_Retinaface`. As an example, for the frame stored at `DFDC-Kaggle_image/dfdc_train_part_0/aaqaifqrwn/frame1.png`, we will generate a text file at `DFDC-Kaggle_Retinaface/dfdc_train_part_0/aaqaifqrwn/frame1.txt`, and its content would be the numbers below.
```
6
766 238 215 317 0.99811953
1530 925 136 133 0.3763613
1805 990 43 57 0.07622631
1278 916 140 131 0.0581847
1490 959 113 110 0.033537783
1826 978 63 77 0.022241158
```
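Here the first line appears to give the number of detected boxes, and each subsequent line one box as x, y, width, height, and detection confidence; that reading is an assumption based on the example above. Under it, a minimal parser sketch would be:

```python
def load_detections(txt_path: str):
    """Parse one detection file into a list of (x, y, w, h, score) tuples."""
    with open(txt_path) as f:
        num_faces = int(f.readline())  # first line: number of boxes
        return [
            tuple(map(float, f.readline().split()))  # one box per line
            for _ in range(num_faces)
        ]
```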
With extracted frames and detected face boxes, we perform simple IoU-based tracking and face size alignment to finally obtain the aligned faces that are used for training our models. The aligned faces will be put in a folder named `DFDC-Kaggle_Alignedface`. More specifically, we simply use the command below.

```bash
python save_aligned_faces.py
```
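As a rough illustration of the matching criterion behind the tracking step, the IoU of two boxes in the `(x, y, w, h)` format above can be computed as follows; this is only a sketch, and `save_aligned_faces.py` may link boxes across frames differently.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    # Width and height of the intersection rectangle (zero if disjoint).
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```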
Feel free to use multi-processing techniques to speed up the preprocessing steps.
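For instance, the per-video preprocessing can be distributed over a process pool as sketched below; `process_video` is a hypothetical worker, and the glob pattern assumes the folder layout described above.

```python
import glob
from multiprocessing import Pool

def process_video(video_path: str) -> None:
    # Hypothetical worker: extract frames or align faces for one video.
    pass

if __name__ == "__main__":
    video_paths = glob.glob("DFDC-Kaggle/dfdc_train_part_*/*.mp4")
    with Pool(processes=8) as pool:
        pool.map(process_video, video_paths)
```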
There are six pre-trained models for initialization, and they should be downloaded and put in the `pretrain` folder prior to training.
After the steps above, we use the command below to train all models sequentially. Note that `<distributed file path>` should be an absolute path to an empty location; it will store the shared files used for initializing process groups. The final parameter `<task name>` is optional and only specifies the job name in Slurm.

```bash
sh train.sh <slurm partition name> <distributed file path> <task name>
```
You may look into `train.sh` to see the specific configs that are used for training. The data lists used during training have been put in the folder `DFDC-Kaggle_list`. By default, image-based models require 8 GPUs and video-based models require 16 GPUs. In our submitted solution, we used 7 image-based models and 4 video-based models.
Note that model training may occasionally get stuck or stop unexpectedly due to random issues. In this case, you may use `image_based/recover.sh` or `video_based/recover.sh` to resume training.
For inference, please refer to the separate `README.md` in the `inference` folder.