This repository implements detection and tracking of a reference face in a video. To set up the environment:
conda create --name=facetrack python=3.10
conda activate facetrack
pip install tensorflow[and-cuda]
pip install tf-keras
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install deepface
pip install scenedetect[opencv]
pip install loguru scipy
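Optionally, you can sanity-check the TensorFlow/deepface side of the environment at this point; the snippet below is just a sketch, not a script shipped with the repository:

```python
# Optional sketch: confirm TensorFlow sees the GPU and deepface loads with tf-keras.
import tensorflow as tf
from deepface import DeepFace

print("TensorFlow:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# Building a model once also downloads its weights ahead of the first real run.
DeepFace.build_model("VGG-Face")
print("deepface VGG-Face model loaded")
```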
The SAM2-based VOS method SAMURAI is used for tracking, so SAM2 needs to be installed first. The code requires `python>=3.10`, as well as `torch>=2.3.1` and `torchvision>=0.18.1`. Please follow the instructions here to install both PyTorch and TorchVision dependencies. You can install the SAMURAI version of SAM 2 on a GPU machine using:
cd sam2
pip install -e ".[notebooks]"
Please see INSTALL.md from the original SAM 2 repository for FAQs on potential issues and solutions.
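A minimal import check (again, only a sketch) can confirm that the PyTorch stack and the SAMURAI build of SAM 2 are importable:

```python
# Optional sketch: verify the PyTorch stack and the editable SAM 2 install.
import torch
import torchvision
import sam2  # package provided by `pip install -e ".[notebooks]"` above

print("torch:", torch.__version__)              # should be >= 2.3.1
print("torchvision:", torchvision.__version__)  # should be >= 0.18.1
print("CUDA available:", torch.cuda.is_available())
```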
Since the project uses both TensorFlow (cuDNN 9.3) and PyTorch (cuDNN 9.1), a compatible cuDNN version needs to be installed after both frameworks:
pip uninstall nvidia-cudnn-cu11
pip install nvidia-cudnn-cu12==9.3.0.75
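To double-check which cuDNN build each framework actually picks up after the swap, something like the following can help (a sketch; the exact numbers reported depend on your install):

```python
# Optional sketch: report the cuDNN versions visible to PyTorch and TensorFlow.
import torch
import tensorflow as tf

print("cuDNN seen by PyTorch:", torch.backends.cudnn.version())  # e.g. 90300 for 9.3.x
print("cuDNN in TensorFlow build info:",
      tf.sysconfig.get_build_info().get("cudnn_version"))
```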
Then download the SAM 2 checkpoints:
cd checkpoints && \
./download_ckpts.sh && \
cd .. && \
cd ..
Please prepare the data in the following format:
data/
├── demo1/
│   ├── video
│   └── reference face image
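Before launching, a quick path check like the one below can catch a misplaced input; this is an optional sketch (the file names mirror the demo command below), not part of the repository:

```python
# Optional sketch: verify the demo folder contains the two expected inputs.
# The concrete file names below mirror the example command in this README.
from pathlib import Path

demo_dir = Path("data/demo1")
video_path = demo_dir / "top_gun.mp4"     # the video to search
ref_path = demo_dir / "tom_cruise.jpg"    # the reference face image

for p in (video_path, ref_path):
    assert p.exists(), f"missing input file: {p}"
print("inputs found:", video_path, ref_path)
```

Then run the tracker on the demo inputs: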
python scripts/main_facetrack.py --video_path data/demo1/top_gun.mp4 --ref_path data/demo1/tom_cruise.jpg
Note: at least 8 GB of memory is needed to run successfully. This requirement can be relaxed by using a smaller SAM2 or deepface model.
The current default setting uses the SAM2 model `sam2.1_hiera_base_plus.pt`, the deepface model `VGG-Face` for verification, and `retinaface` for face detection.
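For reference, a standalone call with the same defaults looks roughly like the snippet below; this illustrates the deepface API with `VGG-Face` and `retinaface`, not the exact call made inside `main_facetrack.py`, and the crop path is a placeholder:

```python
from deepface import DeepFace

# Compare a (hypothetical) face crop from the video against the reference image
# using the same defaults as this repo: VGG-Face embeddings + retinaface detection.
result = DeepFace.verify(
    img1_path="data/demo1/tom_cruise.jpg",  # reference face
    img2_path="face_crop.jpg",              # placeholder crop extracted from the video
    model_name="VGG-Face",
    detector_backend="retinaface",
)
print("verified:", result["verified"], "| distance:", result["distance"])
```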
After the run, you will have a folder `clips/` containing all crops of the target face, along with a JSON file `metadata.json` with annotations of face coordinates and timestamps:
data/
├── demo1/
│   ├── video_file_name.mp4
│   ├── reference_image.png
│   ### generated files ###
│   ├── [video_file_name]_scenes/   # contains split scenes
│   ├── clips/
│   │   ├── clip-0000.mp4
│   │   ├── clip-0001.mp4
│   │   ├── ...
│   └── metadata.json
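The exact schema of `metadata.json` is defined by `main_facetrack.py`; the reader below is only a sketch for inspecting the generated annotations, and makes no assumptions about the field names:

```python
import json

# Sketch only: load the generated annotations and peek at their structure.
# Field names such as clip name, timestamp, or bbox depend on the script;
# inspect the output below for the actual schema.
with open("data/demo1/metadata.json") as f:
    metadata = json.load(f)

print(type(metadata))
print(json.dumps(metadata, indent=2)[:1000])  # preview the first entries
```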
Check the visualization of face bounding boxes on the original video in `visualization.ipynb`.
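If you prefer a plain script over the notebook, a face box can be overlaid with OpenCV along these lines; the frame path and box values here are placeholders, and `visualization.ipynb` remains the reference implementation:

```python
import cv2

# Sketch only: draw one face bounding box on a single frame and save it.
frame = cv2.imread("frame_0000.jpg")      # placeholder: an extracted video frame
x, y, w, h = 100, 80, 64, 64              # placeholder: a box taken from metadata.json
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("frame_0000_annotated.jpg", frame)
```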
The corresponding annotated videos are stored as `demo.mp4`. One example is shown here: