This is an auxiliary repo for downloading VoxCeleb videos and preprocessing the extracted frames by cropping them around the face. To detect and crop the face area, we use face-alignment, the landmark estimation method proposed in [1].
- Python 3.5+
- Linux
- PyTorch (>=1.5)
pip install -r requirements.txt
pip install --upgrade youtube_dl
sudo apt-get install ffmpeg
Path | Description |
---|---|
FaceDetector | SFD face detector for face-alignment. |
- Download the videos of the VoxCeleb1 or VoxCeleb2 dataset from YouTube
- Split each video into shorter clips using the metadata provided by the datasets, then delete the original videos
- Extract frames from each video with REF_FPS = 25
- Crop frames using the face boxes from the metadata and facial landmarks
- Files are saved as:
.path/to/voxdataset
|-- id10271 # identity index
| |-- 37nktPRUJ58 # video index
| | |-- chunk_videos # chunk_videos: original video split into smaller clips
| | | |-- 37nktPRUJ58#00001#257-396.mp4
| | | |-- ...
| | |-- frames # extracted frames
| | | |-- 00_000025.png
| | | |-- ...
| | |-- frames_cropped # preprocessed frames
| | | |-- 00_000025.png
| | | |-- ...
| |-- Zjc7Xy7aT8c
| | | ...
|-- id10273
| | ...
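
The clip file names in the tree above appear to encode where each clip sits in the original video. The sketch below is only an illustration under that assumption: it reads `37nktPRUJ58#00001#257-396.mp4` as `<video id>#<clip index>#<start frame>-<end frame>.mp4` and converts the frame range to seconds at REF_FPS = 25. The helper names `parse_chunk_name` and `chunk_seconds` are hypothetical and are not part of this repo.

```python
import re
from collections import namedtuple

REF_FPS = 25  # frame rate used for frame extraction, as stated above

# Hypothetical container for the fields we assume the file name encodes.
Chunk = namedtuple("Chunk", ["video_id", "clip_index", "start_frame", "end_frame"])

# Assumed pattern: <video id>#<clip index>#<start frame>-<end frame>.mp4
_CHUNK_RE = re.compile(
    r"^(?P<vid>[^#]+)#(?P<idx>\d+)#(?P<start>\d+)-(?P<end>\d+)\.mp4$"
)


def parse_chunk_name(name):
    """Split a clip file name into its assumed components."""
    match = _CHUNK_RE.match(name)
    if match is None:
        raise ValueError("unexpected clip file name: {!r}".format(name))
    return Chunk(
        video_id=match.group("vid"),
        clip_index=int(match.group("idx")),
        start_frame=int(match.group("start")),
        end_frame=int(match.group("end")),
    )


def chunk_seconds(chunk, fps=REF_FPS):
    """Convert the frame range of a clip to (start, end) in seconds."""
    return chunk.start_frame / float(fps), chunk.end_frame / float(fps)


if __name__ == "__main__":
    clip = parse_chunk_name("37nktPRUJ58#00001#257-396.mp4")
    print(clip, chunk_seconds(clip))
```

If the naming scheme in your copy of the dataset differs, only the regular expression needs to change.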
wget www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_test_txt.zip
unzip vox1_test_txt.zip
mv ./txt ./vox1_txt_test
wget www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_dev_txt.zip
unzip vox1_dev_txt.zip
mv ./txt ./vox1_txt_train
wget www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_test_txt.zip
unzip vox2_test_txt.zip
mv ./txt ./vox2_txt_test
wget www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_dev_txt.zip
unzip vox2_dev_txt.zip
mv ./txt ./vox2_txt_train
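
The four download blocks above follow one pattern: the `dev` archive is renamed to a `train` directory and the `test` archive to a `test` directory. As a rough sketch, the same commands can be generated in Python. The function name `metadata_jobs` is hypothetical, and the script only prints the shell commands rather than running them.

```python
# Base location of the metadata archives, copied from the commands above.
BASE_URL = "www.robots.ox.ac.uk/~vgg/data/voxceleb/data"


def metadata_jobs():
    """Return (url, archive name, target directory) for each metadata archive."""
    jobs = []
    for dataset in ("vox1", "vox2"):
        # The "dev" archive ends up in the "train" directory, as in the commands above.
        for archive_split, dir_split in (("test", "test"), ("dev", "train")):
            archive = "{}_{}_txt.zip".format(dataset, archive_split)
            target = "./{}_txt_{}".format(dataset, dir_split)
            jobs.append((BASE_URL + "/" + archive, archive, target))
    return jobs


if __name__ == "__main__":
    # Print the equivalent shell commands instead of executing them.
    for url, archive, target in metadata_jobs():
        print("wget {}".format(url))
        print("unzip {}".format(archive))
        print("mv ./txt {}".format(target))
```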
- Run this script to download videos from YouTube. Note that the original videos will be removed. Frames can optionally be extracted and preprocessed in the same run.
python download_voxCeleb.py --dataset vox1 --output_path ./VoxCeleb1_test --metadata_path ./vox1_txt_test --delete_mp4
- If videos have already been downloaded, run this script to extract and preprocess frames.
python preprocess_voxCeleb.py --dataset vox1 --root_path ./VoxCeleb1_test --metadata_path ./vox1_txt_test
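
Before running the preprocessing script on a large download, it can help to see which videos still lack cropped frames. The sketch below is not part of this repo. It assumes the directory layout shown earlier (`<identity>/<video>/chunk_videos` and `frames_cropped`), and the function name `videos_missing_crops` is hypothetical.

```python
import os


def videos_missing_crops(root_path):
    """List video directories that have clips but no cropped frames yet."""
    pending = []
    for identity in sorted(os.listdir(root_path)):
        identity_dir = os.path.join(root_path, identity)
        if not os.path.isdir(identity_dir):
            continue
        for video in sorted(os.listdir(identity_dir)):
            video_dir = os.path.join(identity_dir, video)
            chunks_dir = os.path.join(video_dir, "chunk_videos")
            cropped_dir = os.path.join(video_dir, "frames_cropped")
            has_chunks = os.path.isdir(chunks_dir)
            # A missing or empty frames_cropped folder counts as "not processed".
            has_crops = os.path.isdir(cropped_dir) and bool(os.listdir(cropped_dir))
            if has_chunks and not has_crops:
                pending.append(video_dir)
    return pending


if __name__ == "__main__":
    root = "./VoxCeleb1_test"  # output path used in the example commands above
    if os.path.isdir(root):
        for path in videos_missing_crops(root):
            print(path)
```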
This code borrows from video-preprocessing and face-alignment.
[1] Bulat, Adrian, and Georgios Tzimiropoulos. "How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks)." Proceedings of the IEEE International Conference on Computer Vision. 2017.