Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' auto-encoder architecture.
Date | Update |
---|---|
2018-03-17 | Training: V2 model now provides a 40000-iter training schedule which automatically switches to proper loss functions at predefined iterations. (Cage/Trump dataset results) |
2018-03-13 | Model architecture: V2.1 model now provides 3 base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of v2 GAN. See "4. Training Phase Configuration" in v2.1 notebook for detail. |
2018-03-03 | Model architecture: Add a new notebook which contains an improved GAN architecture. The architecture is greatly inspired by XGAN and MS-D neural network. |
2018-02-13 | Video conversion: Add a new video processing script using MTCNN for face detection. Faster detection with a configurable threshold value. No need for CUDA-supported dlib. (New notebook: v2_test_video_MTCNN) |
- FaceSwap_GAN_v2_train.ipynb (recommended for training)
  - Script for training the version 2 GAN model.
  - Video conversion functions are also included.
- FaceSwap_GAN_v2_test_video.ipynb
  - Script for generating videos.
  - Uses the face_recognition module for face detection.
- FaceSwap_GAN_v2_test_video_MTCNN.ipynb (recommended for video conversion)
  - Script for generating videos.
  - Uses MTCNN for face detection. Does not require CUDA-supported dlib.
- faceswap_WGAN-GP_keras_github.ipynb
  - This notebook contains a class of GAN model using WGAN-GP.
  - Perceptual loss is discarded for simplicity.
  - The WGAN-GP model gave results similar to the LSGAN model after a comparable number (~18k) of generator updates.

  ```python
  gan = FaceSwapGAN()  # instantiate the class
  gan.train(max_iters=10e4, save_interval=500)  # start training
  ```
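  As an illustration only (not the notebook's exact code), a minimal sketch of the WGAN-GP gradient penalty term, assuming Keras 2 with the TensorFlow backend and a Keras discriminator model `netD`:

  ```python
  import keras.backend as K

  def gradient_penalty(netD, real, fake, batch_size):
      # Sample points on straight lines between real and generated images.
      alpha = K.random_uniform(shape=[batch_size, 1, 1, 1])
      interp = alpha * real + (1.0 - alpha) * fake
      # Penalize deviation of the discriminator's gradient norm from 1 (WGAN-GP).
      grads = K.gradients(netD(interp), [interp])[0]
      grad_norm = K.sqrt(K.sum(K.square(grads), axis=[1, 2, 3]) + K.epsilon())
      return K.mean(K.square(grad_norm - 1.0))
  ```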
- FaceSwap_GAN_v2_sz128_train.ipynb
  - Input and output images have a larger shape of (128, 128, 3).
  - Minor updates on the architectures:
    - Add instance normalization to generators and discriminators.
    - Add an additional regression loss (MAE loss) on the 64x64 branch output.
  - Not compatible with the `_test_video` and `_test_video_MTCNN` notebooks above.
- dlib_video_face_detection.ipynb
  - Detect/crop faces in a video using dlib's CNN model.
  - Pack cropped face images into a zip file.
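  For reference, a minimal sketch of CNN-based face detection and cropping with dlib (the weights file is dlib's standard `mmod_human_face_detector.dat`; the file paths are placeholders, and the actual notebook additionally packs the crops into a zip file):

  ```python
  import cv2
  import dlib

  # dlib's CNN (MMOD) face detector; requires the pretrained weights file.
  detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

  img = dlib.load_rgb_image("frame_0001.jpg")   # one extracted video frame
  for i, det in enumerate(detector(img, 1)):    # upsample the image once
      r = det.rect
      crop = img[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
      # Convert RGB back to BGR before writing with OpenCV.
      cv2.imwrite("face_%04d.jpg" % i, cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
  ```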
- Training data: Face images should be placed in the `./faceA/` and `./faceB/` folders, one for each target respectively. Face images can be of any size.
- Improved output quality: Adversarial loss improves the reconstruction quality of generated images.
- VGGFace perceptual loss: The perceptual loss makes the direction of the eyeballs more realistic and consistent with the input face (see the sketch below).
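  A minimal sketch (an assumption, not the repo's exact loss) of how a VGGFace-based perceptual loss can be formed with keras-vggface: compare feature maps of the real and generated faces extracted by a frozen VGGFace network. Faces are assumed to be already resized to 224x224 RGB.

  ```python
  from keras_vggface.vggface import VGGFace
  import keras.backend as K

  # Frozen VGGFace feature extractor (convolutional part only).
  vggface = VGGFace(include_top=False, input_shape=(224, 224, 3))
  vggface.trainable = False

  def perceptual_loss(real, fake):
      # L1 distance between VGGFace feature maps of real and generated faces.
      return K.mean(K.abs(vggface(real) - vggface(fake)))
  ```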
- Smoothed bounding box (Smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face, as sketched below.
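  For illustration, a minimal sketch of the idea (the smoothing factor below is an assumption, not the repo's value):

  ```python
  def smooth_bbox(prev_bbox, new_bbox, alpha=0.65):
      """Exponential moving average of bbox coordinates (x0, y0, x1, y1) across frames."""
      if prev_bbox is None:
          return new_bbox
      # Blend the previous position with the new detection to suppress jitter.
      return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(prev_bbox, new_bbox))
  ```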
- Unsupervised segmentation mask: The model learns a proper mask that helps in handling occlusion, eliminating artifacts on bbox edges, and producing a natural skin tone. Below are results transforming Hinako Sano (佐野ひなこ) to Emi Takei (武井咲).
  - From left to right: source face, swapped face (before masking), swapped face (after masking).
  - From left to right: source face, swapped face (after masking), mask heatmap.

  Source video: 佐野ひなことすごくどうでもいい話?(遊戯王)
- Optional 128x128 input/output resolution: Increase input and output size from 64x64 to 128x128.
- Face detection/tracking using MTCNN and Kalman filter: More stable detection and smoother tracking (a minimal tracking sketch is shown below).
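  A minimal sketch of Kalman-filter smoothing for one tracked point (e.g., a bbox corner), assuming a constant-velocity model; the noise settings are illustrative assumptions, not the repo's values:

  ```python
  import numpy as np
  import cv2

  # State: [x, y, dx, dy]; measurement: [x, y].
  kf = cv2.KalmanFilter(4, 2)
  kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                   [0, 1, 0, 0]], dtype=np.float32)
  kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                  [0, 1, 0, 1],
                                  [0, 0, 1, 0],
                                  [0, 0, 0, 1]], dtype=np.float32)
  kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
  kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

  def track(x, y):
      """Feed a raw detection; return the smoothed, predicted position."""
      kf.correct(np.array([[x], [y]], dtype=np.float32))
      pred = kf.predict()
      return float(pred[0, 0]), float(pred[1, 0])
  ```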
- Training schedule: The V2 model provides a predefined training schedule. The Trump/Cage results above were generated by a model trained for 21k iterations using the predefined `TOTAL_ITERS = 30000` training schedule.
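  A hypothetical sketch of such an iteration-based schedule (the phase boundaries and weight names below are illustrative assumptions, not the notebook's actual configuration):

  ```python
  TOTAL_ITERS = 30000

  def loss_weights(iteration):
      # Switch loss configurations at predefined fractions of the schedule.
      if iteration < 0.3 * TOTAL_ITERS:    # early: focus on reconstruction
          return dict(w_recon=1.0, w_adv=0.1, w_perceptual=0.0)
      elif iteration < 0.8 * TOTAL_ITERS:  # mid: enable the perceptual term
          return dict(w_recon=1.0, w_adv=0.5, w_perceptual=0.01)
      else:                                # late: emphasize adversarial/perceptual terms
          return dict(w_recon=0.5, w_adv=1.0, w_perceptual=0.1)
  ```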
- V2.1 update: An improved architecture has been introduced in order to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network.
  - The V2.1 model provides three base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of the v2 GAN (default `base_model="GAN"`).
  - More discriminators/losses are added to the GAN. Specifically, they are (minimal sketches of two of these losses follow at the end of this item):
    - GAN loss for non-masked outputs (common): two more discriminators are added to the non-masked outputs.
    - Perceptual adversarial loss (common): a feature-level L1 loss which improves semantic detail.
    - Domain-adversarial loss (XGAN): "It encourages the embeddings learned by the encoder to lie in the same subspace."
    - Semantic consistency loss (XGAN): a cosine-distance loss on embeddings to preserve the semantics of the input.
    - KL loss (VAE-GAN): KL divergence between N(0, 1) and the latent vector.
  - Replacing one `res_block` in the decoder with an MS-D network (default depth = 16) for output refinement was tried, but it is a very inefficient implementation of the MS-D network, so the MS-D network is not included for now.
  - Preview images are saved in the `./previews` folder.
  - (WIP) Random motion blur as data augmentation, reducing the ghost effect in the output video.
  - FCN8s for face segmentation is introduced to improve masking in video conversion (default `use_FCN_mask = True`).
    - To enable this feature, a Keras weights file should be generated through the Jupyter notebook provided in this repo.
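  Minimal sketches (assumptions written with the Keras backend, not the repo's exact implementations) of two of the V2.1 losses listed above:

  ```python
  import keras.backend as K

  def semantic_consistency_loss(emb_input, emb_cycled):
      # XGAN: cosine distance between the embedding of the input face and the
      # embedding of its translated-then-re-encoded counterpart.
      a = K.l2_normalize(emb_input, axis=-1)
      b = K.l2_normalize(emb_cycled, axis=-1)
      return K.mean(1.0 - K.sum(a * b, axis=-1))

  def kl_loss(mean, log_var):
      # VAE-GAN: KL divergence between N(mean, exp(log_var)) and N(0, 1).
      return -0.5 * K.mean(1.0 + log_var - K.square(mean) - K.exp(log_var))
  ```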
- It is likely due to the resolution of the input video being too high; modifying the parameters in step 13 or 14 will solve it.
  - First, increase `video_scaling_offset = 0` to 1 or higher.
  - If it doesn't help, set `manually_downscale = True`.
  - If the above still does not help, disable the CNN model for face detection, as shown below.

    ```python
    def process_video(...):
        ...
        #faces = get_faces_bbox(image, model="cnn")  # Use CNN model
        faces = get_faces_bbox(image, model='hog')   # Use default HOG features
    ```
- The following illustration shows a very high-level, abstract (and not exactly faithful) flowchart of the denoising autoencoder algorithm. The objective functions look roughly like the sketch below.
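  Since the original illustration is not reproduced here, a rough sketch of the overall objective, under the assumption of an L1 reconstruction term on warped inputs plus the adversarial and VGGFace perceptual terms described above (the weights λ are placeholders):

  ```latex
  L_{\mathrm{total}}
    = \underbrace{\mathbb{E}_{x}\big[\lVert G(\mathrm{warp}(x)) - x \rVert_{1}\big]}_{\text{reconstruction}}
    + \lambda_{\mathrm{adv}}\, L_{\mathrm{adv}}(G, D)
    + \lambda_{\mathrm{pl}}\, L_{\mathrm{pl}}\big(G(\mathrm{warp}(x)),\, x\big)
  ```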
- Set `audio=True` in the video-making cell.

  ```python
  output = 'OUTPUT_VIDEO.mp4'
  clip1 = VideoFileClip("INPUT_VIDEO.mp4")
  clip = clip1.fl_image(process_video)
  %time clip.write_videofile(output, audio=True) # Set audio=True
  ```
- The default setting transforms face B to face A.
- To transform face A to face B, modify the following parameters, depending on your current running notebook:
  - Change `path_abgr_A` to `path_abgr_B` in `process_video()` (step 13/14 of v2_train.ipynb and v2_sz128_train.ipynb).
  - Change `whom2whom = "BtoA"` to `whom2whom = "AtoB"` (step 12 of v2_test_video.ipynb).
- keras 2
- TensorFlow 1.3
- Python 3
- OpenCV
- keras-vggface
- moviepy
- dlib (optional)
- face_recognition (optional)
Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adopted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.