
faceswap-GAN

Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' auto-encoder architecture.

Updates

Date        Update
2018-03-17  Training: V2 model now provides a 40000-iter training schedule that automatically switches to the proper loss functions at predefined iterations. (Cage/Trump dataset results)
2018-03-13  Model architecture: V2.1 model now provides 3 base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of the v2 GAN. See "4. Training Phase Configuration" in the v2.1 notebook for details.
2018-03-03  Model architecture: Added a new notebook containing an improved GAN architecture, greatly inspired by XGAN and the MS-D neural network.
2018-02-13  Video conversion: Added a new video processing script using MTCNN for face detection. Faster detection with a configurable threshold value. No need for CUDA-supported dlib. (New notebook: v2_test_video_MTCNN)

Descriptions

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb (recommended for training)

    • Script for training the version 2 GAN model.
    • Video conversion functions are also included.
  • FaceSwap_GAN_v2_test_video.ipynb

    • Script for generating videos.
    • Uses the face_recognition module for face detection.
  • FaceSwap_GAN_v2_test_video_MTCNN.ipynb (recommended for video conversion)

    • Script for generating videos.
    • Uses MTCNN for face detection. Does not require CUDA-supported dlib.
  • faceswap_WGAN-GP_keras_github.ipynb

    • This notebook contains a class of GAN model using WGAN-GP.
    • Perceptual loss is discarded for simplicity.
    • The WGAN-GP model gave results similar to the LSGAN model after a comparable number (~18k) of generator updates.
    gan = FaceSwapGAN() # instantiate the class
    gan.train(max_iters=10e4, save_interval=500) # start training
  • FaceSwap_GAN_v2_sz128_train.ipynb

    • Input and output images have larger shape (128, 128, 3).
    • Minor updates to the architecture (see the sketch after this list):
      1. Add instance normalization to generators and discriminators.
      2. Add an additional regression loss (MAE loss) on the 64x64 branch output.
    • Not compatible with _test_video and _test_video_MTCNN notebooks above.
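
For illustration, here is a minimal Keras sketch of the two sz128 changes listed above: instance normalization in the conv blocks and an auxiliary MAE (L1) loss on a 64x64 branch output. The layer sizes, names, and loss weights are assumptions for the sketch, not the repository's exact architecture; InstanceNormalization is assumed to come from keras-contrib.

    from keras.layers import Input, Conv2D, UpSampling2D
    from keras.models import Model
    from keras_contrib.layers import InstanceNormalization  # assumed dependency (keras-contrib)

    def down_block(x, filters):
        # Strided conv halves the resolution; instance norm is the sz128 addition.
        x = Conv2D(filters, kernel_size=3, strides=2, padding='same', activation='relu')(x)
        return InstanceNormalization()(x)

    def up_block(x, filters):
        x = UpSampling2D()(x)
        x = Conv2D(filters, kernel_size=3, padding='same', activation='relu')(x)
        return InstanceNormalization()(x)

    inp = Input(shape=(128, 128, 3))
    x = down_block(inp, 64)    # 64x64
    x = down_block(x, 128)     # 32x32
    x = down_block(x, 256)     # 16x16
    x = up_block(x, 256)       # 32x32
    x = up_block(x, 128)       # 64x64
    out_64 = Conv2D(3, kernel_size=5, padding='same', activation='tanh', name='out_64')(x)    # auxiliary 64x64 branch
    x = up_block(x, 64)        # 128x128
    out_128 = Conv2D(3, kernel_size=5, padding='same', activation='tanh', name='out_128')(x)  # main output

    generator = Model(inp, [out_128, out_64])
    generator.compile(optimizer='adam',
                      loss={'out_128': 'mae', 'out_64': 'mae'},      # extra MAE regression loss on the 64x64 branch
                      loss_weights={'out_128': 1.0, 'out_64': 0.5})  # illustrative weighting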

Miscellaneous

  • dlib_video_face_detection.ipynb

    1. Detect/Crop faces in a video using dlib's cnn model.
    2. Pack cropped face images into a zip file.
  • Training data: face images should be placed in the ./faceA/ and ./faceB/ folders, one folder per target identity. Face images can be of any size (see the sketch below).
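
For reference, a minimal sketch of loading those two folders (assuming the folders contain only image files; the resizing and scaling here are illustrative, and the training notebooks do their own preprocessing):

    from glob import glob
    import cv2
    import numpy as np

    def load_faces(folder, size=64):
        # Load every image in the folder and resize; source images can be any size.
        paths = glob(folder + '/*.*')
        faces = [cv2.resize(cv2.imread(p), (size, size)) for p in paths]
        return np.array(faces, dtype=np.float32) / 255.0  # scale pixels to [0, 1]

    images_A = load_faces('./faceA')  # face images of target A
    images_B = load_faces('./faceB')  # face images of target B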

Results

Generative Adversarial Network, GAN (version 2)

  • Improved output quality: Adversarial loss improves reconstruction quality of generated images. (figure: trump_cage)

  • VGGFace perceptual loss: Perceptual loss makes the direction of the eyeballs more realistic and consistent with the input face.

  • Smoothed bounding box (smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face (see the first sketch after this list).

  • Unsupervised segmentation mask: The model learns a proper mask that helps with handling occlusion, eliminating artifacts on bbox edges, and producing natural skin tone (see the second sketch after this list). Below are results transforming Hinako Sano (佐野ひなこ) to Emi Takei (武井咲).

    (figures: mask1, mask2)

    • From left to right: source face, swapped face (before masking), swapped face (after masking).

    (figure: mask_vis)

    • From left to right: source face, swapped face (after masking), mask heatmap.
  • Optional 128x128 input/output resolution: Increase input and output size from 64x64 to 128x128.

  • Face detection/tracking using MTCNN and Kalman filter: More stable detection and smooth tracking.

    (figure: dlib_vs_MTCNN)

  • Training schedule: The V2 model provides a predefined training schedule. The Trump/Cage results above were generated by a model trained for 21k iterations using the TOTAL_ITERS = 30000 predefined training schedule.

  • V2.1 update: An improved architecture has been added in order to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network.

    • V2.1 model provides three base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of v2 GAN. (default base_model="GAN")
    • Add more discriminators/losses to the GAN. To be specific, they are:
      1. GAN loss for non-masked outputs (common): Add two more discriminators for the non-masked outputs.
      2. Perceptual adversarial loss (common): Feature-level L1 loss, which improves semantic detail.
      3. Domain-adversarial loss (XGAN): "It encourages the embeddings learned by the encoder to lie in the same subspace".
      4. Semantic consistency loss (XGAN): Cosine-distance loss on embeddings to preserve the semantics of the input.
      5. KL loss (VAE-GAN): KL divergence between N(0,1) and the latent vector.
    • One res_block in the decoder is replaced by an MS-D network (default depth = 16) for output refinement.
      • The current implementation of the MS-D network is very inefficient, so it is not included for now.
    • Preview images are saved in ./previews folder.
    • (WIP) Random motion blur as data augmentation, reducing ghost effect in output video.
    • FCN8s for face segmentation is introduced to improve masking during video conversion (default use_FCN_mask = True).
      • To enable this feature, the Keras weights file should be generated through the Jupyter notebook provided in this repo.
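
Two short sketches of ideas from the list above. First, the smoothed bounding box: an exponential moving average of the detected box coordinates across frames. The smoothing coefficient and function name are assumptions, not the repository's code:

    import numpy as np

    def smooth_bbox(prev_bbox, new_bbox, alpha=0.65):
        # Exponential moving average of (x0, y0, x1, y1) across frames.
        # Higher alpha follows new detections more closely; lower alpha smooths (and lags) more.
        if prev_bbox is None:  # first frame: nothing to average against
            return np.asarray(new_bbox, dtype=np.float32)
        return alpha * np.asarray(new_bbox, dtype=np.float32) \
               + (1.0 - alpha) * np.asarray(prev_bbox, dtype=np.float32)

    # inside a frame loop, with detections from dlib/MTCNN:
    # bbox = smooth_bbox(bbox, detected_bbox)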
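
Second, how a learned alpha mask is typically used to composite the swapped face back onto the source frame. The channel layout (mask first, then BGR) is an assumption based on the ABGR naming used in the notebooks:

    import numpy as np

    def apply_mask(source_face, generated_abgr):
        # generated_abgr: HxWx4 array; channel 0 is the learned mask in [0, 1],
        # channels 1-3 are the raw swapped face (before masking).
        mask = generated_abgr[..., :1]
        swapped = generated_abgr[..., 1:]
        return mask * swapped + (1.0 - mask) * source_face  # alpha blend onto the source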

Frequently asked questions

1. Slow video processing / OOM error?

  • This is most likely due to the input video's resolution being too high; modifying the parameters in step 13 or 14 should solve it.
    • First, increase video_scaling_offset = 0 to 1 or higher.
    • If it doesn't help, set manually_downscale = True.
    • If the above still does not help, disable the CNN model for face detection.
      def process_video(...):
        ...
        #faces = get_faces_bbox(image, model="cnn") # Use CNN model
        faces = get_faces_bbox(image, model='hog') # Use the default HOG-based detector.

2. How does it work?

  • The following illustration shows a very high-level, abstract (but not exactly identical) flowchart of the denoising autoencoder algorithm; the objective functions look like this. (figure: flow_chart)
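  • As a rough sketch of that flow (shared encoder, one decoder per identity, each trained to reconstruct its own warped/noisy faces; swapping runs a face of B through decoder A): layer sizes and names below are illustrative only, and faceswap-GAN adds the adversarial and VGGFace perceptual losses on top of this plain reconstruction objective.
    from keras.layers import Input, Conv2D, Dense, Flatten, Reshape, UpSampling2D
    from keras.models import Model

    def build_encoder():
        inp = Input(shape=(64, 64, 3))
        x = Conv2D(128, 5, strides=2, padding='same', activation='relu')(inp)  # 32x32
        x = Conv2D(256, 5, strides=2, padding='same', activation='relu')(x)    # 16x16
        code = Dense(512, activation='relu')(Flatten()(x))
        return Model(inp, code)

    def build_decoder():
        code = Input(shape=(512,))
        x = Dense(16 * 16 * 128, activation='relu')(code)
        x = Reshape((16, 16, 128))(x)
        x = UpSampling2D()(Conv2D(128, 3, padding='same', activation='relu')(x))  # 32x32
        x = UpSampling2D()(Conv2D(64, 3, padding='same', activation='relu')(x))   # 64x64
        out = Conv2D(3, 5, padding='same', activation='tanh')(x)
        return Model(code, out)

    encoder = build_encoder()                      # shared between both identities
    decoder_A, decoder_B = build_decoder(), build_decoder()

    face = Input(shape=(64, 64, 3))                # a warped/distorted input face
    autoencoder_A = Model(face, decoder_A(encoder(face)))   # trained only on faces of A
    autoencoder_B = Model(face, decoder_B(encoder(face)))   # trained only on faces of B
    autoencoder_A.compile(optimizer='adam', loss='mae')     # plain L1 reconstruction loss
    autoencoder_B.compile(optimizer='adam', loss='mae')
    # Swapping: run a face of B through encoder + decoder_A.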

3. No audio in output clips?

  • Set audio=True in the video making cell.
    output = 'OUTPUT_VIDEO.mp4'
    clip1 = VideoFileClip("INPUT_VIDEO.mp4")
    clip = clip1.fl_image(process_video)
    %time clip.write_videofile(output, audio=True) # Set audio=True

4. Previews look good, but video result does not seem to transform the face?

  • The default setting transforms face B to face A.
  • To transform face A to face B, modify the following parameters depending on your current running notebook:
    • Change path_abgr_A to path_abgr_B in process_video() (step 13/14 of v2_train.ipynb and v2_sz128_train.ipynb).
    • Change whom2whom = "BtoA" to whom2whom = "AtoB" (step 12 of v2_test_video.ipynb).

Requirements

Acknowledgments

Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adapted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.
