
faceswap-GAN

Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' auto-encoder architecture.

Updates

Date    Update
2018-03-03 Model architecture: Add a new notebook which contains an improved GAN architecture. The architecture is greatly inspired by XGAN and the MS-D neural network.
2018-02-13 Video conversion: Add a new video processing script using MTCNN for face detection. Faster detection with a configurable threshold value. No need for CUDA-supported dlib. (New notebook: v2_test_video_MTCNN)
2018-02-10 Video conversion: Add an optional (default False) histogram matching function for color correction to the video conversion pipeline. Set use_color_correction = True to enable this feature. (Updated notebooks: v2_sz128_train, v2_train, and v2_test_video)

Descriptions

GAN-v1

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb (recommended for training)

    • Script for training the version 2 GAN model.
    • Video conversion functions are also included.
  • FaceSwap_GAN_v2_test_video.ipynb

    • Script for generating videos.
    • Using the face_recognition module for face detection.
  • FaceSwap_GAN_v2_test_video_MTCNN.ipynb (recommended for video conversion)

    • Script for generating videos.
    • Using MTCNN for face detection. Does not require CUDA-supported dlib.
  • faceswap_WGAN-GP_keras_github.ipynb

    • This notebook contains a class of GAN model using WGAN-GP.
    • Perceptual loss is discarded for simplicity.
    • The WGAN-GP model gave me results similar to the LSGAN model after a comparable number (~18k) of generator updates.
    gan = FaceSwapGAN() # instantiate the class
    gan.train(max_iters=10e4, save_interval=500) # start training
  • FaceSwap_GAN_v2_sz128_train.ipynb

    • Input and output images have larger shape (128, 128, 3).
    • Minor updates on the architectures (a sketch follows this list):
      1. Add instance normalization to generators and discriminators.
      2. Add an additional regression loss (mae loss) on the 64x64 branch output.
    • Not compatible with _test_video and _test_video_MTCNN notebooks above.
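
A minimal sketch of the two sz128 updates above, assuming Keras with keras-contrib's InstanceNormalization layer; the layer sizes and block layout below are toy stand-ins, not the notebook's actual architecture:

    from keras.layers import Conv2D, Input, LeakyReLU, UpSampling2D
    from keras.models import Model
    from keras_contrib.layers import InstanceNormalization  # pip install keras-contrib

    def conv_block(x, filters):
        x = Conv2D(filters, kernel_size=3, strides=2, padding='same')(x)
        x = InstanceNormalization()(x)  # update 1: instance norm after each conv
        return LeakyReLU(alpha=0.2)(x)

    inp = Input(shape=(128, 128, 3))
    x = conv_block(inp, 64)                            # 64x64 feature map
    out64 = Conv2D(3, 3, padding='same', activation='tanh', name='out64')(x)
    x = UpSampling2D()(conv_block(x, 128))             # back up to 64x64
    out128 = Conv2D(3, 3, padding='same', activation='tanh',
                    name='out128')(UpSampling2D()(x))  # final 128x128 output
    model = Model(inp, [out128, out64])
    # update 2: extra mae (L1) regression loss on the 64x64 branch output
    model.compile(optimizer='adam', loss={'out128': 'mae', 'out64': 'mae'})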

Miscellaneous

  • dlib_video_face_detection.ipynb

    1. Detect/Crop faces in a video using dlib's cnn model.
    2. Pack cropped face images into a zip file.
  • Training data: Face images are expected in the ./faceA/ and ./faceB/ folders, one folder per target identity. Face images can be of any size (see the loading sketch below).
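
A minimal sketch of reading that layout, assuming OpenCV is installed; the load_faces helper and the 64x64 target size are illustrative, not from the repo:

    import glob

    import cv2
    import numpy as np

    def load_faces(folder, size=64):
        # Read every image in the folder and resize it to the model input size.
        paths = glob.glob(folder + '/*.*')
        imgs = [cv2.resize(cv2.imread(p), (size, size)) for p in paths]
        return np.asarray(imgs, dtype=np.float32) / 255.0  # scale to [0, 1]

    faces_A = load_faces('./faceA')  # faces of target A
    faces_B = load_faces('./faceB')  # faces of target B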

Results

Below are results showing trained models transforming Hinako Sano (佐野ひなこ) into Emi Takei (武井咲).

1. Autoencoder baseline

Autoencoder based on deepfakes' script. It should be mentioned that the result of the autoencoder (AE) could be much better if we trained it longer.

  • Results:

    [figure: AE_results]

2. Generative Adversarial Network, GAN (version 1)

  • Improved output quality: Adversarial loss improves reconstruction quality of generated images.

    [figure: GAN_PL_results]

  • VGGFace perceptual loss: Perceptual loss makes eyeball direction more realistic and consistent with the input face.

  • Smoothed bounding box (smoothed bbox): An exponential moving average of the bounding-box position over frames is introduced to eliminate jitter on the swapped face (see the sketch below).
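
An illustrative sketch of this smoothing, assuming boxes arrive as (x0, y0, x1, y1) tuples per frame; the class name and the alpha value are made up for the example:

    import numpy as np

    class SmoothedBBox:
        def __init__(self, alpha=0.65):
            self.alpha = alpha  # higher alpha = heavier smoothing (assumed value)
            self.bbox = None    # running average of (x0, y0, x1, y1)

        def update(self, new_bbox):
            new_bbox = np.asarray(new_bbox, dtype=np.float32)
            if self.bbox is None:
                self.bbox = new_bbox  # first frame: no history yet
            else:
                # EMA: keep alpha of the old position, blend in (1 - alpha) of the new
                self.bbox = self.alpha * self.bbox + (1 - self.alpha) * new_bbox
            return self.bbox.astype(int)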

3. Generative Adversarial Network, GAN (version 2)

  • Version 1 features: Most of the features in version 1 are inherited, including perceptual loss and smoothed bbox.

  • Unsupervised segmentation mask: The model learns a proper mask that helps handle occlusion, eliminates artifacts on bbox edges, and produces natural skin tone.

    [figures: mask1, mask2]

    • From left to right: source face, swapped face (before masking), swapped face (after masking).

    [figure: mask_vis]

    • From left to right: source face, swapped face (after masking), mask heatmap.
  • Optional 128x128 input/output resolution: Increase input and output size from 64x64 to 128x128.

  • Mask refinement: VGGFace ResNet50 is introduced for mask refinement (as the perceptual loss). The following figure shows generated masks before/after refinement. Input faces are from the CelebA dataset.

    [figure: mask_refinement]

  • Mask comparison: The following figure compares (i) generated masks and (ii) face segmentations from YuvalNirkin's FCN network. Surprisingly, the FCN sometimes fails to segment out face occlusions (see the 2nd and 4th rows).

    [figure: mask_seg_comp]

  • Face detection/tracking using MTCNN and a Kalman filter: More stable detection and smooth tracking (see the Kalman filter sketch after this list).

    [figure: dlib_vs_MTCNN]

  • V2.1 update: An improved architecture was introduced to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network.

    • In the v2.1 architecture, we add more discriminators/losses to the GAN. To be specific, they are:
      1. GAN loss for non-masked outputs: Add two more discriminators for the non-masked outputs.
      2. Semantic consistency loss (XGAN): Use the cosine distance between embeddings of real faces and reconstructed faces (see the sketch after this list).
      3. Domain adversarial loss (XGAN): Encourage embeddings to lie in the same subspace.
    • One res_block in the decoder is replaced by an MS-D network (default depth = 16) for output refinement.
      • This is a very inefficient implementation of the MS-D network.
    • Preview images are saved in the ./previews folder.
    • FCN8s for face segmentation is introduced to improve masking in video conversion (default use_FCN_mask = False).
      • To enable this feature, a Keras weights file should be generated through the Jupyter notebook provided in this repo.
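
A hedged sketch of the semantic consistency term (item 2 above), written against the Keras backend; the toy encoder and tensor shapes are stand-ins, not the notebook's actual graph:

    from keras import backend as K
    from keras.layers import Dense, Flatten, Input
    from keras.models import Model

    def cosine_distance(a, b):
        # 1 - cosine similarity, averaged over the batch
        a = K.l2_normalize(a, axis=-1)
        b = K.l2_normalize(b, axis=-1)
        return K.mean(1.0 - K.sum(a * b, axis=-1))

    # Toy stand-in encoder; the notebook's shared encoder is far larger.
    inp = Input(shape=(64, 64, 3))
    encoder = Model(inp, Dense(128)(Flatten()(inp)))

    real_face = Input(shape=(64, 64, 3))
    recon_face = Input(shape=(64, 64, 3))
    # Embeddings of a real face and its reconstruction should point the same way.
    loss_semantic = cosine_distance(encoder(real_face), encoder(recon_face))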
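
And a minimal sketch of the Kalman-filter tracking mentioned earlier in this list, using OpenCV's cv2.KalmanFilter with a constant-velocity model over the bbox center; the noise settings and sample detections are assumptions:

    import cv2
    import numpy as np

    # State = [x, y, vx, vy]; we only measure the bbox center (x, y).
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3      # assumed tuning
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed tuning

    detections = [(100.0, 120.0), (103.0, 118.0), (99.0, 121.0)]  # toy MTCNN centers
    for cx, cy in detections:
        kf.predict()
        state = kf.correct(np.array([[cx], [cy]], np.float32))
        smoothed_center = (float(state[0, 0]), float(state[1, 0]))  # used for cropping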

Frequently asked questions

1. Slow video processing / OOM error?

  • It is likely due to the input video's resolution being too high; modifying the parameters in step 13 or 14 should solve it.
    • First, increase video_scaling_offset = 0 to 1 or higher.
    • If that doesn't help, set manually_downscale = True.
    • If the above still does not help, disable the CNN model for face detection:
      def process_video(...):
        ...
        #faces = get_faces_bbox(image, model="cnn") # Use the CNN model
        faces = get_faces_bbox(image, model='hog') # Use the default HOG detector

2. How does it work?

  • This illustration shows a very high-level and abstract (but not exactly faithful) flowchart of the denoising autoencoder algorithm, and the objective functions look like this. A rough sketch of the idea follows.
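
A toy sketch of that flow, with identity functions standing in for the shared encoder and the two per-identity decoders; random_warp, the tensor shapes, and the L1 loss here are illustrative assumptions:

    import numpy as np

    def random_warp(img):
        # Stand-in distortion; the real pipeline warps faces geometrically.
        return img + np.random.normal(scale=0.05, size=img.shape)

    def l1(a, b):
        return float(np.mean(np.abs(a - b)))

    # Identity functions stand in for the shared encoder and the two decoders.
    encoder = decoder_A = decoder_B = lambda x: x

    x_A = np.random.rand(4, 64, 64, 3)  # faces of person A
    x_B = np.random.rand(4, 64, 64, 3)  # faces of person B

    # Each decoder learns to undo the warp for its own identity:
    loss_A = l1(decoder_A(encoder(random_warp(x_A))), x_A)
    loss_B = l1(decoder_B(encoder(random_warp(x_B))), x_B)

    # At conversion time the decoders are swapped:
    # decoder_A(encoder(x_B)) renders B's pose and expression as an A face.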

3. No audio in output clips?

  • Set audio=True in the video making cell.
    from moviepy.editor import VideoFileClip  # VideoFileClip comes from moviepy

    output = 'OUTPUT_VIDEO.mp4'
    clip1 = VideoFileClip("INPUT_VIDEO.mp4")
    clip = clip1.fl_image(process_video)
    %time clip.write_videofile(output, audio=True) # Set audio=True

4. Previews look good, but video result does not seem to transform the face?

  • The default setting transforms face B to face A.
  • To transform face A to face B, modify the following parameters depending on your current running notebook:
    • Change path_abgr_A to path_abgr_B in process_video() (step 13/14 of v2_train.ipynb and v2_sz128_train.ipynb).
    • Change whom2whom = "BtoA" to whom2whom = "AtoB" (step 12 of v2_test_video.ipynb).

Requirements
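
Judging from the notebooks described above, a working setup needs at least Python 3 with Keras (TensorFlow backend), dlib and/or the face_recognition module for detection, and moviepy for video conversion; this list is inferred from the descriptions here rather than taken from an official requirements file.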

Acknowledgments

Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adapted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.
