Git Product home page Git Product logo

swap-face's Introduction

swap-face

This is a complete guide for the swap-face algorithm known as deepFakes.

Why this project?

The goal of this project is education purpose. The deepFakes project got viral on the internet (with all the moral implication that come with it), but very few people really understand how it works. In fact, it is relatively simple>

The youtube video about deepfake made by Siraj Raval also gives a good insight about deepfakes, and I invite you to check its video.

Coding Challenge

The coding challenge presented in deepFakes by Siraj Raval.

This coding challenge is to train a swap-face algorithm.

The submission files are the three Ipython Notebooks:

1. Training data generation.ipynb
2. Train.ipynb
3. Prediction.ipynb

James bond swapping-face

I was born in 1988. So for me, James Bond has always been Pierce Brosnan (even Daniel Craig doesn't play that bad). As a result, today we will learn how to train a network that swap the face of Daniel Craig and replace it by Pierce Brosnan. The result can be seen below.

What id DeepFakes?

Deepfake is implemented as an autoencoder. The face of the actor 1 is cropped and aligned to his face. Then the autoencoder learns how to encode and decode (reconstruct) the face. The goal here is to minimize the reconstruction error.

The face of the actor 2 is cropped and aligned to his face. Then the autoencoder learns how to encode and decode (reconstruct) the face. The goal here is to minimize the reconstruction error. The interesting part is that the encoder is the same for actor 1 and 2.

The goal is to find a mathematical function that for a face of the actor 1, outputs a face that looks like the actor 2.

Note: these images don t belong to me and comes from the animation in the youtube video about deepfake made by Siraj Raval

Autoencoder: how does it work ?

Auto encoder wants to find a function f(x)xwhere x is your image (here, the size is arbitrary chosen to be 64 by 64 with 3 channels (RBG) ). For that, we have an encoder which goal is to encode your image in a smaller representation (Here, we choose 8 by 8 with 512 channels ), and a decoder which goal is to use this representation and get back the original images.

You could see an autoencoder as WinZip. You have a folder that you want to compress in zip. First, WinZip will parse your folder and find out some pattern (some data that are more useful than others) then it will take advantage of this information to encode these data on less bit. Finally, you end up with a zip file with the same information by encoded differently. Later if you give this file to a friend this friend needs to decode the zip file in order to read its content. We call that lossless compression (you don't lose information). In the case of autoencoder, it does a kind of lossy compression (the data that you get back at the end is an approximation of the initial data).

The goal of an autoencoder is to find the function f(x) = decoder(encoder(x))x that approximates the best. For that (in contrast of WinZip or WinRAR) it takes advantage of the nature of x. (1) x is an image. (2) x shows the same face but with different angle, lightening condition ... (1) often means that you might need to use convolution operations. (2) implies that you need to decompose your face into its atomic components, such as the shape of the noise, the shape of the smile, the wrinkle ...

You should see the output of the encoder as 512 different images of 8 by 8. Each of these images gives some insight about the noise shape, the ear shape ... These features represent the persons face and will be used by the decoder to reconstruct the image.

Additional Details

The authors of the original implementation contacted me, and mentioned an important point about the architecture of this tool:

  1. In the original deepfakes' architecture, there is no mask segmentation. The only output is reconstructed image.
  2. The interesting (and smart) parts of deepfakes' algorithm are the usage of warped image as input and shared-weights encoder. The encoder get update information (backprop. information) from both two decoder_A and B. But on the contrary, decoder_A never trained on face B for reconstruction (and vise versa for decoder_B). In other words, the encoder is able to encode both face A and B into good embedding, but decoder A/B can only generates face A/B respectively from the given embedding.
  3. To swap face of person B to person A in test time, we feed face B in to encoder to get the embedding, and then feed the embedding into decoder A (not decoder B) to get a face B which is face A look-alike. The reason of such method works is that the autoencoder_A (i.e., encoder + decoder_A) treats face B as if its a warped A, so it "reconstruct" face B into a face A look-alike. If we did not train the autoencoder using warped images, it might not able to "treat face B as if its a warped A".

Transfer Learning

In contrast to the base version of Deepfakes, we provide a very convenient way to speed up the training process via transfer learning. The idea is to load the weight of a pre-trained network (for the same network configuration but for another face pair such as Donald ump and Nicolas Cage) These weights are used as a starting point to learn new weight. The network will, therefore, converge faster than if it would be trained from scratch. \n",

More advanced technique exists to apply transfer learning to autoencoder such here

The dependencies

Associated with this project, there is a file called "floyd_requirements.txt" which lists all libraries that need to be installed before to run the application.

You install them with sudo pip install

If you have any problem. Feel free to contact me or post an issue.

The version I release can be run on any computer or virtual machines (as long as the requirement are installed).

Floydhub

The nice thing is that you can also run it easily on floydhub (which is a cloud GPU company). Please note that I don't work for this company (just that you often need a good GPU to train such model)

The weights

You can download the weights there

Result

Input Output

Moral implications

The goal of this project is education purpose. With AI area, it becomes easy for anyone to fake video, pictures, and news. As someone smart said one day: "With great power comes great responsibility".

Additional discussions

Autoencoder is great tool for producing images which respect the probability distribution of the original ones. However, I believe that Generative Adversarial Networks (such as cycle-gan) remains a great way to train autoencoder, and surpass the generation capability of standard autoencoder.

Notes

This tutorial aims to help other to understand the value of DeepFakes algorithm. However, if you have additional questions, feel free to open an issue. I want to make this guide as complete as possible and I will add to it any of your comment that would make it more understandable and detailled.

swap-face's People

Contributors

ovalery16 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

swap-face's Issues

Question about the Encoder & Decoder model

Hello, don't know if you are the original author of this tutorial – if so, thank you for making this so easy to understand for beginners!

Hope you can help me understand the Autoencoder model better:

IMAGE_SHAPE = (64, 64, 3)
ENCODER_DIM = 1024

    def Encoder(self):
        input_ = Input(shape=IMAGE_SHAPE)
        x = input_
        x = self.conv(128)(x)
        x = self.conv(256)(x)
        x = self.conv(512)(x)
        x = self.conv(1024)(x)
        x = Dense(ENCODER_DIM)(Flatten()(x))
        x = Dense(4 * 4 * 1024)(x)
        x = Reshape((4, 4, 1024))(x)
        x = self.upscale(512)(x)
        return KerasModel(input_, x)

    def Decoder(self):
        input_ = Input(shape=(8, 8, 512))
        x = input_
        x = self.upscale(256)(x)
        x = self.upscale(128)(x)
        x = self.upscale(64)(x)
        x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)
        return KerasModel(input_, x)

What would be an efficient way to increase the input image size (128, 192, 256) so that the Decoder is still producing good results? Which values are important – which should be increased? The filters of the Encoder or Decoder? Simply more convolution layers? Or is the ENCODER_DIM more important?

Why does the Decoder have this specific input shape? Is it connected to the input image size?
input_ = Input(shape=(8, 8, 512))

failed to swap images in test data and ended up getting B/W patch over detected face in test data

Hi ,

I tried face-swapping between Nickolas cage and Donald Trump Dataset but my output of the test script contains a blur grayscale patch over the face detected in trumps faces in the images of my test data. The scripts seem to run and finish without throwing up any errors and I trained the model without any specific error. Can someone help me with the possible reason be behind this happening?
I have uploaded 2/5 output images after performing the execution of prediction scripts
The images used are of size 256X256 while training and testing and I just used the same images( 5 images) for from training data to perform the test.
I used around 375 images for trump faces training and approx 310 for cage training faces. I am stuck at this point and can't proceed further to reach any conclusion

51130785

Tha
3146925

Original weights?

Hi

Ive managed to produce some outputs the results are quite blurry, so I've been trying out MSE instead of MAE. However, I've noticed the weights you given us are from the bond transfer, do you happen to have the original weights for trump to cage or whatnot?

Thank you

Regards

Adrian

Keros and Tensorflow version

AttributeError: module 'keras.utils.conv_utils' has no attribute 'normalize_data_format'

Got this error due to unsupported version of Keros and Tensorflow

My version in system is Keros=2.2.2 and Tensorflow=1.13.0

Can you tell me the correct version of both of these

Iteration issue during preprocessing

Hello

I'm porting your work over to Colaboratory, but having trouble with the preprocessing function during face extraction.
For some reason, I keep getting the error during the preprocessing process, and the loop terminates prematurely.

Some images do get preprocessed, so I'm not sure what's the problem. Using the Daniel Craig images in your data folder works fine.

"Failed to extract from image: ../content/chris/chris (80).jpg. Reason: list index out of range"

Code below, any advice would be helpful!

EDIT: So I took a closer look, it's definitely certain images that causes system trip. I believe it's due to the face_recognition library, name faces_filter.py. If no face is detected, and hence no encoding, then the resulting array will be null, and hence an exception is thrown. Will update here in due course

image

`

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.