Srk2Cage using DeepFake
Pranab Sarkar
Jalpaiguri, West Bengal
Deepfake is a technique for human image synthesis based on artificial intelligence. It is used to combine and superimpose existing images and videos onto source images or videos using a deep neural network.
Project status: Under Development
Groups
DeepLearning,
Artificial Intelligence Europe,
Artificial Intelligence West Coast,
Artificial Intelligence India,
Early Innovation for PC Skills
Intel Technologies
AI DevCloud / Xeon
Overview / Usage
Data Collection:
- Shahrukh Khan: https://www.youtube.com/watch?v=zRSjxp67Yzk&t=447s
- Nicolas Cage: collected images of him from Google Images using a web crawler.
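The Nicolas Cage images came from a web crawler over Google Images. As a minimal, hypothetical sketch of one step such a crawler needs (the helper name and extension list are assumptions, not the project's actual code), here is how candidate links can be filtered down to unique image URLs before downloading:

```python
from urllib.parse import urlparse

# Extensions treated as downloadable images (an assumption for this sketch).
IMAGE_EXTS = (".jpg", ".jpeg", ".png")

def filter_image_urls(urls):
    """Keep unique URLs whose path looks like an image file, preserving order."""
    seen = set()
    kept = []
    for url in urls:
        path = urlparse(url).path.lower()
        if path.endswith(IMAGE_EXTS) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept

candidates = [
    "https://example.com/cage1.jpg",
    "https://example.com/page.html",
    "https://example.com/cage1.jpg",   # duplicate link from a second page
    "https://example.com/cage2.png",
]
print(filter_image_urls(candidates))
# → ['https://example.com/cage1.jpg', 'https://example.com/cage2.png']
```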
Methodology / Approach
- **Extraction:** Without hundreds (if not thousands!) of face pictures, we cannot create a deepfake video. A way around this is to collect a number of video clips that feature the people you want to face-swap. Extraction refers to extracting all frames from these clips, identifying the faces, and aligning them. The alignment is critical, since the neural network that performs the face swap requires all faces to have the same size (usually 256×256 pixels) and aligned features. Detecting and aligning faces is considered a mostly solved problem, and most applications handle it very efficiently.
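The alignment requirement above (every face cropped from its frame and rescaled to the same 256×256 size) can be sketched with a nearest-neighbour crop-and-resize in NumPy. A real pipeline uses a face detector and landmark-based warping, which this sketch assumes away; the bounding box here is a stand-in:

```python
import numpy as np

def crop_and_resize(frame, box, size=256):
    """Crop a detected face box (x, y, w, h) from a frame and rescale it to
    size x size with nearest-neighbour sampling, so every extracted face
    ends up with identical dimensions."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]
    rows = np.arange(size) * h // size   # nearest source row for each output row
    cols = np.arange(size) * w // size   # nearest source column for each output column
    return face[rows[:, None], cols]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in video frame
aligned = crop_and_resize(frame, (100, 50, 120, 160))
print(aligned.shape)  # → (256, 256, 3)
```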
- **Training:** Note that if we train two autoencoders separately, they will be incompatible with each other: the latent faces are based on the specific features each network deems meaningful during training, so two autoencoders trained separately on different faces will have latent spaces representing different features. During the training phase, the two decoders are therefore treated separately: Decoder A is trained only on faces of A, and Decoder B only on faces of B. However, all latent faces are produced by the same Encoder, which means the encoder has to identify features common to both faces. Because all faces share a similar structure, it is not unreasonable to expect the encoder to learn the concept of a "face" itself.
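A toy NumPy sketch of this wiring, with flat vectors standing in for images and single linear layers standing in for the real convolutional networks (all dimensions and weights here are illustrative assumptions, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM = 64, 8   # toy sizes; real aligned faces are 256x256x3

# One shared encoder, two identity-specific decoders.
W_enc = rng.standard_normal((IMG_DIM, LATENT_DIM)) * 0.1
W_dec_a = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.1
W_dec_b = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.1

def encode(x):
    return np.tanh(x @ W_enc)   # shared latent representation

def decode(z, W_dec):
    return z @ W_dec            # identity-specific reconstruction

faces_a = rng.standard_normal((16, IMG_DIM))   # batch of Subject A faces
faces_b = rng.standard_normal((16, IMG_DIM))   # batch of Subject B faces

# During training, each decoder only ever sees its own subject's faces,
# while both reconstruction losses drive updates to the shared encoder.
loss_a = np.mean((decode(encode(faces_a), W_dec_a) - faces_a) ** 2)
loss_b = np.mean((decode(encode(faces_b), W_dec_b) - faces_b) ** 2)
print(loss_a > 0 and loss_b > 0)  # → True
```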
- **Inference:** When training is complete, we can pass a latent face generated from Subject A to Decoder B, which will try to reconstruct Subject B from information relative to Subject A. If the network has generalised well enough what constitutes a face, the latent space will represent facial expressions and orientations, so the result is a face of Subject B with the same expression and orientation as Subject A.
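The swap itself, in the same toy linear setting (random weights stand in for a trained shared encoder and Subject B's trained decoder): encode a Subject A face, then decode the latent with B's decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
IMG_DIM, LATENT_DIM = 64, 8   # toy stand-ins for 256x256x3 faces

W_enc = rng.standard_normal((IMG_DIM, LATENT_DIM)) * 0.1     # shared encoder
W_dec_b = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.1   # Subject B's decoder

face_a = rng.standard_normal(IMG_DIM)    # an aligned frame of Subject A
latent = np.tanh(face_a @ W_enc)         # latent: expression and orientation
swapped = latent @ W_dec_b               # reconstructed as Subject B
print(swapped.shape)  # → (64,)
```

In the full pipeline this runs on every extracted frame, and each swapped face is pasted back into its original frame to assemble the output video.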
Technologies Used
Tensorflow
Python
Repository
https://github.com/pranabsarkar/deep-fake-srk-cage/blob/master/faceswap.py