Deepfake, for the uninitiated, is an artificial intelligence (AI) based technique for altering photos or videos by superimposing one person's face onto another's footage. It relies on a machine learning model called a Generative Adversarial Network (GAN), which can generate new data resembling the dataset it was initially trained on. A deepfake produced this way can be used in various illicit ways to fabricate a person's public image, to say nothing of the lengths to which it could be taken to cause them harm.
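The adversarial idea behind a GAN can be illustrated with a deliberately tiny, invented example (real deepfake GANs use deep networks over images, not the one-parameter models below): a "generator" learns to shift random noise toward a target data distribution, while a "discriminator" simultaneously learns to tell real samples from generated ones. All numbers here (the 1-D Gaussian data, the learning rate, the step count) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Real data the generator must learn to imitate: samples from N(4, 1).
def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)

# Toy models:
#   generator     g(z) = z + theta_g        (shifts noise toward the data)
#   discriminator d(x) = sigmoid(w*(x - b)) (scores how "real" x looks)
theta_g = 0.0
w, b = 1.0, 0.0
lr = 0.05

for _ in range(2000):
    z = rng.normal(0.0, 1.0, size=32)
    fake = z + theta_g
    real = sample_real(32)

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0
    # (gradients of the standard binary cross-entropy GAN loss).
    d_real = sigmoid(w * (real - b))
    d_fake = sigmoid(w * (fake - b))
    grad_w = np.mean((d_real - 1) * (real - b)) + np.mean(d_fake * (fake - b))
    grad_b = -w * np.mean(d_real - 1) - w * np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b

    # Generator step: push d(fake) toward 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * (fake - b))
    theta_g -= lr * np.mean((d_fake - 1) * w)

# After training, theta_g has moved from 0 toward the real mean of 4:
# the generator's fakes now resemble the real data.
```

The same push-and-pull, scaled up to millions of parameters and image data, is what lets a GAN synthesize faces convincing enough to pass as real footage.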
In the past, deepfakes have been used to alter and misrepresent political speeches. And last year, a desktop application called FakeApp was launched, allowing even non-tech-savvy people to easily create and share videos with swapped faces. The software demands a lot of graphics processing power, considerable storage space, and a huge dataset from which to learn the different aspects of the image that can be replaced, and it is built on Google's free and open-source machine learning library, TensorFlow. What's even more alarming is that FakeApp is not alone: plenty of similar software is available to download for free on the internet.
Now, researchers at the Samsung AI Center in Moscow have developed a way to create 'living portraits' from a very small dataset (in some cases, a single photograph). Their paper, 'Few-Shot Adversarial Learning of Realistic Neural Talking Head Models', which was published on Monday, explains how the model can be trained using such a relatively small dataset.
In the paper, the researchers describe a new learning mechanism, called 'few-shot' learning, in which the model can be trained to create a convincing portrait from just a single image. They also note that a slightly bigger dataset, with as many as 8 or 32 photographs, helps improve the portrait and make it more convincing.
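The intuition behind few-shot adaptation can be sketched with a toy problem that is entirely invented for illustration (the paper's actual models are deep networks trained on video frames): a "meta-learned" initialization, standing in for knowledge distilled from many prior identities, is adapted to a new "person" from k noisy examples, and more examples pin the identity down more faithfully.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "meta-learned" initialization: a person-agnostic prior,
# a stand-in for what meta-training over many identities would produce.
meta_init = np.array([1.0, 0.5])

# The new "person" to adapt to: an unknown 2-parameter appearance vector.
target = np.array([2.0, -1.0])

def adapt(k, lam=0.1):
    """Fit the new person from k noisy 'photographs', while staying close
    to the meta-learned prior (ridge regression toward meta_init)."""
    X = rng.normal(size=(k, 2))                   # k measurements
    y = X @ target + 0.05 * rng.normal(size=k)    # with a little noise
    A = X.T @ X + lam * np.eye(2)
    w = meta_init + np.linalg.solve(A, X.T @ (y - X @ meta_init))
    return float(np.linalg.norm(w - target))      # error vs. the true person

# One shot vs. 8 or 32 shots, mirroring the dataset sizes in the paper.
errors = {k: adapt(k) for k in (1, 8, 32)}
```

With a single example the problem is underdetermined and the fit leans on the prior; with 8 or 32 examples the identity is recovered much more accurately, mirroring the paper's observation that more photographs make the portrait more convincing.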
Unlike deepfakes and other GAN-based algorithms that paste one face onto another using staple expressions of the person, Samsung's 'few-shot' learning technique draws on facial features common to all humans to generate a new face. For this, the 'talking head models' are created using convolutional neural networks (CNNs): the algorithm undergoes meta-training on a large dataset of talking-head videos, called the 'talking head dataset', covering many different appearances before it is ready for 'few- and one-shot learning'. For those unaware, a CNN is a type of artificial neural network that can classify images, group them by similarity, and perform object recognition, identifying the different aspects of visual data. With a CNN, the trained algorithm can easily detect and differentiate the landmarks of a face and then churn out the desired output.
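The core operation a CNN uses to pick out such landmarks is convolution: sliding a small filter over the image and responding strongly wherever the filter's pattern appears. A minimal sketch follows; the toy image and the hand-written edge filter are illustrative, since a real network learns its filters from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation: the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 8x8 "image": a bright vertical stripe (a stand-in for a facial edge).
img = np.zeros((8, 8))
img[:, 4] = 1.0

# A learned CNN filter might resemble this hand-made vertical-edge detector.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

resp = conv2d(img, sobel_x)
# resp is strongly positive just left of the stripe and strongly negative
# just right of it, and near zero elsewhere: the filter has "found" the edge.
```

Stacking many such filtered responses, interleaved with nonlinearities and pooling, is what lets a CNN progressively turn raw pixels into landmark-level descriptions of a face.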
The 'talking head dataset' used by the researchers is drawn from VoxCeleb1 and VoxCeleb2, with the second dataset containing approximately 10 times as many videos as the first. To demonstrate what can be achieved with their algorithm, the researchers produced animations of paintings and portraits. One such animation is of the Mona Lisa, in which she moves her mouth and eyes and smiles.
To conclude, here’s a short snippet from the published paper, to summarise the research: “Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings.”