Most of us have probably come across a deepfake video at some point while browsing the Internet. Deepfakes often show well-known people doing wildly implausible things, like the Queen of England dancing on her desk or Ron Swanson from Parks and Recreation playing every character in Full House. These two examples of AI-generated videos are easy to spot and were never intended to be taken seriously in the first place. However, the technology to produce these kinds of videos is already widespread, and anyone with enough interest and time can try to create one. This is where the subject gets serious and potentially dangerous. Until recently, it was quite easy to spot an AI-created video by paying attention to one of the following clues:
lighting does not match the rest of the scene;
audio is out of sync;
parts of the image are out of focus, mainly around the neck and hair;
there are patches of skin that do not match the rest of the subject's skin tone.
As AI models advance, however, these little glitches will no longer help us tell the real from the fake. But first, let's find out how these videos are created.
How are deepfakes created?
Not long ago, we discussed the role of generative adversarial networks (GANs) in creating fake images. In the case of deepfake videos, an artificial neural network (ANN) called an autoencoder first analyzes videos and photos of the subject from different angles and isolates the essential characteristics it discovers. From these characteristics, the ANN can generate new images of the subject. But since we want to swap the subject (or rather, the subject's face) with another, we use a second ANN, trained on samples of the person whose face we want to swap in, to reconstruct the face. This second ANN then rebuilds that face while mimicking the behavior and speech patterns the first ANN learned. Finally, a GAN looks for flaws and polishes the results until they are almost perfect.
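To make the idea concrete, here is a minimal sketch (in PyTorch, with made-up layer sizes) of the shared-encoder, two-decoder setup commonly used for face swapping: one encoder compresses any face into a latent code, one decoder is trained per subject, and the swap happens when a frame of subject A is decoded with subject B's decoder. This is an illustrative toy, not a production deepfake pipeline.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Shared encoder: compresses a 64x64 RGB face crop into a latent vector.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # One decoder per subject: reconstructs that subject's face from the latent code.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

encoder = Encoder()
decoder_a = Decoder()  # trained only on faces of subject A
decoder_b = Decoder()  # trained only on faces of subject B

# Training (simplified): each decoder learns to reconstruct its own subject from the
# shared latent space, e.g. loss = MSE(decoder_a(encoder(face_a)), face_a).

# Face swap at inference time: encode a frame of subject A, then decode it with
# subject B's decoder to get B's face wearing A's pose and expression.
frame_of_a = torch.rand(1, 3, 64, 64)  # placeholder for a real face crop
swapped = decoder_b(encoder(frame_of_a))
```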
And here's the problem with deepfake detection: because deepfakes are created through adversarial training, the algorithm that creates the fakes improves every time it is exposed to a new detection system. It is a fight that cannot be won, because adversarial networks are designed to keep improving.
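To illustrate that dynamic, here is a purely illustrative PyTorch sketch: if the forger can query a detector, the generator can simply be fine-tuned to lower the detector's "fake" score on its own outputs. The tiny models below are stand-ins, not real deepfake networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins (not real deepfake models) just to make the loop runnable.
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3 * 8 * 8), nn.Sigmoid())
detector = nn.Sequential(nn.Linear(3 * 8 * 8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)  # only the generator is updated

def adversarial_finetune_step(latent):
    fake_frames = generator(latent)        # candidate fake content
    fake_score = detector(fake_frames)     # detector's probability that the content is fake
    # Push the detector's "fake" score toward 0 ("looks real"), making the
    # forgeries harder to flag -- this is the cat-and-mouse dynamic in code.
    loss = F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

for _ in range(3):
    print(adversarial_finetune_step(torch.randn(4, 16)))
```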
Deepfake abuse and emerging problems
Like any great invention, the generation of artificial images or speech can be a double-edged sword. Machine learning is getting better at everything it does, and although distinguishing an original from the work of an AI can sometimes be very easy right now, GANs keep improving, and it is only a matter of time before there is no way to tell them apart just by looking or listening. We are talking about audio or video recordings that seem completely genuine but are not.
Fraud cases have already been recorded in which AI-generated media played an important role. One example is a company employee who was scammed into transferring a considerable amount of money. He received a call in which what appeared to be his superior ordered him to make the transfer, followed by an email confirming the transaction. What he didn't know was that the voice he was hearing was not his boss's but a very good imitation generated by scammers.
Another example of AI misuse, and a growing problem, is the creation of authentic-looking but fake pornography, in which a victim's face is used to generate fake nude images. This includes both revenge porn and fake celebrity porn. The damage it can cause to victims is obvious.
In addition, deepfakes can be weaponized on social networks to misinform and manipulate users. Imagine a viral video of a politician saying things he has never said, leading users to believe the recording is real.
Deepfakes also pose a potential threat to identity verification technology, as they could allow scammers to bypass biometric facial recognition systems.
For all these reasons, deepfake detection software has attracted great interest.
Deepfake detection
The problem with deepfake detection models
AI researchers are doing their best to develop algorithms that detect deepfake videos, but it is a technically demanding and difficult challenge. Some of the more interesting detection approaches are:
analysis of eye blinking: the generative models responsible for creating the videos need to be fed source data, namely images of the subject they have to imitate. The datasets used by the deepfake models did not contain many images of people with their eyes closed, which led them to generate sequences in which the subjects' blinking patterns were unnatural (a simple blink-rate check is sketched after this list);
remote heart rate estimation: this detection system tries to estimate the subject's heart rate by looking for subtle changes in skin color that confirm blood flow under the skin;
tracking small individual facial movements: this model isolates distinctive facial expressions and mannerisms that are unique to each person, and then checks whether they are present in the video being evaluated.
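To make the blink-analysis idea more concrete, here is a rough Python sketch. It assumes per-frame eye landmarks are already available (for example from any standard 68-point facial-landmark detector); the eye-aspect-ratio threshold and the "normal" blink rate are illustrative values, not figures from the original research.

```python
import numpy as np

def eye_aspect_ratio(eye):
    # eye: 6 (x, y) landmark points around one eye; the ratio drops sharply when the eye closes.
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

def blink_rate(per_frame_eyes, fps, ear_threshold=0.2):
    # per_frame_eyes: list of 6x2 landmark arrays, one per video frame.
    blinks, eye_closed = 0, False
    for eye in per_frame_eyes:
        closed = eye_aspect_ratio(eye) < ear_threshold
        if closed and not eye_closed:   # count the transition open -> closed as one blink
            blinks += 1
        eye_closed = closed
    minutes = len(per_frame_eyes) / fps / 60.0
    return blinks / minutes if minutes > 0 else 0.0

# Toy demo: 10 seconds at 30 fps with the eye always open (EAR well above the threshold).
# People typically blink roughly 15-20 times per minute, so a rate near zero over a
# long clip is suspicious.
open_eye = np.array([[0, 0], [1, 2], [3, 2], [4, 0], [3, -2], [1, -2]], dtype=float)
print(blink_rate([open_eye] * 300, fps=30))  # 0.0 blinks per minute
```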
So far, it looks like we're on our way to winning the war on deepfakes. But wait, there is a catch. As we said before, the deep networks responsible for generating these fake images can be trained to evade detection. This leads to a cat-and-mouse situation: every time a new detection model is introduced, a better-trained deepfake generator appears soon after. A real example of this is the model that detected fakes by evaluating the subjects' blinking patterns. Shortly after the paper describing this detection model was published, the deepfake models were updated to produce natural blinking.
Deepfake Detection Challenge
Until recently, there were no large datasets or benchmarks for training detection models. We say until recently because, thanks to the Deepfake Detection Challenge (DFDC) organized by Facebook together with other industry leaders and academics, a huge dataset of more than 100,000 videos was shared publicly. Thanks to this dataset, DFDC participants were able to train and test their detection models. More than 2,000 participants submitted over 35,000 models to the contest. The results were announced last year, and the winning model achieved an accuracy of 65%, meaning that roughly a third of the videos were classified incorrectly, whether real videos flagged as fakes (false positives) or fakes that slipped through. Let's face it, these figures aren't too impressive…
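As an aside on the numbers: an accuracy figure on its own does not say how the errors split between real videos wrongly flagged and deepfakes that slipped through. The toy confusion-matrix calculation below, with made-up counts rather than DFDC results, shows the distinction.

```python
# Toy numbers (not DFDC results) showing why accuracy alone hides the error breakdown.
tp = 600   # deepfakes correctly flagged
fn = 400   # deepfakes missed
tn = 700   # real videos correctly passed
fp = 300   # real videos wrongly flagged ("false positives")

total = tp + fn + tn + fp
accuracy = (tp + tn) / total             # 0.65
false_positive_rate = fp / (fp + tn)     # 0.30 of real videos wrongly flagged
miss_rate = fn / (fn + tp)               # 0.40 of deepfakes missed

print(f"accuracy={accuracy:.2f}, FPR={false_positive_rate:.2f}, miss rate={miss_rate:.2f}")
```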
DARPA SemaFor Program
DARPA, the US agency famous for developing innovative technologies, also decided to jump on the deepfake detection bandwagon by launching a program called SemaFor (Semantic Forensics). Its objective is to design a system that can automatically detect all types of manipulated media by combining three different types of algorithms: text analysis, audio analysis, and video content analysis. Its algorithms will be trained on 250,000 news articles and 250,000 social media posts, including 5,000 fake articles.
Microsoft Video Authenticator
In September 2020, tech giant Microsoft announced a tool designed to help spot fake videos by providing a numerical probability – a confidence score – that the media has been manipulated by an AI. Microsoft decided not to make the tool directly available to the public, because deepfake creators could otherwise use it to teach their models to evade detection.
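Microsoft has not published how per-frame scores are combined into a verdict, but conceptually a frame-level confidence can be aggregated into a video-level score. The sketch below is purely hypothetical and is only meant to illustrate what a confidence score over a video could look like.

```python
import numpy as np

def video_confidence(frame_scores, top_fraction=0.1):
    # frame_scores: per-frame "probability of manipulation" values in [0, 1].
    # Hypothetical aggregation: average the most suspicious fraction of frames,
    # since a deepfake may only betray itself in a handful of frames.
    scores = np.sort(np.asarray(frame_scores))[::-1]
    k = max(1, int(len(scores) * top_fraction))
    return float(scores[:k].mean())

print(video_confidence([0.05, 0.10, 0.92, 0.88, 0.07, 0.12]))  # high: a few frames look manipulated
```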
Beyond deepfake detection
Every time a new media-tampering detection method is released, it is only a matter of time before it is overtaken by a better, smarter forgery algorithm. Therefore, to reduce the risks associated with the spread of fake multimedia content, a more holistic approach is needed. The solution seems to lie in a combination of:
media authentication – using watermarks, fingerprints or signatures in the media metadata, together with blockchain technologies (a minimal signing sketch follows this list);
media provenance – providing information about where the media originated and enabling reverse search for it.
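As a minimal illustration of the authentication idea, the sketch below fingerprints the raw media bytes and signs the fingerprint so that any later tampering is detectable. A real system would rely on public-key signatures, signed metadata standards or blockchain records; the shared-secret HMAC here is just a stand-in for that machinery.

```python
import hashlib
import hmac

SECRET_KEY = b"camera-or-publisher-secret"  # hypothetical signing key held by the capture device

def sign_media(media_bytes: bytes) -> str:
    # Fingerprint the content, then sign the fingerprint.
    fingerprint = hashlib.sha256(media_bytes).digest()
    return hmac.new(SECRET_KEY, fingerprint, hashlib.sha256).hexdigest()

def verify_media(media_bytes: bytes, signature: str) -> bool:
    # Any change to the bytes changes the fingerprint, so the signature no longer matches.
    return hmac.compare_digest(sign_media(media_bytes), signature)

original = b"...raw video bytes..."
tag = sign_media(original)
print(verify_media(original, tag))                # True: untouched
print(verify_media(original + b"tampered", tag))  # False: content was altered
```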
Conclusion
The ability to detect fake multimedia content is one of the main challenges we currently face in the world of technology. Ironically, every time a new detection model is released, the fake-generation models improve in response. As a result, we can expect to see much more credible and realistic deepfakes in the future. To combat the misuse of these media, additional measures such as media authentication and provenance need to be adopted.