Samsung Uses AI to Transform Still Facial Photos into Talking Video Sequences

May 24, 2019 - 13:16

TEHRAN (Tasnim) - Researchers at Samsung's AI Centre in Moscow have created a new system that can transform still facial images into video sequences of the face talking.

According to the study, the system creates realistic virtual talking heads by applying the facial landmarks of a target face to a source face, such as a still photo, so that the target face controls how the source face moves, ZDNet reported.
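The landmark step described above can be pictured with a toy sketch. This is not Samsung's code: the function name `transfer_landmarks`, the 2D point format and the idea of measuring movement from a resting pose are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def transfer_landmarks(
    source_rest: List[Point],
    target_rest: List[Point],
    target_frame: List[Point],
) -> List[Point]:
    """Hypothetical helper: shift each source landmark by the offset that the
    matching target landmark moved between its resting pose and one frame."""
    if not (len(source_rest) == len(target_rest) == len(target_frame)):
        raise ValueError("all landmark lists must have the same length")
    moved = []
    for (sx, sy), (rx, ry), (fx, fy) in zip(source_rest, target_rest, target_frame):
        # The target's movement (frame minus rest) drives the source point.
        moved.append((sx + (fx - rx), sy + (fy - ry)))
    return moved


# One mouth-corner landmark: the target moves it 1 unit right and 2 units up.
print(transfer_landmarks([(10.0, 20.0)], [(0.0, 0.0)], [(1.0, -2.0)]))
```

In this simplified picture, the still photo supplies the resting landmarks and each frame of the target supplies the motion; the real system feeds landmarks into a neural network rather than moving points directly.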

"Such ability has practical applications for telepresence, including videoconferencing and multi-player games, as well as [the] special effects industry," Samsung said.

While "deepfake" technology is not new, Samsung's system does not use 3D modelling and needs only one photograph to create a face model. Given 32 images to build a model, the system can "achieve [a] perfect realism and personalization score," Samsung said.

The system can create a "deepfake" video from so few shots because it draws on a large databank of talking-head videos covering different speakers with diverse appearances, according to Samsung. By combining this databank with the facial landmarks from the source face, the system can create a range of realistic-looking face models.

The system then uses a generative adversarial network that compares the face models against each other to determine which is the most "real". By filtering through the models it has created, the system chooses a final model to use for the video sequence.
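As a rough illustration of the selection step as reported, the sketch below scores a set of candidates and keeps the one rated most "real". It is a simplification under stated assumptions: `pick_most_realistic` and `realism_score` are hypothetical names, and the scoring function here is a stand-in for a trained discriminator, which in an actual generative adversarial network is trained jointly with the generator.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_most_realistic(
    candidates: Sequence[T],
    realism_score: Callable[[T], float],
) -> T:
    """Hypothetical helper: return the candidate that the scoring function
    (standing in for a discriminator) rates as most realistic."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=realism_score)


# Toy usage: made-up scores stand in for a discriminator's output.
scores = {"model_a": 0.41, "model_b": 0.87, "model_c": 0.63}
best = pick_most_realistic(list(scores), lambda name: scores[name])
print(best)  # model_b
```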

So-called "deepfake" videos are currently a major concern for US lawmakers, who worry that AI-manipulated videos of people saying things they never said could become a national security threat.

In September, Facebook COO Sheryl Sandberg announced that the company had created a machine-learning model to detect potentially bogus photos or videos so that deepfake content can be removed from its platforms.