Controlling facial gestures
In collaboration with colleagues at Stanford University and the Max Planck Institute for Informatics in Saarbrücken, FAU researchers have developed a new type of technology that allows the facial gestures and lip movements of one person to be transferred onto the video image of another – in real time. Justus Thies, a doctoral candidate at the Chair of Computer Science 9 (Computer Graphics), is the developer of this facial reenactment software.
FAU: Mr Thies, what does your software do?
Justus Thies: That’s easy to explain: our program recognises one person’s facial gestures and lip movements and transfers them onto the video image of another in real time. In the future this could be very useful for video conferences that use simultaneous interpreting, for example. It’s harder to understand when the movements of the speaker’s mouth don’t match the words that you hear. We merge the interpreter’s voice and facial expression with the speaker’s face. The speaker on the monitor looks like they’re speaking normally, with the visual and acoustic information matching up.
How exactly does it work?
First, photographs are taken of the speaker’s face from three directions. These capture the shape of the face – the curvature of the nose and forehead, for example – as well as its texture, including features such as scars or birthmarks. Next, a computer program adjusts 80 parameters so that the 3D model fits onto the face like a kind of mask.
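The fitting step described here can be pictured as a least-squares problem: choose the model parameters so that the model’s predicted facial landmarks match those observed in the photographs. The sketch below is only an illustration of that idea, not the researchers’ actual method – the model, the landmark count, and all variable names are invented for the example, and the “observed” data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
N_LANDMARKS, N_SHAPE = 60, 80   # illustrative sizes; 80 shape parameters as in the interview

# Toy linear face model: landmark positions = mean + basis @ parameters.
mean_landmarks = rng.standard_normal(2 * N_LANDMARKS)
basis = rng.standard_normal((2 * N_LANDMARKS, N_SHAPE))

# "Observed" landmarks, standing in for measurements from the photographs.
true_params = rng.standard_normal(N_SHAPE)
observed = mean_landmarks + basis @ true_params

# Fit the shape parameters so the model matches the observations:
# solve min_p || basis @ p - (observed - mean_landmarks) ||.
fitted, *_ = np.linalg.lstsq(basis, observed - mean_landmarks, rcond=None)
```

In this toy setting the system is overdetermined (120 equations, 80 unknowns) and noise-free, so the recovered parameters match the ones used to generate the data; a real fit would work from image measurements and include regularisation.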
As with the face shape, there are also parameters – around 76 of them – that capture facial expressions. When the interpreter begins to speak, the differences between the two faces are calculated. The target face is then modified several times per second so that it shows the same expression as the interpreter’s.
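The transfer step can be sketched with a linear model in which each face is a mean shape plus identity offsets (the ~80 shape parameters) plus expression offsets (the ~76 expression parameters); reenactment then keeps the target’s identity parameters but substitutes the source’s expression parameters. Again, this is only a hedged illustration – the dimensions, bases, and function names are assumptions, not the published system.

```python
import numpy as np

N_VERTS = 5000               # vertices in the 3D face mesh (illustrative)
N_SHAPE, N_EXPR = 80, 76     # parameter counts mentioned in the interview

rng = np.random.default_rng(0)
mean_face   = rng.standard_normal(3 * N_VERTS)
shape_basis = rng.standard_normal((3 * N_VERTS, N_SHAPE))
expr_basis  = rng.standard_normal((3 * N_VERTS, N_EXPR))

def reconstruct(alpha, delta):
    """Toy linear face model: mean + identity offsets + expression offsets."""
    return mean_face + shape_basis @ alpha + expr_basis @ delta

# Each person keeps their own shape parameters (alpha); reenactment
# copies only the expression parameters (delta) from source to target.
alpha_src, delta_src = rng.standard_normal(N_SHAPE), rng.standard_normal(N_EXPR)
alpha_tgt, delta_tgt = rng.standard_normal(N_SHAPE), np.zeros(N_EXPR)

reenacted = reconstruct(alpha_tgt, delta_src)  # target identity, source expression
```

Running this once per video frame, with the parameters re-estimated from the live camera feed, is the essence of the real-time loop the interview describes.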
The idea of transferring the movements of a real person onto a different figure is not new. The technology has been used in films for years.
That’s true. The film industry uses this technique to bring avatars to life, for example. But until now this has always required a considerable amount of effort, because markers have to be stuck on the actors first in order to measure their movements. The computation times are also very long, even when high-performance computers are used. We are the first to achieve this in real time without additional facial markers.
So could your technology be used in the next animated films in Hollywood?
Our technology still has room for improvement. We are not yet able to transfer subtle gestures. Clear movements already work well. What is unique about our technology is that it works in real time.
Are there other areas that your software could be used in?
It seems logical that our software could be used not just for video conferences but also for films that are dubbed into another language. In dubbed films the lip movements often don’t match the dialogue.
It could also be used in various areas of medicine – for example in psychological experiments, or in exercise programmes that allow stroke patients with mild facial paralysis to practise facial gestures.
Software that allows images to be changed could also be misused…
Of course, the risk of image manipulation is high, but technology for editing photos and videos – with high-quality results – has been around for a long time. Jan Böhmermann’s Varoufakis video showed this quite clearly. We hope that our publication and the YouTube video that goes with it will make people more aware of this issue. We are also aware of our responsibility as researchers. One of the upcoming doctoral research projects at our department will look at uncovering fake videos.
The researchers’ paper is available on the website of Stanford researcher Matthias Nießner.
Chair of Computer Science 9 (Computer Graphics)
Phone: +49 9131 8529924