ieee | Artificial intelligence software could generate highly realistic fake
videos of former president Barack Obama using existing audio and video
clips of him, a new study [PDF] finds.
Such work could one day help generate digital models of a person for
virtual reality or augmented reality applications, researchers say.
Computer scientists at the University of Washington previously revealed they could generate digital doppelgängers
of anyone by analyzing images of them collected from the Internet, from
celebrities such as Tom Hanks and Arnold Schwarzenegger to public
figures such as George W. Bush and Barack Obama. Such work suggested it
could one day be relatively easy to create such models of anybody, given
the untold numbers of digital photos of people on the Internet.
The researchers chose Obama for their latest work because there were
hours of high-definition video of him available online in the public
domain. The research team had a neural net analyze millions of frames of
video to determine how elements of Obama's face moved as he talked,
such as his lips and teeth and wrinkles around his mouth and chin.
In an artificial neural network, components known as artificial
neurons are fed data, and work together to solve a problem such as
identifying faces or recognizing speech. The neural net can then alter
the pattern of connections among those neurons to change the way they
interact, and the network tries solving the problem again. Over time,
the neural net learns which patterns are best at computing solutions, an
AI strategy loosely modeled on the human brain.
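To make that learning loop concrete, here is a minimal sketch of a single artificial neuron that repeatedly adjusts its connection weights until it reproduces a simple target (the logical OR of two inputs). It is an illustration only, not the network used in the study; the function names, learning rate, and toy data are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Output of one artificial neuron: weighted sum of inputs, then sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=5000, lr=0.5, seed=0):
    """Adjust weights by gradient descent on squared error (illustrative values)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(2)]
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            error = target - output
            # Nudge each connection in the direction that reduces the error.
            grad = error * output * (1 - output)
            weights = [w + lr * grad * x for w, x in zip(weights, inputs)]
            bias += lr * grad
    return weights, bias


# Toy problem (an assumption for this sketch): learn the logical OR of two inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_SAMPLES)
```

Each pass through the data is one round of "try, adjust the connections, try again"; a real network does the same thing with vastly more neurons and data.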
In the new study, the neural net learned what mouth shapes were
linked to various sounds. The researchers took audio clips and dubbed
them over the original sound files of a video. They next took mouth
shapes that matched the new audio clips and grafted and blended them
onto the video. Essentially, the researchers synthesized videos where
Obama lip-synched words he said up to decades beforehand.
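The two steps described above, picking a mouth shape for each sound and then blending it into the video frame, can be sketched in toy form. This is not the researchers' method: the study used a trained neural net, whereas this sketch swaps in a hand-written lookup table, and the sound labels, openness values, and function names are hypothetical.

```python
# Hypothetical lookup: sound label -> mouth openness (0.0 closed, 1.0 wide open).
# In the study this mapping was learned by a neural net, not written by hand.
MOUTH_SHAPES = {"m": 0.0, "ee": 0.3, "oh": 0.6, "ah": 1.0}


def mouth_track(sounds):
    """Return the mouth openness for each sound in a sequence."""
    return [MOUTH_SHAPES[s] for s in sounds]


def blend_pixels(frame_pixels, mouth_pixels, alpha):
    """Linearly blend synthesized mouth pixels into the original frame pixels.

    alpha = 0.0 keeps the original frame; alpha = 1.0 uses only the new mouth.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [(1 - alpha) * f + alpha * m for f, m in zip(frame_pixels, mouth_pixels)]
```

For example, `mouth_track(["m", "ah"])` gives a closed mouth followed by a wide-open one, and `blend_pixels` with an `alpha` between 0 and 1 softens the seam where the new mouth region meets the original face.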
The researchers note that similar previous research involved
filming people saying sentences over and over again to map what mouth
shapes were linked to various sounds, which is expensive, tedious and
time-consuming. In contrast, this new work can learn from millions of
hours of video that already exist on the Internet or elsewhere.