Talking Head Generation
40 papers with code • 7 benchmarks • 3 datasets
Talking head generation is the task of synthesizing a talking face from one or more images of a person, typically driven by audio or by another video.
(Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
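To make the task concrete, below is a minimal sketch of the common one-shot pipeline: a set of reference images is embedded into an identity code, and a per-frame driving signal (landmarks or audio features) conditions a generator that renders each frame. The module names, layer sizes, and the 68-landmark driving signal are illustrative assumptions, not any specific paper's architecture.

```python
# Minimal one-shot talking-head skeleton (hypothetical modules, not a real model).
import torch
import torch.nn as nn


class IdentityEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, dim),
        )

    def forward(self, refs):                     # refs: (B, K, 3, H, W)
        b, k = refs.shape[:2]
        feats = self.net(refs.flatten(0, 1))     # encode each reference image
        return feats.view(b, k, -1).mean(dim=1)  # average over the K references


class Generator(nn.Module):
    def __init__(self, id_dim=256, drv_dim=68 * 2, size=64):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(
            nn.Linear(id_dim + drv_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * size * size), nn.Tanh(),
        )

    def forward(self, id_code, driving):         # driving: (B, drv_dim) per frame
        x = torch.cat([id_code, driving], dim=1)
        return self.net(x).view(-1, 3, self.size, self.size)


if __name__ == "__main__":
    refs = torch.randn(2, 4, 3, 64, 64)          # 4 reference images per identity
    landmarks = torch.randn(2, 68 * 2)           # one frame of driving landmarks
    id_code = IdentityEncoder()(refs)
    frame = Generator()(id_code, landmarks)      # (2, 3, 64, 64) synthesized frame
```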
Most implemented papers
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
In order to create a personalized talking head model, these works require training on a large dataset of images of a single person.
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
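A hedged sketch of the "lip-sync expert" supervision this line of work relies on: a short window of lower-face frames and the corresponding audio window are embedded separately, and their cosine similarity is pushed toward 1 for in-sync pairs via a binary cross-entropy loss. The encoder definitions and tensor sizes below are illustrative stand-ins, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioEncoder(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(80 * 16, dim))

    def forward(self, mel):          # mel: (B, 80, 16) mel-spectrogram window
        return F.normalize(self.net(mel), dim=1)


class VideoEncoder(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(5 * 3 * 48 * 96, dim))

    def forward(self, frames):       # frames: (B, 5, 3, 48, 96) lower-face crops
        return F.normalize(self.net(frames), dim=1)


def sync_loss(audio_emb, video_emb, in_sync):
    """BCE on cosine similarity: in-sync pairs should score near 1."""
    sim = (audio_emb * video_emb).sum(dim=1).clamp(0, 1)
    return F.binary_cross_entropy(sim, in_sync)


if __name__ == "__main__":
    mel = torch.randn(4, 80, 16)
    frames = torch.randn(4, 5, 3, 48, 96)
    labels = torch.ones(4)           # pretend all pairs are in sync
    loss = sync_loss(AudioEncoder()(mel), VideoEncoder()(frames), labels)
```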
MakeItTalk: Speaker-Aware Talking-Head Animation
We present a method that generates expressive talking heads from a single facial image with audio as the only input.
ReenactGAN: Learning to Reenact Faces via Boundary Transfer
A transformer is subsequently used to adapt the boundary of the source face to that of the target face.
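A hedged sketch of the boundary-transfer idea: faces are reduced to boundary heatmaps, a "transformer" network (here a plain conv net, not an attention Transformer) maps source-person boundaries into the target person's boundary space, and a target-specific decoder renders the face. The channel count and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BoundaryTransformer(nn.Module):
    """Maps boundary heatmaps from the source domain to the target domain."""

    def __init__(self, channels=15):          # e.g. 15 facial-boundary heatmaps
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, source_boundary):        # (B, channels, H, W)
        return self.net(source_boundary)


class TargetDecoder(nn.Module):
    """Renders the target person's face from an adapted boundary heatmap."""

    def __init__(self, channels=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, boundary):
        return self.net(boundary)


if __name__ == "__main__":
    source_boundary = torch.rand(1, 15, 64, 64)        # heatmaps from the source face
    adapted = BoundaryTransformer()(source_boundary)   # now in the target's boundary space
    reenacted_face = TargetDecoder()(adapted)          # (1, 3, 64, 64)
```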
Text-based Editing of Talking-head Video
To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.
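A simplified, hedged illustration of that segment-selection step: the edited transcript is converted to phonemes, and corpus segments whose phoneme strings cover the edit are chosen as base material. The actual system uses a more elaborate viseme search and parameter blending; this greedy longest-match and the corpus format below are only meant to convey the flavor.

```python
def select_segments(edited_phonemes, corpus):
    """Greedily cover the edited phoneme sequence with corpus segments.

    corpus: list of (segment_id, phoneme_list) tuples from the input video.
    Returns a list of (segment_id, start, length) picks, or raises if some
    phoneme cannot be covered at all.
    """
    picks, i = [], 0
    while i < len(edited_phonemes):
        best = None  # (match_length, segment_id, start_index)
        for seg_id, phones in corpus:
            for start in range(len(phones)):
                length = 0
                while (i + length < len(edited_phonemes)
                       and start + length < len(phones)
                       and phones[start + length] == edited_phonemes[i + length]):
                    length += 1
                if length and (best is None or length > best[0]):
                    best = (length, seg_id, start)
        if best is None:
            raise ValueError(f"no corpus segment covers phoneme {edited_phonemes[i]!r}")
        picks.append((best[1], best[2], best[0]))
        i += best[0]
    return picks


if __name__ == "__main__":
    corpus = [("seg_a", ["HH", "AH", "L", "OW"]), ("seg_b", ["W", "ER", "L", "D"])]
    print(select_segments(["HH", "AH", "L", "OW", "W", "ER", "L", "D"], corpus))
```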
Neural Voice Puppetry: Audio-driven Facial Reenactment
Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head.
What comprises a good talking-head video generation?: A Survey and Benchmark
In this work, we present a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
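For illustration, here is a hedged sketch of the kind of per-frame reconstruction metric such benchmarks report, in this case plain PSNR computed with NumPy. The benchmark itself standardizes cropping and alignment before scoring, which is omitted here, and the helper names are assumptions.

```python
import numpy as np


def psnr(reference, generated, data_range=255.0):
    """Peak signal-to-noise ratio between two frames of equal shape."""
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


def evaluate_video(reference_frames, generated_frames):
    """Average PSNR over aligned frame pairs (assumes equal length and size)."""
    scores = [psnr(r, g) for r, g in zip(reference_frames, generated_frames)]
    return float(np.mean(scores))


if __name__ == "__main__":
    ref = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(8)]
    gen = [np.clip(f.astype(int) + np.random.randint(-5, 6, f.shape), 0, 255).astype(np.uint8)
           for f in ref]             # mildly corrupted copies as stand-in outputs
    print(evaluate_video(ref, gen))
```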
Talking-head Generation with Rhythmic Head Motion
When people deliver a speech, they naturally move heads, and this rhythmic head motion conveys prosodic information.
Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars
The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views.
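A hedged sketch of that bi-layer composition: a low-frequency coarse image is predicted per frame, while a high-frequency texture (computed once, offline) is warped by a predicted sampling field and added on top. The tensors and the identity warp below are illustrative placeholders, not the paper's networks.

```python
import torch
import torch.nn.functional as F

B, H, W = 1, 128, 128
coarse = torch.zeros(B, 3, H, W)                 # per-frame coarse RGB prediction
texture = torch.rand(B, 3, H, W)                 # offline high-frequency texture

# Predicted warp field as a sampling grid in [-1, 1]; here simply the identity warp.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)

warped_texture = F.grid_sample(texture, grid, mode="bilinear", align_corners=True)
frame = (coarse + warped_texture).clamp(0, 1)    # final synthesized head view
```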