Unconstrained Lip-synchronization
4 papers with code • 3 benchmarks • 3 datasets
Given a video of an arbitrary person and an arbitrary driving speech, the task is to generate a lip-synced video that matches the given speech.
The approach must not be constrained by the speaker's identity, voice, or language.
Most implemented papers
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
However, existing approaches fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking-face videos, leaving significant parts of the video out of sync with the new audio.
You said that?
To achieve this, we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking-face video frames.
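The joint-embedding idea can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the paper's actual architecture: the class name, the 96×96 input resolution, the mel-spectrogram window shape, and all layer sizes are hypothetical. An identity encoder and an audio encoder each produce an embedding; the two are concatenated and decoded into a face frame.

```python
import torch
import torch.nn as nn

class JointEmbeddingLipSync(nn.Module):
    """Minimal encoder-decoder sketch: joint face+audio embedding -> frame.

    Layer sizes and input shapes are illustrative assumptions, not the
    architecture from "You said that?".
    """
    def __init__(self, embed_dim=256):
        super().__init__()
        # Face (identity) encoder: 96x96 RGB reference frame -> embedding
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 48x48
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 24x24
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # -> 12x12
            nn.Flatten(),
            nn.Linear(128 * 12 * 12, embed_dim),
        )
        # Audio encoder: 80x16 mel-spectrogram window -> embedding
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # -> 40x8
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> 20x4
            nn.Flatten(),
            nn.Linear(64 * 20 * 4, embed_dim),
        )
        # Decoder: concatenated joint embedding -> synthesised 96x96 frame
        self.decoder = nn.Sequential(
            nn.Linear(2 * embed_dim, 128 * 12 * 12),
            nn.Unflatten(1, (128, 12, 12)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, face, audio):
        # Joint embedding: concatenate the two modality embeddings
        z = torch.cat([self.face_enc(face), self.audio_enc(audio)], dim=1)
        return self.decoder(z)

# Usage: one reference frame plus one audio window -> one output frame.
model = JointEmbeddingLipSync()
frame = model(torch.randn(1, 3, 96, 96), torch.randn(1, 1, 80, 16))
print(frame.shape)  # torch.Size([1, 3, 96, 96])
```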
Towards Automatic Face-to-Face Translation
As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.
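At a high level, such a system chains speech recognition, machine translation, speech synthesis, and lip synchronization. Below is a minimal sketch of that composition, assuming the four stages are supplied as callables; the function name and signatures are hypothetical, and none of the paper's concrete modules are reproduced here.

```python
from typing import Callable

def face_to_face_translate(
    video_path: str,
    src_lang: str,
    tgt_lang: str,
    asr: Callable[[str, str], str],       # (video_path, lang) -> transcript
    mt: Callable[[str, str, str], str],   # (text, src, tgt) -> translation
    tts: Callable[[str, str], str],       # (text, lang) -> audio_path
    lip_sync: Callable[[str, str], str],  # (video_path, audio_path) -> video_path
) -> str:
    """Pipeline sketch: ASR -> MT -> TTS -> lip synchronization.

    The concrete models are injected as callables; this only shows how the
    stages compose into face-to-face translation.
    """
    transcript = asr(video_path, src_lang)
    translation = mt(transcript, src_lang, tgt_lang)
    translated_audio = tts(translation, tgt_lang)
    return lip_sync(video_path, translated_audio)
```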
MARLIN: Masked Autoencoder for facial video Representation LearnINg
This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
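A minimal sketch of the masked-autoencoding objective on flattened video patch tokens follows. The class name, masking ratio, and transformer sizes are illustrative assumptions, and MARLIN's facial-region-guided masking is not modeled here; this only shows the generic mask-then-reconstruct pretraining step.

```python
import torch
import torch.nn as nn

class MaskedVideoAutoencoder(nn.Module):
    """Minimal masked-autoencoder sketch over flattened video patches.

    Dimensions, masking ratio, and depth are illustrative assumptions;
    MARLIN itself targets facial regions when masking.
    """
    def __init__(self, patch_dim=768, dim=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)
        self.head = nn.Linear(dim, patch_dim)  # reconstruct raw patches

    def forward(self, patches):
        # patches: (batch, tokens, patch_dim) from tubelets of a face video
        b, n, _ = patches.shape
        keep = int(n * (1 - self.mask_ratio))
        order = torch.rand(b, n, device=patches.device).argsort(dim=1)
        restore = order.argsort(dim=1)  # inverse permutation
        x = self.embed(patches)
        x = torch.gather(x, 1, order.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        encoded = self.encoder(x[:, :keep])  # encode visible tokens only
        mask_tokens = self.mask_token.expand(b, n - keep, -1)
        full = torch.cat([encoded, mask_tokens], dim=1)
        full = torch.gather(
            full, 1, restore.unsqueeze(-1).expand(-1, -1, full.size(-1))
        )
        return self.head(self.decoder(full))

# Self-supervised objective: reconstruct the (mostly masked) patches.
model = MaskedVideoAutoencoder()
patches = torch.randn(2, 196, 768)
recon = model(patches)
loss = nn.functional.mse_loss(recon, patches)  # in practice, masked positions only
```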