New AI System VLOGGER Creates Realistic Video Avatars from Photos and Audio
-
VLOGGER can generate realistic-looking video avatars of people from just a single photo, with accurate facial expressions and body movements matched to a speech sample.
-
VLOGGER combines recent AI advances such as diffusion models and Transformers to predict body and face motion from audio and generate high-fidelity video frames synced to that audio.
-
VLOGGER was trained on MENTOR, a large dataset of 2,200 hours of video showing 800,000 identities speaking.
-
VLOGGER gives users control over aspects of the generated videos, such as facial expressions and gestures.
-
Potential risks include the generation of misleading deepfakes, though the same ability to manipulate video could also enable new applications in communication, education, and virtual assistants.