New AI System VLOGGER Creates Realistic Video Avatars from Photos and Audio

VLOGGER can generate realistic-looking video avatars of people from just a single photo, with accurate facial expressions and body movements matched to a speech sample.
VLOGGER combines recent AI advances such as diffusion models and Transformers to predict motion from audio and generate high-fidelity video frames synchronized to that audio.
VLOGGER was trained on MENTOR, a large dataset of 2,200 hours of video showing 800,000 identities speaking.
VLOGGER lets users control aspects of the generated videos, such as facial expressions and gestures.
Potential risks include the generation of misleading deepfakes, though the same ability to manipulate video could also enable new applications in communication, education, and virtual assistants.
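
For a rough sense of scale, the MENTOR figures quoted above (2,200 hours, 800,000 identities) imply only a few seconds of footage per identity on average. The snippet below is a minimal sketch of that arithmetic; it assumes the footage is spread evenly across identities, which the source does not state, and the function name is purely illustrative.

```python
def avg_seconds_per_identity(total_hours: float, identities: int) -> float:
    """Average footage per identity in seconds, assuming an even split (an assumption)."""
    total_seconds = total_hours * 3600  # convert hours to seconds
    return total_seconds / identities


# Figures as reported in the summary above.
mentor_avg = avg_seconds_per_identity(2200, 800_000)
print(f"{mentor_avg:.1f} seconds per identity on average")
```

Under that even-split assumption the average comes to roughly ten seconds per identity, which suggests the dataset's breadth comes from the number of speakers rather than long recordings of any one person.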