Google Unveils AI to Animate Photos from Speech, But Results Are Unrealistic
-
Google unveiled a new AI system called VLOGGER that animates still photos using speech recordings. However, the results fall into the uncanny valley and aren't very realistic.
-
VLOGGER works in two steps: it first predicts body motion and expressions from the audio, then uses an image diffusion model to generate the video frames.
-
VLOGGER outperforms prior methods on some metrics, but online commenters criticized the fake-looking results.
-
One possible use case is translating existing videos by editing the lip movements, but the current quality is lacking.
-
It is unclear whether Google plans to productize VLOGGER or whether it was just a research project, as more work is needed to make it usable.