UC Berkeley Surpasses Google AI With Open-Source Video Understanding Model
- Google's new Gemini 1.5 model can match text descriptions to specific frames in long videos, but its code is not public.
- UC Berkeley researchers adapted Meta's open-source Llama 2 model to process text, images, and video, much as Gemini does.
- Their Large World Model (LWM) can answer "needle-in-a-haystack" questions about long videos that stump Gemini 1.0 and GPT-4.
- LWM was trained with Ring Attention, a technique that splits long sequences into blocks distributed across devices, enabling much longer inputs without any single device holding the full attention computation.
- The open-source code for LWM is available on GitHub, providing a foundation for future long-context models.
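The core idea behind Ring Attention can be sketched in a few lines. The sketch below is a single-host NumPy simulation (not the authors' actual distributed implementation): each simulated "device" holds one query block, key/value blocks rotate around the ring one step at a time, and each device folds in the new block with an online (streaming) softmax, so the full attention matrix is never materialized. Function and variable names here are illustrative assumptions, not names from the LWM codebase.

```python
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Single-host simulation of Ring Attention.

    Each "device" i holds query block i. Key/value blocks are passed
    around the ring; each device updates its partial output with an
    online softmax, so no device ever sees the full score matrix.
    NOTE: illustrative sketch only; a real implementation overlaps the
    ring communication with blockwise computation on GPU/TPU devices.
    """
    n = len(q_blocks)
    d = q_blocks[0].shape[-1]
    # Per-device accumulators: unnormalized output, running max, running sum.
    outs  = [np.zeros_like(q) for q in q_blocks]
    maxes = [np.full(q.shape[0], -np.inf) for q in q_blocks]
    sums  = [np.zeros(q.shape[0]) for q in q_blocks]

    for step in range(n):              # n ring rotations
        for i in range(n):             # each device works in parallel
            j = (i + step) % n         # K/V block currently resident on device i
            s = q_blocks[i] @ k_blocks[j].T / np.sqrt(d)
            m_new = np.maximum(maxes[i], s.max(axis=1))
            scale = np.exp(maxes[i] - m_new)       # rescale old accumulators
            p = np.exp(s - m_new[:, None])         # stabilized block softmax
            outs[i] = outs[i] * scale[:, None] + p @ v_blocks[j]
            sums[i] = sums[i] * scale + p.sum(axis=1)
            maxes[i] = m_new

    # Normalize and stitch the per-device outputs back together.
    return np.concatenate([o / s[:, None] for o, s in zip(outs, sums)])
```

The online-softmax bookkeeping (running max plus rescaled sums) is the same trick used by FlashAttention; Ring Attention's contribution is distributing the blocks across a ring of devices so sequence length scales with device count.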