Google Unveils Gemini, a Family of Multimodal AI Models for Text, Images, Video and More
-
Gemini is a family of multimodal AI models trained on text, images, video, and other data, so the models can work across modalities rather than with text alone. The family comprises three models: the flagship Gemini Ultra, Gemini Pro, and Gemini Nano.
-
Gemini models are designed to handle a wide range of tasks, such as summarizing text, generating images, translating languages, and suggesting replies in messaging apps. Which capabilities are available depends on the product in which a model is deployed.
-
Gemini Pro is currently available in Bard, Google's conversational AI service, as well as the Vertex AI platform and AI Studio for developers. The Pixel 8 Pro features Gemini Nano.
-
Gemini looks promising but is still at an early stage, with limits on both accuracy and capabilities. So far, Google's claims for it have outpaced what it has delivered.
-
It remains unclear how Gemini stacks up against competitors such as OpenAI's GPT-4, though Google claims benchmark wins. Pricing for some Gemini APIs starts at $0.0025 per 1,000 characters.
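
To put the quoted starting rate in concrete terms, here is a minimal sketch of the arithmetic. It assumes a flat per-character rate applied to the whole request, which may not match how any particular Gemini API is actually billed; the function name `estimate_cost` and the constant name are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Starting rate quoted in the article: $0.0025 per 1,000 characters.
# Assumption: a single flat rate covers every character in the request.
RATE_PER_1000_CHARS = Decimal("0.0025")


def estimate_cost(num_chars: int, rate_per_1000: Decimal = RATE_PER_1000_CHARS) -> Decimal:
    """Return the estimated cost in US dollars for num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    # Scale the per-1,000-character rate to the actual character count.
    return rate_per_1000 * Decimal(num_chars) / Decimal(1000)


if __name__ == "__main__":
    for chars in (1_000, 50_000, 2_000_000):
        print(f"{chars:>9,} characters: ${estimate_cost(chars)}")
```

Decimal is used instead of float so that very small per-character amounts do not pick up rounding noise. At this rate, 2,000,000 characters would come to about $5.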