Google turbocharges its genAI engine with Gemini 1.5.

Google has swiftly introduced Gemini 1.5, the successor to its latest generative artificial intelligence (genAI) model, just a week after the initial release.

Positioned as a multimodal AI model, Gemini 1.5 is now available for early testing and improves on its predecessor in several respects. In contrast to OpenAI's widely used ChatGPT, Google emphasizes that its query engine can process a more extensive set of information in a single prompt, allowing users to receive more accurate responses.

Google's Gemini models stand out in the industry as natively multimodal large language models (LLMs). Both Gemini 1.0 and the latest Gemini 1.5 can consume and generate content through text, images, audio, video, and code prompts; users can, for example, submit images in JPEG, WEBP, HEIC, or HEIF format. Google's Gemini models compete with OpenAI's efforts, such as the recently introduced Sora, a text-to-video model that generates intricate video scenes from user prompts.
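For illustration, a multimodal request of this kind is typically expressed as a JSON body that pairs a text part with base64-encoded image bytes. The sketch below builds such a payload in the general shape of Google's generateContent REST API; the field names follow the publicly documented API, but the image bytes are a placeholder and exact fields may vary by API version:

```python
import base64
import json

# Placeholder image bytes (a real request would read an actual JPEG file).
fake_jpeg_bytes = b"\xff\xd8\xff\xe0 placeholder image data"

# Request body in the general shape of the generateContent REST API:
# one text part plus one inline image part.
payload = {
    "contents": [
        {
            "parts": [
                {"text": "Describe this image."},
                {
                    "inline_data": {
                        # Also accepted: image/webp, image/heic, image/heif
                        "mime_type": "image/jpeg",
                        "data": base64.b64encode(fake_jpeg_bytes).decode("ascii"),
                    }
                },
            ]
        }
    ]
}

print(json.dumps(payload, indent=2))
```

The same structure applies regardless of image format; only the `mime_type` field changes.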

While both OpenAI and Google recognize the significance of multimodality, they approach it differently. Analysts note that OpenAI's Sora remains a limited-availability preview and is not yet generally available. The competition between Google's Gemini models and OpenAI's offerings highlights how quickly the landscape of AI capabilities is evolving.

Gemini 1.0, unveiled in December 2023 and released shortly before its successor, represented Google's rebuild and rebranding of its Bard chatbot. Gemini is flexible enough to run on platforms ranging from data centers to mobile devices. Notably, Google's role as an AI cloud provider is gaining prominence, with a comprehensive suite of more than 132 models available to registered users.

As OpenAI works on its upcoming GPT-5, which is expected to be multimodal, analysts suggest it might consist of multiple smaller models stitched together. That approach, however, could yield a less efficient architecture than Google's natively multimodal offerings.

Gemini 1.5 Pro, the initial release for early testing, is described as a mid-size multimodal model optimized for scaling across a wide range of tasks. It performs at a level similar to the larger Gemini 1.0 Ultra while requiring significantly fewer GPU cycles. Gemini 1.5 Pro also introduces an experimental long-context understanding feature that lets developers prompt the engine with up to 1 million tokens of context. Developers can sign up for a private preview of Gemini 1.5 Pro through Google AI Studio, which supports 38 languages across more than 180 countries and territories.
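To put a 1-million-token window in perspective, a rough back-of-envelope calculation suggests how much raw text it could hold. The figures below rest on common rules of thumb (about 4 characters per token and 5 characters per word for English text), not on official Gemini numbers:

```python
# Back-of-envelope sizing of a 1,000,000-token context window.
# Assumptions (rules of thumb, not official Gemini figures):
#   ~4 characters per token, ~5 characters per word.

CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4   # assumed average for English text
CHARS_PER_WORD = 5    # assumed average, including spacing

chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # total characters the window holds
words = chars // CHARS_PER_WORD            # approximate word count

print(f"~{chars:,} characters, roughly {words:,} words")
```

Under these assumptions, a single prompt could span several novels' worth of text, which is what makes the long-context feature notable for developers.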
