Google recently unveiled Gemini 1.5 Pro, the latest version of its AI chatbot model. Just two months after introducing the highly anticipated Gemini, Google is already showcasing this "next-generation" model, which it says delivers enhanced performance compared to its predecessor.
Gemini 1.5 Pro, according to Google, shows significant advances across the board, outperforming the earlier Gemini 1.0 Ultra while consuming less compute. The key enhancement is its capacity to process very large inputs, whether text, video, or code. Unlike its predecessor, which was limited to 32,000 tokens (roughly 20,000 words) per query, Gemini 1.5 Pro can process up to 1 million tokens per query.
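To put those context-window figures in perspective, here is a back-of-envelope sketch using the 32,000-token ≈ 20,000-word ratio quoted above. The exact ratio is an assumption that varies by tokenizer and language, so treat the output as a rough estimate only:

```python
# Rough words-per-token ratio implied by the figures in the article
# (32,000 tokens ~ 20,000 words). This is an assumption: real
# tokenizers produce different ratios depending on the text.
WORDS_PER_TOKEN = 20_000 / 32_000  # = 0.625

def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(approx_words(32_000))     # the old Gemini limit
print(approx_words(1_000_000))  # the Gemini 1.5 Pro limit
```

By this crude estimate, a 1-million-token window holds on the order of 600,000+ words, in the same ballpark as the "over 700,000 words" figure Google cites (the difference comes down to tokenizer specifics).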
This capability allows Gemini 1.5 Pro to analyze substantial amounts of information in a single operation, such as 1 hour of video, 11 hours of audio, codebases exceeding 30,000 lines of code, or over 700,000 words. Demis Hassabis, CEO of Google DeepMind, noted, "In our research, we've also successfully tested up to 10 million tokens."
The advancements in Gemini’s functionality aim to enhance its utility further. During a demonstration, Google showcased how the new model can effectively search through a 402-page transcript of the Apollo 11 Moon landing to locate specific text excerpts. Additionally, in another demonstration, Gemini 1.5 Pro showcased its ability to visually analyze a 44-minute Buster Keaton film to identify specific moments within the footage.
By leveraging an AI modeling technique known as Mixture-of-Experts (MoE), Google made the new Gemini model substantially cheaper to run per query. Hassabis explained, "MoE models learn to selectively activate the most relevant expert pathways in its neural network based on the input type, significantly boosting the model's efficiency."
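The routing idea Hassabis describes can be illustrated with a toy sketch. This is not Google's implementation, just a minimal illustration of the general MoE pattern: a gating network scores every expert, but only the top few actually run, so compute scales with the number of experts selected rather than the total number of experts:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Toy Mixture-of-Experts forward pass.

    x            -- input vector (list of floats)
    experts      -- list of callables; only top_k of them are invoked
    gate_weights -- one weight vector per expert for the gating network
    """
    # Gating network: score each expert (here a simple dot product).
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(scores)
    # Select only the top_k most relevant experts for this input.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted combination of the selected experts' outputs; the
    # unselected experts are never evaluated.
    return sum((probs[i] / norm) * experts[i](x) for i in top)

# Hypothetical example: three "experts", each just a constant function.
experts = [lambda x: 1.0, lambda x: 2.0, lambda x: 3.0]
gate_weights = [[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]]
out = moe_forward([1.0, 0.0], experts, gate_weights, top_k=2)
```

Here the input strongly matches the first expert's gate, so the output sits close to that expert's answer while the third expert is skipped entirely. This per-input sparsity is what lets MoE models grow in total parameter count without a proportional increase in per-query cost.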
Despite its impressive capabilities, Gemini 1.5 Pro can be slow when handling requests involving large files. In Google's demonstrations, some responses took up to a minute to generate, although these processing times may improve as the model undergoes further refinement.
While Google has not specified an official launch date for Gemini 1.5 Pro, the company plans to gradually grant access to developers and enterprise customers through its AI Studio and Vertex AI services. Google also hinted at pricing tiers for the model, starting from a standard 128,000-token context window and scaling up to 1 million tokens as the model evolves. Early testers, however, can experiment with the million-token window at no cost for the time being.