Alphabet’s Gemini AI model, public for only two months, is already getting an upgrade. The new Gemini Pro 1.5, released today to a limited audience, can process far larger volumes of text, video, and audio in a single prompt than its predecessor.
Developed by Google DeepMind under CEO Demis Hassabis, the new model has an input capacity so large that Hassabis likens it to a person’s working memory, a comparison that draws on his background as a neuroscientist studying memory. According to Hassabis, that core capability unlocks a wide range of new uses.
During a demonstration, Google DeepMind showcased Gemini Pro 1.5’s prowess by analyzing a 402-page PDF containing the Apollo 11 communications transcript. The model successfully identified humorous segments, such as astronauts attributing a communications delay to a sandwich break. Additionally, the model adeptly answered questions about specific actions in a Buster Keaton movie. These tasks, which were challenging for the previous Gemini version, showcase the model’s expanded capabilities, encouraging developers to innovate new applications leveraging its potential.
Oriol Vinyals, a research scientist at Google DeepMind, expressed amazement at the model’s ability to reason effortlessly across such extensive content. Gemini Pro 1.5 can ingest an hour of video, 11 hours of audio, 700,000 words, or 30,000 lines of code in a single prompt—more than other AI models such as OpenAI’s GPT-4. Google has not disclosed the technical details behind this expanded capacity, but it points to practical uses such as extracting key insights from lengthy Discord discussions.
Moreover, Gemini Pro 1.5 delivers performance that belies its size, excelling across a range of benchmarks. It relies on a technique known as mixture of experts, in which the model is divided into specialized sub-networks and only the ones relevant to a given input are activated, making both training and serving more efficient without escalating computational demands.
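To illustrate the general idea (Google has not published Gemini’s actual architecture), here is a minimal sketch of mixture-of-experts routing in Python: a small gating network scores a set of expert networks for each input, and only the top-scoring experts run, so the compute per token stays roughly constant even as the total parameter count grows. The expert count, dimensions, and top-k value below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, NUM_EXPERTS, TOP_K = 16, 8, 2  # illustrative sizes, not Gemini's

# Each "expert" is a tiny feed-forward layer; the router is a linear scorer.
experts = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route input x to its top-k experts and mix their outputs."""
    logits = x @ router                   # score every expert for this input
    top = np.argsort(logits)[-TOP_K:]     # keep only the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only the selected experts are evaluated, so most parameters stay idle per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
print(moe_layer(token).shape)  # (16,)
```

The design trade-off is that total capacity (many experts) grows without a matching growth in per-input compute, which is why the technique is associated with cheaper training and inference.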
Despite its smaller size, Gemini Pro 1.5 rivals the capabilities of Gemini Ultra on numerous tasks. Applying the same techniques to Gemini Ultra could lift its performance further, according to Hassabis.
Developers can test the upgraded Gemini Pro through AI Studio, and a select group can access it via the API on Google’s Vertex AI cloud platform. Google also plans to introduce new tools that make it easier to build Gemini into applications, including ways to use the model’s video and audio parsing abilities within Project IDX, its web-based coding tool.
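As a rough sketch of what calling the model from code might look like—assuming access through the google-generativeai Python SDK, with a placeholder model identifier since the exact name for the preview release isn’t specified here:

```python
# Sketch only: assumes the google-generativeai SDK; the model name is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued through AI Studio

# "gemini-1.5-pro-placeholder" stands in for whatever identifier Google assigns
# to the preview model; check the AI Studio documentation for the real one.
model = genai.GenerativeModel("gemini-1.5-pro-placeholder")

# A long-context prompt: feed an entire transcript and ask for a summary.
transcript = open("transcript.txt").read()
response = model.generate_content(
    "Summarize the funniest exchanges in this transcript:\n" + transcript
)
print(response.text)
```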
The rapid evolution of Gemini reflects the intense competition in the AI landscape set off by the success of ChatGPT. Recent moves by OpenAI and Google—from long-term memory for chatbots to premium subscriptions for access to more advanced models—underscore how quickly the field is shifting.
While progress in generative AI accelerates, concerns about associated risks persist. Google emphasizes rigorous testing and feedback mechanisms for Gemini Pro 1.5, highlighting the collaboration with AI Safety Institute researchers to assess potential risks and ensure responsible development.
Hassabis hints that further advances are on the way, describing a shift toward a faster, more startup-like pace of development that sets the stage for continued iteration in the AI domain.