Gemini Ultra outperforms OpenAI's GPT-4 on a series of benchmarks assessing AI model capabilities.
Google's recently introduced Gemini is a formidable rival to OpenAI's GPT-4, available in three versions of varying sizes and capabilities. In its announcement, Google said Gemini Ultra, its most advanced model, surpasses GPT-4 in areas such as historical and legal knowledge, Python code generation, and multi-step reasoning. Despite these results, Gemini Ultra is not yet available to the public.
On the Massive Multitask Language Understanding (MMLU) benchmark, a widely used test of an AI model's knowledge and problem-solving skills, Google claimed Gemini outperformed GPT-4.
Tech journalist Kevin Roose, speaking on The New York Times' Hard Fork podcast, described the MMLU as akin to the SATs for AI models, though it goes deeper than standard college-prep exams. According to Google, the MMLU covers 57 subjects, including mathematics, science, history, law, medicine, and ethics, to test both world knowledge and problem-solving ability.
Google reported that Gemini Ultra scored 90% on the MMLU, surpassing GPT-4's 86.4%. Notably, Gemini Ultra also outperformed human experts, who score approximately 89.8% on the benchmark.
Roose speculated that a 90% score on the MMLU could be a marker of artificial general intelligence (AGI), a theoretical form of AI capable of advanced cognitive functions such as consciousness and common sense. Oriol Vinyals, VP of research at Google DeepMind, highlighted that Gemini was specifically engineered to process diverse data types, including text, music, scripts, images, and videos, setting it apart from conventional models.
Google asserts that this multimodal design, built to handle diverse data types seamlessly, gives Gemini a competitive edge over existing models. Researchers at SemiAnalysis have speculated that Gemini could outperform GPT-4 on the basis of raw computational power alone.
Gemini Pro, a less advanced variant accessible through Google's chatbot Bard, has drawn positive early feedback, though users have also raised concerns about hallucinations, reliability issues, and a tendency to direct users to Google for contentious queries.
Google and OpenAI did not respond to Business Insider's requests for comment.