Claude 3 Opus, the cutting-edge artificial intelligence model developed by Anthropic, has seized the leading position on the Chatbot Arena leaderboard, surpassing OpenAI’s GPT-4 for the first time since its inception last year.
In contrast to traditional AI model evaluations, the LMSYS Chatbot Arena relies on human judgments, with individuals ranking the responses generated by two distinct models based on the same prompt without bias.
OpenAI’s various iterations of GPT-4 have long dominated the top rank, to the extent that any rival model achieving similar benchmark scores is dubbed a GPT-4-class model. Perhaps the time has come to introduce a new class of models akin to Claude-3 for future assessments.
It is important to highlight that the competition between Claude 3 Opus and GPT-4 is neck and neck, considering that the latter has been in existence for a year, with the anticipated arrival of the “markedly different” GPT-5 later this year — indicating that Anthropic’s reign at the top may be short-lived.
Understanding the Chatbot Arena
The Chatbot Arena, managed by LMSys (Large Model Systems Organization), hosts a diverse array of large language models engaging in anonymous randomized battles.
Launched in May last year, the arena has amassed over 400,000 user votes, with models from Anthropic, OpenAI, and Google consistently dominating the top positions.
Recent entries from French AI startup Mistral and Chinese firms like Alibaba have started to climb the ranks, alongside the growing presence of open-source models.
Stay informed with daily updates on major tech news, lifestyle tips, and expert analyses. Be the first to discover cutting-edge gadgets and exclusive deals.
Swipe horizontally to view rankings
RankModelEloVotes1Claude-3 Opus1253332501GPT-4-1106-Preview1251541411GPT-4-0125-preview1248348254Gemini Pro1203124764Claude-3 Sonnet1198327616GPT-4-03141185334997Claude-3 Haiku1179187768GPT-4-06131158518608Mistral-Large-24021157267349Qwen1.5-72B-Chat11482021110Claude-111462190810Mistral Medium114526196
The Elo rating system, commonly used in games like chess, is employed to determine the relative skill levels of the chatbot models, rather than the users interacting with them.
However, the arena has its limitations, including the exclusion of certain models or versions, occasional loading issues with GPT-4 models, and a tendency to favor models with real-time internet access such as Google Gemini Pro.
Notably, some prominent models like Google’s Gemini Pro 1.5 with its extensive context window and Gemini Ultra are absent from the arena.
Claude 3 Haiku: A Challenger to GPT-4
[Arena Update]70K+ new Arena votes🗳️ are in!Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market🔥Congrats @AnthropicAI on the incredible Claude-3 launch!More exciting… pic.twitter.com/p1Guuf0B3KMarch 26, 2024
The latest update, comprising over 70,000 new votes, witnessed Claude 3 Opus claiming the top position on the leaderboard, with even the smallest Claude 3 models delivering commendable performances.
According to LMSYS, “Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market.”
What sets Claude 3 Haiku apart is its efficiency despite being a “local size” model, akin to Google’s Gemini Nano. It achieves remarkable results without the massive scale of Opus or any GPT-4-class models.
Although not as sophisticated as Opus or Sonnet, Anthropic’s Haiku is notably cost-effective, faster, and, as indicated by the arena results, on par with much larger models in blind tests.
All three Claude 3 models secure positions in the top ten, with Opus leading the pack, Sonnet tied for fourth with Gemini Pro, and Haiku sharing sixth place with an earlier GPT-4 version.
Embracing Decentralized AI
Not going to beat centralized AI with more centralized AI.All in on #DecentralizedAI Lots more 🔜 https://t.co/SbEF5zoo05 March 23, 2024
The majority of the top 20 large language models on the arena leaderboard are proprietary, underscoring the challenge open-source initiatives face in competing with industry giants.
Meta, known for its emphasis on open-source AI, is set to unveil Llama 3 soon, a model expected to rival Claude 3 in capabilities, leveraging Meta’s vast resources including 300,000+ Nvidia H100 GPUs for training.
The landscape also sees advancements in open-source and decentralized AI, with StabilityAI’s founder Emad Mostaque shifting focus towards more distributed and accessible AI solutions. He advocates for decentralized AI, highlighting the limitations of centralized AI dominance.