
### The AI Wars Intensify with Claude 3’s “Near-Human” Abilities

Willison: “No model has beaten GPT-4 on a range of widely used benchmarks like this.”

On Monday, Anthropic unveiled Claude 3, a trio of AI language models akin to those powering ChatGPT. Anthropic asserts that the models set new benchmarks across a range of cognitive tasks, even reaching "near-human" proficiency in some cases. The models are currently available on Anthropic's website, with the most capable variant requiring a subscription; developers can also access them via an API.

The three versions of Claude 3—Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus—increase in complexity and parameter count. Sonnet powers the free Claude.ai chatbot (an email sign-up is required), while in Anthropic's web chat interface Opus is available only with the $20-per-month "Claude Pro" subscription. All three models have a 200,000-token context window, the number of tokens the model can process at once.
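To get a feel for what a 200,000-token window holds, a common rule of thumb is roughly four characters of English text per token. The sketch below uses that heuristic (an approximation only; real tokenizers vary, and the function name and reserve figure are illustrative, not from Anthropic):

```python
CONTEXT_WINDOW = 200_000  # tokens; applies to all three Claude 3 models
CHARS_PER_TOKEN = 4       # rough heuristic for English text; actual tokenization varies

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Rough check whether `text` fits in the window, leaving room for the reply."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= CONTEXT_WINDOW - reserve_for_output

# 200,000 tokens is on the order of 800,000 characters of prose.
print(fits_in_context("word " * 100_000))  # 500,000 chars ≈ 125,000 tokens → True
```

By this estimate, the window comfortably holds several novels' worth of text in a single request.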

Anthropic first introduced Claude in March 2023 and Claude 2 the following July. On both occasions, Anthropic slightly lagged behind OpenAI's top models in performance but led in context window length. With Claude 3, Anthropic appears to have closed the performance gap with OpenAI's released models, although expert consensus has yet to form, given the subjective nature of AI benchmarks.

Claude 3 purportedly showcases advanced capabilities across diverse cognitive tasks such as reasoning, expert knowledge, mathematics, and linguistic proficiency. Anthropic claims that the Opus model, the most advanced of the trio, demonstrates “near-human” levels of understanding and fluency in intricate tasks.

The Claude 3 models span a wide range of price and performance compared with their predecessors and competing models. Anthropic emphasizes the models' speed and cost-effectiveness, with API pricing per million tokens as follows, alongside OpenAI's rates for comparison:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| Claude 3 Opus | $15 | $75 |
| Claude 3 Sonnet | $3 | $15 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo (OpenAI) | $10 | $30 |
| GPT-3.5 Turbo (OpenAI) | $0.50 | $1.50 |
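As a rough illustration of how these per-million-token rates translate into per-request costs, the sketch below computes the price of a hypothetical call (the model keys and token counts are made up for illustration; only the dollar rates come from the article):

```python
# Published per-million-token API prices in USD, as reported above.
PRICES = {
    "claude-3-opus":   {"input": 15.00, "output": 75.00},
    "claude-3-sonnet": {"input": 3.00,  "output": 15.00},
    "claude-3-haiku":  {"input": 0.25,  "output": 1.25},
    "gpt-4-turbo":     {"input": 10.00, "output": 30.00},
    "gpt-3.5-turbo":   {"input": 0.50,  "output": 1.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token reply on Opus.
print(f"${request_cost('claude-3-opus', 10_000, 1_000):.3f}")  # $0.225
```

The same call on Haiku would cost a fraction of a cent, which is the trade-off Anthropic is pitching across the three tiers.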

Anthropic plans to roll out regular updates to the Claude 3 model family in the near future, adding capabilities such as tool use, interactive coding, and more advanced agentic behavior. The company says it will keep advancing safety measures alongside performance, and argues that the Claude 3 models present minimal potential for catastrophic risk.

In conclusion, AI benchmarks remain an intricate and variable landscape, sensitive to prompts and to how the underlying models are conditioned. Users should evaluate and test different models against their own applications, since each AI assistant has its own nuanced strengths and weaknesses.

Last modified: March 5, 2024