
### Evaluating Google’s Gemini Chatbot Performance: A Comprehensive Analysis

We ran a wide-ranging list of questions through Gemini Ultra, Google’s flagship GenAI model, …

Gemini, Google's answer to OpenAI's ChatGPT and Microsoft's Copilot, has made its debut. While it proves a capable option for research and boosting productivity, it stumbles in areas both obvious and subtle.

Recently, Google rebranded its Bard chatbot as Gemini and introduced it to smartphones through a revamped app experience. Following this release, numerous individuals have had the opportunity to explore Gemini, and the feedback has been somewhat mixed.

TechCrunch set out to evaluate Gemini's performance using a series of tests designed to compare various GenAI models, including OpenAI's GPT-4 and Anthropic's Claude.

### Overview of Gemini

The Gemini experience varies depending on the user’s subscription level.

Non-paying users interact with Gemini Pro, a lighter version of the more advanced Gemini Ultra; Ultra is accessible through a subscription to the Google One AI Premium Plan, priced at $20 per month. Gemini Ultra boasts enhanced reasoning, coding, and instruction-following capabilities compared to Gemini Pro, and future enhancements are expected to include improved multimodal and data analysis features.

The AI Premium Plan also integrates Gemini with the user’s broader Google Workspace account, enabling functionalities like email summarization and note-taking during video calls.

Given that Gemini Pro has been available since early December, the focus of our tests was on Gemini Ultra.

### Evaluation of Gemini

Our assessment involved posing over two dozen questions to Gemini Ultra, covering diverse topics from trivia and medical advice to content generation and summarization, aligning with the everyday queries that users might pose to a GenAI chatbot.

Google explicitly states in its terms of service that Gemini is not to be relied upon for medical consultations, acknowledging potential inaccuracies in its responses. Nevertheless, we proceeded to inquire about medical issues, considering it a crucial aspect of evaluating the model’s credibility.

We conducted inquiries on evolving news stories, historical events, trivia, medical and therapeutic advice, race relations, geopolitical matters, jokes, product descriptions, and Workspace integration, aiming to gauge Ultra’s performance across various domains.

### Conclusion

In conclusion, Gemini Ultra demonstrates proficiency in providing detailed and factual responses, with a few exceptions, most notably its refusal to address controversial topics such as the 2020 U.S. presidential election and the Israel-Gaza conflict. While it excels at maintaining factual accuracy and avoiding potentially harmful advice, its responses tend to be overly verbose in some instances.

Despite its current capabilities, Gemini Ultra falls short of being groundbreaking. The model’s full potential, particularly its multimodal features, is yet to be fully realized. Considering the subscription cost of $20 per month, users may find it challenging to justify the expense compared to alternative options like OpenAI’s ChatGPT.

It is anticipated that with ongoing enhancements and Google’s continued research efforts, Gemini Ultra will evolve to meet user expectations more effectively in the future.
