Google has updated its artificial intelligence chatbot since our last comparison with ChatGPT, giving it a new name and a fresh look. OpenAI’s virtual assistant has likewise picked up several enhancements in the meantime, prompting a revisit to compare the two.
Chatbots play a pivotal role in the realm of generative AI, serving as search engines, fountains of knowledge, creative aids, and even artists in residence. Both ChatGPT and Google Gemini can generate images and integrate with other services through plugins.
For this evaluation, I pit the free version of ChatGPT against the free version of Google Gemini: specifically, GPT-3.5 against Gemini Pro 1.0.
The assessment focuses on functionality the two chatbots share. Image generation is excluded because it is unavailable in the free version of ChatGPT, and image analysis is excluded for the same reason.
Google Gemini, for its part, lacks custom chatbots and offers plugins only for other Google products, so those features are also out of scope. The evaluation concentrates on how well these AI chatbots handle various queries, coding tasks, and creative responses.
Coding
1. Coding Proficiency
The initial test asks both chatbots to create a Python script for a personal expense tracker. The aim is to assess whether the generated code is functional, handles user interaction well, reads clearly, and adheres to coding standards.
Both ChatGPT and Gemini successfully developed a functional expense tracker in Python. Gemini exhibited additional features such as labels within categories and more detailed reporting options.
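For reference, here is a minimal sketch of the kind of script both chatbots were asked to produce. The exact prompt and the generated code are not reproduced in this article, so the feature set shown here (adding expenses, totalling by category) is an assumption based on the description above:

```python
# A minimal expense tracker of the sort both chatbots generated.
# The specific features (add, per-category report) are assumed, not
# taken from either chatbot's actual output.
from collections import defaultdict

expenses = []  # each entry: {"amount": float, "category": str, "note": str}

def add_expense(amount: float, category: str, note: str = "") -> None:
    """Record a single expense."""
    expenses.append({"amount": amount, "category": category, "note": note})

def report_by_category() -> dict:
    """Total spending per category."""
    totals = defaultdict(float)
    for expense in expenses:
        totals[expense["category"]] += expense["amount"]
    return dict(totals)

if __name__ == "__main__":
    add_expense(12.50, "food", "lunch")
    add_expense(40.00, "transport", "fuel")
    add_expense(7.25, "food", "coffee")
    for category, total in report_by_category().items():
        print(f"{category}: ${total:.2f}")
```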
Winner: Gemini
Natural Language
2. Natural Language Understanding (NLU)
The subsequent evaluation focuses on how well ChatGPT and Gemini comprehend natural language prompts, particularly in scenarios involving ambiguity. The Cognitive Reflection Test (CRT) question about the price of a bat and a ball was used to gauge their ability to understand a deceptively simple problem and explain it clearly.
Both chatbots provided the correct answer, but ChatGPT explained its reasoning more clearly.
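For readers unfamiliar with the puzzle (the classic CRT phrasing is assumed here, as the article does not quote the exact prompt): a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball. The intuitive answer of $0.10 for the ball is wrong, as a quick check shows:

```python
# Classic CRT bat-and-ball arithmetic (phrasing assumed, not quoted from
# the actual prompt): solve ball + (ball + 1.00) = 1.10 for the ball.
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
assert abs((bat + ball) - 1.10) < 1e-9  # prices sum to $1.10
assert abs((bat - ball) - 1.00) < 1e-9  # bat costs $1.00 more than the ball
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")  # ball = $0.05, bat = $1.05
```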
Winner: ChatGPT
Creative Text
3. Creative Text Generation & Adaptability
In this test, the chatbots were tasked with generating a creative short story set in a futuristic city where technology governs daily life, but the protagonist uncovers a society living without modern tech. The assessment criteria included originality, thematic consistency, adaptability, and adherence to the given theme.
Gemini emerged as the winner in this category, showcasing better adherence to the rubric and delivering a more compelling story.
Winner: Gemini
Problem Solving
4. Reasoning & Problem-Solving
The reasoning and problem-solving capabilities of the AI models were put to the test with a classic scenario: two doors, two guards (one who always tells the truth, one who always lies), and the task of working out which door is safe. Both chatbots provided the correct solution, but ChatGPT’s response offered slightly more detail and clarity.
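The standard solution, which both chatbots arrived at, is to ask either guard which door the other guard would call safe, then pick the opposite door. A small simulation (door labels and question wording assumed for illustration) confirms that this strategy works no matter which guard you happen to ask:

```python
# Simulation of the two-doors riddle. The strategy: ask either guard
# "Which door would the other guard say is safe?" and pick the opposite.
# Door labels ("left"/"right") are assumed for illustration.

SAFE, UNSAFE = "left", "right"

def truthful(statement: str) -> str:
    """The honest guard repeats what is true."""
    return statement

def liar(statement: str) -> str:
    """The lying guard always inverts the true statement."""
    return UNSAFE if statement == SAFE else SAFE

def ask(guard, other_guard) -> str:
    """Ask `guard` which door the other guard would say is safe."""
    other_answer = other_guard(SAFE)  # what the other guard would claim
    return guard(other_answer)        # guard reports (or inverts) that claim

for guard, other_guard in [(truthful, liar), (liar, truthful)]:
    pointed_at = ask(guard, other_guard)
    chosen = SAFE if pointed_at == UNSAFE else UNSAFE
    assert chosen == SAFE  # the strategy always identifies the safe door
print("Picking the opposite of the indicated door is always safe.")
```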
Winner: ChatGPT
Explain Like I’m Five
5. Explain Like I’m Five (ELI5)
The ELI5 test required the chatbots to explain how airplanes stay aloft to a five-year-old. Both chatbots presented reasonable and accurate responses, with Gemini structuring its explanation as bullet points and offering a practical experiment the child could try.
Winner: Gemini
Ethical Reasoning
6. Ethical Reasoning & Decision-Making
The chatbots were tasked with deliberating on a scenario where an autonomous vehicle faces a moral dilemma between hitting a pedestrian or endangering its passengers. While neither chatbot offered a definitive opinion, Gemini’s response exhibited more nuanced considerations and a thorough analysis of ethical frameworks.
Winner: Gemini
Translation
7. Cross-Lingual Translation & Cultural Awareness
The translation test required the chatbots to translate a paragraph from English to French, emphasizing cultural nuances related to celebrating Thanksgiving in the United States. Gemini excelled in providing a nuanced translation and explaining its approach to the task.
Winner: Gemini
Knowledge
8. Knowledge Retrieval, Application, & Learning
The evaluation focused on the chatbots’ ability to explain the significance of the Rosetta Stone in deciphering ancient Egyptian hieroglyphs. Both chatbots conveyed the required information effectively, and neither stood out on the learning-and-application side of the rubric.
Winner: Draw
Conversation
9. Conversational Fluency, Error Handling, & Recovery
In a conversation about favorite foods in which the AI misinterpreted a sarcastic comment about pizza, both chatbots handled the misunderstanding adeptly and recovered well. However, ChatGPT’s ability to detect the sarcasm from the outset gave it the edge in this round.
Winner: ChatGPT
ChatGPT vs Gemini: Winner
Tallying the scores from the various tests, Gemini emerged as the overall winner of this ChatGPT vs Google Gemini comparison. Gemini took five of the nine tests, ChatGPT won three, and one ended in a draw.
Overall Winner: Gemini
This evaluation underscores Gemini’s strength in several key areas, positioning it as the preferred choice between the two free-tier chatbots. While both ChatGPT and Gemini produce responses of comparable quality, Gemini’s performance across these tests tips the scales in its favor, earning it the title of best free AI chatbot in this evaluation.