Google Gemini AI Video: Google recently released a demo video of Gemini, its competitor to GPT-4, that went viral for a sequence in which the model responds to a drawing of a duck. Google acknowledged in the description of the video, titled “Hands-on with Gemini: Interacting with multimodal AI,” that it had been edited to speed up the presentation of outputs.
In the demo, no direct voice interaction took place between the human and the AI system. Rather than responding in real time to drawings or to changes in the objects on the table, Gemini was given still images from the footage along with text prompts. The result creates an impression of Gemini’s capabilities without a clear disclaimer about how the inputs were actually provided.
Oriol Vinyals, VP of Research & Deep Learning Lead at Google DeepMind, commented on the video, saying, “Really happy to see the interest around our ‘Hands-on with Gemini’ video.” He explained that Gemini was given sequences of different modalities, in this case images and text, and responded by predicting what might come next. Vinyals also said that developers would be able to experiment with similar techniques when access to Gemini Pro opens on December 13.
Addressing the authenticity of the demo, Vinyals said that all user prompts and outputs in the video were genuine but shortened for brevity. The video was meant to show what user experiences built with Gemini could look like and to inspire developers in their own projects.
Looking ahead, Vinyals highlighted the flexibility of prompting Gemini with specific instructions to configure the model’s behavior, enabling engaging interactions similar to a dialogue. He expressed excitement about the progress made with Gemini Pro and anticipated the innovative applications developers would create using this technology.
The original viral video depicted the progressive creation of a duck illustration, culminating in a surprise reaction to a toy blue duck. Subsequent interactions included responding to voice queries about the toy, tracking a ball in a cup-switching game, recognizing shadow puppet gestures, and rearranging sketches of planets.
Notably, the original demo did not take place in real time or through voice interaction. A Google spokesperson said the video was made using still image frames from the footage and text prompts, a different process from the seamless, real-time voice conversation the video implies.
Google’s Duplex Demo: History of Controversy
This is not the first time one of Google’s demo videos has drawn skepticism. Google previously faced scrutiny over the authenticity of its Duplex demo, in which an AI assistant made reservations at various establishments.
During the Duplex demo, Google showcased the AI booking restaurant reservations, hair appointments, and travel arrangements. However, doubts emerged regarding the legitimacy of these tasks, with journalists and experts questioning the authenticity of the calls made by Google Duplex.
Concerns were raised over background noises during the calls and other discrepancies, leading to speculation that the tasks performed by Google Duplex were staged rather than genuine interactions.