
### Revealing the Engine Behind Biden AI Robocall: Pindrop’s Analysis


A robocall impersonating President Joe Biden circulated during the 2024 US election cycle, in what multiple news outlets have called a groundbreaking use of AI-generated deepfake audio to target voters ahead of the New Hampshire primary. While NBC News highlighted the difficulty of pinpointing the specific text-to-speech (TTS) system used, our investigation determined that ElevenLabs was the engine employed. By detecting unique algorithmic artifacts within the audio, we can also offer insight into how deepfake detection systems work. Below, we explain how Pindrop’s real-time detection uses a continuous scoring method to assess authenticity.

The 39-second audio clip was analyzed on Pindrop’s deepfake detection platform through a four-stage process: audio filtering, feature extraction, segmentation into 155 segments of 250 milliseconds each, and continuous scoring.
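To make the segmentation step concrete, here is a minimal sketch of fixed-window segmentation in Python. The non-overlapping 250 ms windows are an assumption; Pindrop has not published its exact windowing scheme.

```python
import numpy as np

def segment_audio(samples: np.ndarray, sample_rate: int,
                  window_ms: int = 250) -> list[np.ndarray]:
    """Split a mono signal into fixed-length, non-overlapping windows."""
    window_len = int(sample_rate * window_ms / 1000)
    return [samples[i:i + window_len]
            for i in range(0, len(samples) - window_len + 1, window_len)]

# A ~39-second clip split into 250 ms windows yields the ~155
# segments described above (155 x 0.25 s = 38.75 s).
```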

After removing non-speech frames such as silence, noise, and music, the audio was downsampled to an 8 kHz sampling rate to reduce wideband artifacts and simulate a standard phone channel; replicating the end listener’s conditions is essential for accurate analysis. Our software then extracts spectro-temporal features and passes them through a proprietary deep neural network to produce a “fakeprint,” a compact representation that preserves the distinctive traits of machine-generated speech. These fakeprints are central to our liveness program, enabling us to identify the text-to-speech engine that created a manipulated audio clip.
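The fakeprint network itself is proprietary, but the preprocessing described above can be sketched with standard tools. In the sketch below, librosa handles the 8 kHz downsampling, a simple energy gate stands in for the non-speech removal, and a log-mel spectrogram stands in for the spectro-temporal front end; the threshold and feature settings are illustrative assumptions.

```python
import numpy as np
import librosa

def preprocess(path: str, target_sr: int = 8000) -> np.ndarray:
    """Downsample to 8 kHz (narrowband phone conditions) and drop
    low-energy frames as a crude stand-in for non-speech removal."""
    y, sr = librosa.load(path, sr=None, mono=True)
    y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)

    frame_len = target_sr // 100                      # 10 ms frames
    frames = librosa.util.frame(y, frame_length=frame_len,
                                hop_length=frame_len)
    energy = (frames ** 2).mean(axis=0)
    keep = energy > 0.05 * energy.mean()              # illustrative threshold
    return frames[:, keep].T.reshape(-1)

def spectro_temporal_features(speech: np.ndarray, sr: int = 8000) -> np.ndarray:
    """Log-mel spectrogram as a stand-in for the proprietary features
    that feed the fakeprint network."""
    mel = librosa.feature.melspectrogram(y=speech, sr=sr, n_mels=40)
    return librosa.power_to_db(mel)
```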

Leveraging proprietary models trained and tested on extensive datasets covering 122 text-to-speech engines and other synthetic speech generation methods, our deepfake detection engine assigns a score to each of the 155 segments.
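In pseudocode terms, the continuous scoring loop might look like the following; `fakeprint_model` and `liveness_head` are hypothetical stand-ins for Pindrop’s proprietary network and classifier.

```python
def score_segments(segments, fakeprint_model, liveness_head):
    """Score each 250 ms segment independently.

    fakeprint_model: maps a segment's features to a fakeprint embedding.
    liveness_head:   maps a fakeprint to a liveness score in [0, 1].
    Both are hypothetical placeholders for proprietary components.
    """
    scores = []
    for seg in segments:
        fakeprint = fakeprint_model(seg)
        scores.append(liveness_head(fakeprint))  # 0.0 = synthetic, 1.0 = live
    return scores
```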

Key insights obtained from our examination of this fabricated audio clip include:

Liveness Assessment

Using our deepfake detection engine, we assigned each segment a “liveness” score ranging from 0 (synthetic) to 1.0 (authentic). The scores for the Biden robocall consistently indicated an artificial voice: they dropped below the 0.3 threshold after the initial two seconds and stayed low for the remainder of the call, identifying it as synthetic.
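A simplified decision rule consistent with the behavior described above (scores settling below 0.3 after the first two seconds, i.e. the first eight 250 ms segments) might look like this; the production system scores continuously rather than issuing a single verdict per call.

```python
LIVENESS_THRESHOLD = 0.3   # threshold cited in the analysis above
WARMUP_SEGMENTS = 8        # first ~2 s at 250 ms per segment

def classify_call(scores: list[float]) -> str:
    """Flag the call as synthetic if liveness stays below the threshold
    once past the warm-up window. An illustrative simplification."""
    steady_state = scores[WARMUP_SEGMENTS:]
    if steady_state and max(steady_state) < LIVENESS_THRESHOLD:
        return "synthetic"
    return "inconclusive"
```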

Real-Time Evaluation of President Biden’s Phone Audio

Identifying the Synthesis System

Transparency is crucial in deepfake detection systems. Comparing the fakeprint of the Biden audio against 122 TTS systems commonly used for deepfakes, Pindrop’s detection engine identified ElevenLabs, or a system using similar components, as the likely source with 99% probability. Strict adherence to research protocols ensured that this result was not an artifact of overfitting or bias. After reaching this conclusion, we cross-checked it with ElevenLabs’ own AI Speech Classifier, which reported an 84% likelihood that the audio was generated with ElevenLabs. While ElevenLabs was used in this instance, future attacks may employ different generative AI systems, underlining the need for robust safeguards against malicious use. It is encouraging that platforms like ElevenLabs are integrating deepfake classifiers into their products, though obtaining explicit consent for voice cloning remains imperative.
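Attribution of this kind can be sketched as a nearest-centroid search over fakeprints: compare the call’s fakeprint against a reference embedding for each known engine and report the closest match. The cosine-similarity matcher below is an illustrative simplification, not Pindrop’s actual method.

```python
import numpy as np

def attribute_source(fakeprint: np.ndarray,
                     engine_centroids: dict[str, np.ndarray]) -> tuple[str, float]:
    """Return the best-matching engine and its similarity score.

    engine_centroids would map each of the 122 reference engines to a
    mean fakeprint computed from known samples of that engine.
    """
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    engine, centroid = max(engine_centroids.items(),
                           key=lambda kv: cosine(fakeprint, kv[1]))
    return engine, cosine(fakeprint, centroid)
```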

Our investigation first confirmed that the robocall was produced by a text-to-speech engine rather than a human impersonator, and then identified the specific synthesis system employed. The analysis also pinpointed which words were most revealing of the President’s cloned voice.

Identification of Fabricated Elements

As the call progressed, we tracked the concentration of synthetic artifacts in each segment of the audio. Certain segments displayed a higher concentration of fake components, particularly in phrases like “Your vote makes a difference in November” and “New Hampshire presidential preference primary.” Consonant-rich words such as “preference” and “difference” often act as distinctive markers of deepfakes. Phrases uncharacteristic of President Biden’s speech also stood out: “If you would like to be removed from future calls, please press two now” exhibited strong synthetic artifacts, in contrast to characteristically Biden expressions such as “What a bunch of malarkey.” Our voice similarity check confirmed that the same cloned voice was used throughout the call, with no human voice present.
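One way to surface the most artifact-heavy phrases, as described above, is to rank segments by liveness and pair the lowest-scoring ones with their transcribed text; `transcripts` here is a hypothetical per-segment transcript list, not part of the published analysis.

```python
def rank_artifact_segments(scores: list[float],
                           transcripts: list[str],
                           top_k: int = 5) -> list[tuple[float, str]]:
    """Return the top_k segments with the lowest liveness scores,
    i.e. the densest synthetic artifacts, paired with their text."""
    ranked = sorted(zip(scores, transcripts), key=lambda pair: pair[0])
    return ranked[:top_k]
```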

Upholding Trust in Advertising and Public Information

In conclusion, the 2024 Biden deepfake robocall underscores the critical importance of distinguishing human from AI-generated voices. Pindrop’s detection methodologies successfully identified this deepfake and attributed it to its source engine, demonstrating that such analysis can operate at scale. Organizations combatting audio deepfake misinformation should prioritize factors like continuous content evaluation, robustness to diverse acoustic conditions, scientific transparency, linguistic coverage, and real-time efficacy when selecting detection solutions.
