A new AI model developed by ByteDance can mimic another person’s voice in real time. Experts warn that the technology could be misused for fraud.
Nepal’s government has taken a negative stance toward TikTok, which is owned by ByteDance, the company that has developed an AI-powered voice conversion tool called StreamVoice.
StreamVoice lets users emulate any voice from a single vocal sample, at a time when “deepfakes” and AI-generated impersonations of well-known personalities are on the rise.
ByteDance, based in China, has introduced a generative AI method that quickly converts one person’s speech into another person’s voice.
StreamVoice is not yet available to the general public, but its development underscores how quickly AI capabilities are advancing: the technology can now imitate well-known public figures in both audio and video. As the 2024 election approaches, AI-generated impersonations of figures such as President Joe Biden and music icon Taylor Swift have surfaced.
StreamVoice was developed by experts at ByteDance, the parent company of the popular social media platform TikTok, together with researchers from Northwestern Polytechnical University, which is known for its collaboration with the Chinese military. Northwestern Polytechnical University, located in China, is distinct from Northwestern University in the US.
According to a recent research paper, the key feature of StreamVoice is its ability to perform real-time voice conversion from a single sample of the desired voice. The output is produced with only 124 milliseconds of latency, thanks to recent advances in language models.
The researchers said that StreamVoice can convert speech in a streaming fashion with a high degree of speaker similarity, both for voices it has encountered before and for voices it has not, while performing on par with traditional non-streaming voice conversion systems.
StreamVoice is built on the LLaMA architecture, drawn from Meta’s Llama large language model, and incorporates open-source code from Meta’s AudioDec, an audio codec program. The training data consisted primarily of Mandarin conversations, along with a multilingual dataset of English, Scandinavian, and German dialogues.
While acknowledging potential risks associated with StreamVoice, such as misinformation dissemination and phone fraud, the researchers did not prescribe specific guidelines on its usage, instead advising users to report any misuse to the relevant authorities.
The proliferation of deepfakes underscores growing concern among AI researchers about misuse of the technology. Incidents such as the phone call impersonating Joe Biden during the New Hampshire primary highlight the need for vigilance and regulation in this area.
Are you a tech professional or do you have valuable insights to share? Reach out to Kali Hays at [email protected] or via Signal at 949-280-0267 using a personal device for secure communication.