New York CNN—
OpenAI has unveiled an artificial intelligence tool capable of replicating human voices with remarkable precision. The AI speech engine has a range of potential applications, such as enhancing accessibility services, but it also raises concerns about misinformation and other forms of misuse.
The technology, called Voice Engine, can mimic a person’s speech patterns from a 15-second audio sample, generating a lifelike replica of their voice. Users can then enter a paragraph of text, which the tool reads aloud in the replicated voice.
While there are already several AI-driven voice services available to the public, OpenAI has distinguished itself by driving the widespread adoption of AI technologies, as seen with the popular chatbot ChatGPT.
According to OpenAI, the AI-powered text-to-voice tool could assist with translation, help with children’s reading, or support people who have lost the ability to speak. Critics, however, worry that the technology could facilitate fraud or other deceptive practices.
Currently, only a select group of trusted partners in fields such as education and health technology has access to Voice Engine. OpenAI plans to evaluate the results of these initial tests to decide whether broader use is feasible. The company has imposed restrictions on the partners, requiring explicit consent from the original speaker before a voice is replicated and mandating clear disclosure to listeners that the audio is AI-generated.
In a published statement, OpenAI acknowledged the significant risks associated with creating lifelike voice replicas, particularly in sensitive contexts like elections. While Voice Engine is not yet available for public use, the company recognizes the need for substantial safeguards as AI-generated audio becomes more prevalent. As one potential measure, OpenAI proposed phasing out voice-based authentication for financial accounts.
OpenAI emphasized the importance of incorporating voice authentication mechanisms to verify that individuals willingly contribute their voices to the service. Additionally, the company suggested maintaining a “no-go voice list” to prevent the creation of voice replicas resembling prominent figures.
Voice Engine can create multilingual voice replicas from a sample in a single language. OpenAI showcased examples in which a human voice reading a passage about friendship was replicated in Spanish, Mandarin, German, French, and Japanese while preserving the speaker’s original tone and accent.
Voice Engine arrives as anticipation builds for the public launch of Sora, the AI-powered video generation tool that OpenAI has previewed. Sora can generate 60-second videos featuring various characters, specific actions, and intricate background elements based on text input. OpenAI’s ChatGPT can also generate images from text descriptions, showcasing the versatility of AI-driven content creation tools.