Artificial intelligence (AI) video generators and the avatars they create are evolving quickly, and UK-based AI video company Synthesia hopes to take the emerging technology to the next stage.
On Wednesday, the company announced its Expressive Avatars, which can depict a range of lifelike human emotions. The latest edition of what the company calls its “digital actors”, the Expressive Avatars feature enhanced facial expressions, more accurate lip sync, and realistically human-like voices — an upgrade from the robotic tone of most text-to-audio AI.
“This technology brings a level of sophistication and realism to digital avatars that blurs the line between the virtual and the real,” the company said in the announcement.
Synthesia’s text-to-video platform comes with more than 160 stock AI avatars for users to choose from, which the company created from human actors with their consent and compensation. Teams can collaborate on videos from end to end and create videos in more than 130 languages.
The company aims to replace the entire video production process with its software, but it is not coming for Hollywood, CEO Victor Riparbelli said during a demonstration of the release. Instead, the company focuses on enterprise and B2B content, where it sees a need for easy-to-create, engaging, and human-like video.
Synthesia’s Expressive Avatars are powered by its Express-1 AI model. While the company uses open-source LLMs for the text elements of the product, Express-1 was trained entirely on content Synthesia produced in-house — nothing synthetic or scraped from the web.
In the demo, Riparbelli explained that the company hired thousands of actors to record videos for the Express-1 model in its London and New York studios, in part to avoid importing biases embedded in existing datasets.
“With this particular technology, it’s not a viable strategy to go for synthetic content, because you essentially end up being able to replicate synthetic content, which is exactly what we’re trying not to do with this,” Riparbelli said. “You’re trying to replicate how humans actually speak.”
Riparbelli added that this relatively small dataset was enough for the Express-1 model because it is much more “narrow and specific” than models like OpenAI’s Sora or those from Runway.
The demo shows an avatar delivering three prompts: “I am happy”, “I am upset”, and “I am frustrated”. The avatar speaks with a more realistic and natural rhythm than previous generations of Synthesia’s tech.
“Expressive Avatars don’t just mimic human speech; they understand its context,” the announcement states. “Whether the conversation is cheerful or somber, our avatars adjust their performance accordingly, displaying a level of empathy and understanding that was once the sole domain of human actors.”
While not indistinguishable from real people, the lifelike nature of these avatars can be alarming — especially given how deepfake technology is abused.
“We are aware that Expressive Avatars are a powerful new technology, released during an important year for democracy, when billions of people around the world exercise their right to vote,” the company says in the announcement.
“We’ve taken additional steps to prevent the misuse of our platform, including updating our policies to restrict the type of content people can make, investing in the early detection of bad faith actors, increasing the teams that work on AI safety, and experimenting with content credentials technologies such as C2PA.”
The company also had protections in place before Wednesday’s release. Users can create custom avatars but must have the person’s explicit consent and go through a “thorough KYC-like procedure”, according to Synthesia’s website. Plus, people can opt out of the process at any time (as can the stock actors), and Synthesia will erase their data and likeness. The company doesn’t allow users to make avatars of celebrities or politicians under any circumstances.
In addition, Riparbelli explains in a video that Synthesia’s tools can only be used to create news content by vetted news organizations on enterprise plans. However, it’s unclear what vetting criteria Synthesia is using, and whether the company fact-checks content created on its platform.
Synthesia is also part of the Content Authenticity Initiative, a coalition of companies and organizations working on tools for content provenance or for identifying the origins of a piece of media.
Synthesia believes the Expressive Avatars will help enterprises go beyond their basic content needs to create videos with a more empathetic touch: those about sensitive topics like healthcare, or customer support material that emulates the friendliness and patience of a real person.
“This is only the first release, the first product, you can say, that we’ve built on top of these models,” Riparbelli said during the demo. “I think we’re looking at a magnitude shift in capabilities within the next six to nine months.”