Stability AI’s Latest Update Allows Users to Generate Three-Minute Songs
By Emilia David, an AI reporter previously focused on technology, finance, and the economy.
Stability AI has unveiled Stable Audio 2.0, an upgraded audio generator that lets users create three-minute musical compositions. This iteration also introduces the ability to upload audio samples, which can then be manipulated with prompts to create AI-generated songs. Despite these advancements, however, the resulting compositions may not yet be Grammy-worthy.
The initial release of Stable Audio in September 2023 limited some paying users to 90-second clips, restricting their creative experimentation to short sound bites. With Stable Audio 2.0, users now have the freedom to craft full-length three-minute tracks—aligning more closely with the typical duration of radio-friendly songs. Notably, all uploaded audio must be free of copyright restrictions.
In contrast to OpenAI’s exclusive Voice Engine, available only to a select user base, Stability AI has made Stable Audio freely accessible to the public via its website and will soon offer it through an API.
A significant enhancement in Stable Audio 2.0 is its capacity to generate cohesive musical pieces with distinct components such as introductions, progressions, and conclusions, as highlighted by Stability AI.
In a demonstration of Stable Audio, I tried the platform with the prompt “folk pop song with American vibes” (intended to convey Americana). The resulting composition partially resonated with my “Mountain Vibes Listening Wednesday Morning” Spotify playlist, albeit with unexpected vocal elements that sounded like whale calls. While amusing, it left me slightly worried I had inadvertently summoned something otherworldly into my home.
Users now have the flexibility to tailor their projects to personal preferences through new customization features in Stable Audio 2.0. These include adjusting prompt strength, determining the extent of audio modification, and incorporating sound effects like crowd roars or keyboard taps.
Despite these advancements, AI-generated songs still exhibit a certain lack of depth and peculiarity, echoing what my colleague Wes Davis observed after listening to Suno’s compositions. Other tech giants like Meta and Google are also exploring AI audio generation but have held back from releasing their models publicly while they address concerns about the artificial quality of the output.
Stability AI disclosed that Stable Audio is trained on data sourced from AudioSparx, a library of more than 800,000 audio files. The company says artists associated with AudioSparx had the option to exclude their content from the training dataset. For this update, Stability AI partnered with Audible Magic to use its content recognition technology to detect and prevent copyrighted material from being uploaded to the platform.
While Stable Audio 2.0 represents a significant leap in mimicking traditional song structures, there is still room for improvement. Perhaps future iterations will produce vocals with more coherent lyrics.