
### Can Stability AI’s Stable Audio 2 Outperform Suno 3’s ‘Mindblowing’ Music Generator?

Stable Audio offers longer high-quality AI generated tracks with solid licensing, but its compositi…

Stability AI, a prominent artificial intelligence developer dedicated to the open-source philosophy, unveiled Stable Audio 2, a new audio and music generator, this week. It is the first significant update since the initial release of Stable Audio in September, and it intensifies the competition with tools such as Suno, Google’s MusicFX, and Meta’s AudioCraft.

“Stable Audio 2.0 allows the creation of high-quality, complete tracks with a coherent musical structure lasting up to three minutes at 44.1 kHz stereo from a single natural language prompt,” Stability AI announced.
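For a sense of scale, the figures in that statement can be turned into raw sample counts. The following is a back-of-the-envelope sketch in Python; the 16-bit sample depth used for the size estimate is an assumption made for illustration and is not something Stability AI specified.

```python
# Rough size of a three-minute, 44.1 kHz stereo track.
SAMPLE_RATE_HZ = 44_100   # from the announcement
DURATION_S = 3 * 60       # "up to three minutes"
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM, not stated by Stability AI

frames = SAMPLE_RATE_HZ * DURATION_S        # samples per channel
total_samples = frames * CHANNELS           # samples across both channels
uncompressed_mb = total_samples * BYTES_PER_SAMPLE / 1_000_000

print(f"{frames:,} frames per channel")
print(f"{total_samples:,} samples in total")
print(f"about {uncompressed_mb:.1f} MB as uncompressed 16-bit PCM")
```

In other words, a single prompt has to yield roughly 7.9 million coherent samples per channel, which is why handling long sequences matters for this kind of model.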

The launch comes at a challenging period for Stability, which reportedly faced financial strain before CEO Emad Mostaque stepped down two weeks ago.

Nevertheless, the company continues to advance its open-source AI work. Alongside Stable Audio, Stability AI introduced a new coding LLM named Stable Code Instruct 3B on March 25, and it unveiled an advanced open-source text-to-video generator named Stable Video Diffusion last year.

Furthermore, Stability AI is gearing up to introduce its most sophisticated image generator, Stable Diffusion 3, later this year.

Among open-source enthusiasts, Stability AI stands out alongside prominent names like Mistral and Nous. Major tech firms are also moving into the open-source arena, with Meta and Microsoft among the notable contributors.


#### Introduction of Stable Audio 2.0

Stable Audio 2 is built on a diffusion transformer (DiT), the same kind of architecture used in Stability AI’s upcoming Stable Diffusion 3 image generator and a departure from the U-Net architecture used previously.

DiT and U-Net are both common architectures in machine learning, and in a diffusion model either one can serve as the network that incrementally refines random noise into structured data. The transformer-based DiT is more adept at handling lengthy data sequences, while U-Net is accurate for shorter generations but struggles with longer, more intricate sequences.

One of the key enhancements in Stable Audio 2 is audio-to-audio generation, a new functionality enabling users to alter sound samples they upload—similar to Stable Diffusion’s img2img for image manipulation.

Users can now upload audio samples and, through natural language prompts, transform them into various sounds, which expands the options for sound effect generation and style transfer. The update gives artists and musicians more flexibility and control over the creative process.

Unlike its predecessor, Stable Audio 2 can start from the uploaded audio file and reshape it to align with the user’s prompt, rather than refining random noise. The result adheres to the prompt while still resembling the reference audio.
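The idea of starting from a reference instead of from noise can be illustrated with a toy sketch in Python, loosely modeled on how img2img-style pipelines are commonly described: the reference is partially noised, then denoised step by step. Everything here is hypothetical. `toy_denoiser`, `audio_to_audio`, and the `strength` parameter are illustrative stand-ins, the linear blend is a simplification, and the "denoiser" simply nudges the signal toward a fixed target rather than running a prompt-conditioned model. It is not Stable Audio’s implementation.

```python
import math
import random

def toy_denoiser(x, target, step_size=0.2):
    """Stand-in for a prompt-conditioned model: nudge x toward `target`."""
    return [xi + step_size * (ti - xi) for xi, ti in zip(x, target)]

def audio_to_audio(reference, target, strength=0.5, steps=10, seed=0):
    """Start from a noised copy of `reference` instead of pure noise.

    strength=0.0 returns the reference unchanged; strength=1.0 starts from
    pure noise and runs every step (hypothetical parameter).
    """
    rng = random.Random(seed)
    # Blend reference and noise: more strength, less of the reference survives.
    x = [(1.0 - strength) * r + strength * rng.gauss(0.0, 1.0) for r in reference]
    # Run only a fraction of the steps, proportional to strength.
    for _ in range(round(steps * strength)):
        x = toy_denoiser(x, target)
    return x

def rms_distance(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

SAMPLE_RATE_HZ = 44_100
n = SAMPLE_RATE_HZ // 10  # a tenth of a second keeps the toy example fast
# "Uploaded" sample: a 220 Hz tone. What the "prompt" asks for: a 440 Hz tone.
reference = [math.sin(2 * math.pi * 220 * i / SAMPLE_RATE_HZ) for i in range(n)]
target = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE_HZ) for i in range(n)]

low = audio_to_audio(reference, target, strength=0.2)
high = audio_to_audio(reference, target, strength=0.9)

print("low strength, distance to reference:", rms_distance(low, reference))
print("high strength, distance to reference:", rms_distance(high, reference))
```

In this toy setup, a low `strength` keeps the output close to the uploaded sample, while a high `strength` lets the target dominate. That is the same trade-off between resembling the reference and following the prompt that the article describes.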

The model was trained exclusively on a licensed dataset from the AudioSparx music library. All AudioSparx artists were given the option to opt out of Stable Audio model training, an approach meant to respect their rights and ensure fair compensation.

Decrypt evaluated the model and noted substantial improvements compared to Stable Audio 1.0. The generated music tracks were more coherent and could run up to twice the 90-second limit of the previous version.

Stable Audio 2’s prompting style mirrors that of Stable Diffusion 1.5, emphasizing tags or keywords over natural language prompts.

The model is best suited for providing inspiration or background music rather than replacing trained musicians for major songs. While it occasionally produced pleasant riffs, it also exhibited multiple hallucinations and discordant sounds diverging from the prompt.


#### Stable Audio 2 versus Suno 3

While Stable Audio 2 showcases notable advancements over its predecessor, it falls short in comparison to the sounds and songs generated by Suno 3, a recent update to the leading audio generator. Many AI enthusiasts hail Suno 3 as the top model in the AI music domain, with accolades like “mindblowing” and a “game changer.”

A comparison between Stable Audio 2 and Suno 3 using the same prompts revealed Suno 3’s superiority in audio quality. Suno 3’s output typically features proper song structures with natural riffs, choruses, bridges, and variations, offering a more complete musical experience than Stable Audio 2.

Moreover, Suno 3 transitions smoothly between different parts of a song, whereas Stable Audio 2’s generations often exhibit abrupt transitions.

Suno 3 also outpaces Stable Audio 2 in terms of audio generation speed, a crucial factor for users requiring quick and efficient audio production.

However, Stable Audio 2 boasts a unique capability that Suno 3 lacks: audio-to-audio generation. This feature allows users to transform uploaded sound samples, providing a level of control not yet available in Suno.

Both Stable Audio and Suno are powerful options, particularly for people who are passionate about music creation but lack musical expertise. However, Stable Audio may need further advancements in upcoming versions to match Suno’s audio generation quality.


Edited by Ryan Ozawa.

Last modified: April 4, 2024