Stability AI
Stability AI has announced Stable Diffusion 3, the newest open-weights image-synthesis model in its lineup. The model reportedly improves on its predecessors at generating detailed, multi-subject images and at rendering legible text inside images. The announcement did not include a public demo, but Stability AI opened a waitlist today for people who want to try it.
The Stable Diffusion 3 family spans models from 800 million to 8 billion parameters, a range meant to cover devices from smartphones to servers. Parameter count roughly determines how much detail a model can produce, and larger models also need more VRAM on GPU accelerators to run.
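To make the link between parameter count and VRAM concrete, here is a minimal back-of-the-envelope sketch. It only multiplies parameter count by bytes per parameter, so it covers the weights alone; the precisions listed and the function name `weight_memory_gib` are illustrative assumptions, not figures published by Stability AI, and real usage will be higher once activations and auxiliary components are counted.

```python
# Weights-only memory estimate: parameters x bytes per parameter.
# Assumption: common numeric precisions; Stability AI has not published
# memory requirements for these models in the announcement.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "fp16") -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# The two ends of the announced range: 800 million and 8 billion parameters.
for label, params in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
    print(f"{label}: {weight_memory_gib(params):.1f} GiB at fp16")
```

Under these assumptions, the smallest model's weights fit in about 1.5 GiB at 16-bit precision, while the largest needs roughly 14.9 GiB before any working memory is added.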
Since 2022, Stability AI has released a series of AI image-generation models: Stable Diffusion 1.4, 1.5, 2.0, 2.1, XL, XL Turbo, and now 3. The company is known for offering a more open alternative to proprietary image-synthesis models such as OpenAI’s DALL-E 3. Its models have also drawn controversy over the use of copyrighted training data, bias, and the potential for misuse, and some of those concerns have led to ongoing lawsuits. Stable Diffusion models are open-weights and source-available, which means they can be run locally and fine-tuned to change their outputs.
Stable Diffusion 3 Image Generation Examples:
- Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made out of colorful energy. – Stability AI
- An AI-generated image of a grandma wearing a “Go big or go home” sweatshirt, generated by Stable Diffusion 3. – Emad Mostaque (Stability AI)
- Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3. – André Kerygma (Stability AI)
- A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background. – André Kerygma (Stability AI)
- Moody still life of assorted pumpkins. – André Kerygma (Stability AI)
- A painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion.” – Stability AI
- Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic. – André Kerygma (Stability AI)
- Photo of a 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall. – André Kerygma (Stability AI)
Emad Mostaque, CEO of Stability AI, described the technical changes in Stable Diffusion 3: it uses a new type of diffusion transformer similar to the one in Sora, combined with flow matching and other improvements. According to Mostaque, the design takes advantage of transformer improvements, so it can scale further and accept multimodal inputs.
The Stable Diffusion 3 family uses a diffusion transformer architecture. Instead of conventional image-building blocks such as U-Net, a diffusion transformer operates on small pieces of the image. The approach borrows from transformers, which are good at handling patterns and sequences, and it is reported to scale efficiently and to produce higher-quality images.
Stable Diffusion 3 also uses “flow matching,” a technique for training a model to move smoothly from random noise to a structured image. Rather than simulating every step of that process, it focuses on the overall direction, or flow, that image creation should follow.
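As a rough illustration of what working on pieces of an image can mean, the sketch below cuts a tiny grid into non-overlapping square patches and flattens each one into a token, which is the usual way transformer models turn an image into a sequence. The function name `patchify`, the 4x4 grid, and the patch size are hypothetical; the source does not describe how Stable Diffusion 3 tokenizes images.

```python
def patchify(image, patch):
    """Split a 2D grid (a list of rows) into non-overlapping patch x patch
    tiles, each flattened in row-major order into one token."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ])
    return tokens


# A toy 4x4 "image" whose pixel values are 0..15 (illustrative only).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, 2)
print(len(tokens))  # 4 tokens, one per 2x2 patch
print(tokens[0])    # [0, 1, 4, 5], the top-left patch
```

A transformer would then process this sequence of tokens with attention, rather than sliding convolutional filters over the full grid as a U-Net does.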
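The idea of learning an overall trajectory can be sketched with the simplest form of flow matching, in which noise and data are joined by a straight line and the training target is the constant velocity along that line. This is an assumption chosen for illustration: the source does not say which flow-matching variant Stable Diffusion 3 uses, and all names below (`interpolate`, `target_velocity`, `euler_sample`, `oracle_velocity`) are hypothetical. A real model would train a neural network to predict the velocity; here an oracle that already knows the answer stands in for it.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Velocity of the straight path; this is what a network would be
    trained to predict from interpolate(noise, data, t) and t."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(noise, velocity_fn, steps):
    """Generate a sample by following the velocity field from t=0 to t=1."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [0.2, -1.0, 0.7]                     # stand-in for an image
noise = [random.gauss(0, 1) for _ in data]  # random starting point


def oracle_velocity(x, t):
    # Stand-in for a trained network: returns the exact target velocity.
    return target_velocity(noise, data)


sample = euler_sample(noise, oracle_velocity, steps=4)
```

Because the path is a straight line, even a handful of Euler steps lands on the data point in this toy setup, which is the intuition behind not needing to simulate every intermediate step.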
Stable Diffusion 3 is not yet publicly accessible, but samples posted on Stability’s website and social media accounts appear roughly comparable to the output of other leading image-synthesis models, including DALL-E 3, Adobe Firefly, Imagine with Meta AI, Midjourney, and Google Imagen. The model appears to handle text rendering well, a weakness of earlier Stable Diffusion versions, which would be a notable step for a freely available model. Prompt fidelity also appears close to that of DALL-E 3, although this has not been independently verified.
Once testing is complete, Stability AI says it will release the model weights for free download and local execution. The company describes the preview phase as a way to improve performance and safety before the open release.
Stability AI has been experimenting with several image-synthesis architectures recently. Besides SDXL and SDXL Turbo, the company announced Stable Cascade, which uses a three-stage process for text-to-image synthesis.
Image credits: Emad Mostaque (Stability AI)