Stability AI has announced a preview of Stable Diffusion 3.0, its next-generation flagship artificial intelligence (AI) model for turning text into images.
Over the past year, the company has steadily improved its image models and released several new ones, each more sophisticated and higher in quality than the last. After the July release of SDXL, which significantly improved the Stable Diffusion base model, Stability AI is now pushing further.
Stable Diffusion 3.0 aims to deliver better image quality and stronger performance on multi-subject prompts. It also promises significantly improved typography over previous versions, with more accurate and consistent spelling of text inside generated images. Typography has historically been a weak point for Stable Diffusion, and competitors such as DALL-E 3, Ideogram, and Midjourney have also been working on it in their recent releases. The Stable Diffusion 3.0 lineup will include models ranging from 800 million to 8 billion parameters.
Not just an incremental update, Stable Diffusion 3.0 introduces a new architecture, distinguishing it from its predecessors.
“Stable Diffusion 3 is a diffusion transformer, a novel architectural approach similar to the one used in the recent OpenAI Sora model,” disclosed Emad Mostaque, CEO of Stability AI, in an interview with VentureBeat. “It truly succeeds the original Stable Diffusion.”
Paving the way for a new era in image generation with diffusion transformers and flow matching
Stability AI has been exploring various approaches to image generation.
Recently, the company previewed Stable Cascade, which uses the Würstchen architecture to improve performance and accuracy. Stable Diffusion 3.0 takes a different approach: it is built on diffusion transformers.
“Unlike its predecessor, Stable Diffusion now integrates a transformer,” highlighted Mostaque.
Transformers have been central to the generative AI revolution as the foundation of text generation models, while image generation has relied mainly on diffusion models. Diffusion Transformers (DiTs) combine the two: they replace the U-Net backbone traditionally used in diffusion models with a transformer that operates on patches of the latent image. This approach improves computational efficiency and can outperform other diffusion-based image generation techniques.
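To make "a transformer that operates on latent image patches" concrete, here is a minimal sketch of the patchify step that turns a latent into a sequence of tokens, together with its inverse. It assumes a 4-channel 32×32 latent (what an autoencoder with 8× spatial downsampling and 4 latent channels produces from a 256×256 image, as in the original DiT setup) and a patch size of 2. The names `patchify` and `unpatchify` are hypothetical, the code uses plain Python lists instead of tensors, and the learned patch embedding, positional embeddings, and transformer blocks are omitted. It is an illustration, not Stability AI's implementation.

```python
from typing import List

Channel = List[List[float]]  # one H x W grid of latent values
Latent = List[Channel]       # C channels, each H x W
Token = List[float]          # one flattened patch of length C * p * p


def patchify(latent: Latent, p: int) -> List[Token]:
    """Split a C x H x W latent into non-overlapping p x p patches.

    Patches are visited in raster order (left to right, top to bottom); each
    one is flattened channel by channel, then row by row, into a single token.
    """
    c, h, w = len(latent), len(latent[0]), len(latent[0][0])
    if h % p or w % p:
        raise ValueError("latent height and width must be divisible by p")
    tokens: List[Token] = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            token: Token = []
            for ch in range(c):
                for r in range(top, top + p):
                    token.extend(latent[ch][r][left:left + p])
            tokens.append(token)
    return tokens


def unpatchify(tokens: List[Token], c: int, h: int, w: int, p: int) -> Latent:
    """Inverse of patchify: write each token back to its patch position."""
    latent: Latent = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    idx = 0
    for top in range(0, h, p):
        for left in range(0, w, p):
            token, k = tokens[idx], 0
            idx += 1
            for ch in range(c):
                for r in range(top, top + p):
                    latent[ch][r][left:left + p] = token[k:k + p]
                    k += p
    return latent


if __name__ == "__main__":
    # A 4-channel 32 x 32 latent filled with distinct values.
    latent = [[[float(ch * 1000 + r * 32 + col) for col in range(32)]
               for r in range(32)] for ch in range(4)]
    tokens = patchify(latent, p=2)
    print(len(tokens), len(tokens[0]))  # 256 16
    assert unpatchify(tokens, 4, 32, 32, 2) == latent
```

The patch size trades sequence length against token size: with p = 2 the transformer sees 256 tokens of length 16, while p = 4 would give 64 tokens of length 64, and attention cost grows with the number of tokens.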
Stable Diffusion 3.0 also incorporates flow matching. The flow matching research paper presents a technique for training Continuous Normalizing Flows (CNFs) to model complex data distributions. According to the paper, using Conditional Flow Matching (CFM) with optimal transport paths speeds up training, makes sampling more efficient, and yields better performance than diffusion paths.
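A minimal sketch of the CFM objective with the optimal transport (OT) conditional path, following the formulas in the flow matching paper: a noise sample x0 and a data sample x1 are joined by the straight path x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1, whose target velocity u_t = x1 - (1 - sigma_min) * x0 is constant in t. The network is trained to regress onto these per-sample targets, which the paper shows gives the same gradients as regressing onto the intractable marginal vector field. The helper names, the plain-list vectors, and the `model` callable standing in for the network are assumptions for illustration; this is not Stability AI's training code, and Stable Diffusion 3.0's exact formulation may differ.

```python
import random
from typing import Callable, List, Sequence

Vec = List[float]
# Hypothetical stand-in for the network: maps (x_t, t) to a predicted velocity.
VelocityModel = Callable[[Vec, float], Vec]


def ot_path(x0: Vec, x1: Vec, t: float, sigma_min: float = 0.0):
    """Return (x_t, u_t) on the conditional OT path from noise x0 to data x1.

    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1 - sigma_min) * x0   (constant in t: the path is a straight line)
    """
    a = 1.0 - (1.0 - sigma_min) * t
    xt = [a * n + t * d for n, d in zip(x0, x1)]
    ut = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return xt, ut


def cfm_loss(model: VelocityModel, data: Sequence[Vec], rng: random.Random,
             sigma_min: float = 0.0) -> float:
    """One-sample Monte Carlo estimate of E ||model(x_t, t) - u_t||^2 per data point."""
    total = 0.0
    for x1 in data:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample x0 ~ N(0, I)
        t = rng.random()                         # t ~ U[0, 1)
        xt, ut = ot_path(x0, x1, t, sigma_min)
        vt = model(xt, t)
        total += sum((v - u) ** 2 for v, u in zip(vt, ut))
    return total / len(data)


def euler_sample(model: VelocityModel, x0: Vec, steps: int) -> Vec:
    """Integrate dx/dt = model(x, t) from t = 0 (noise) to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = model(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [0.5, -1.0, 2.0]  # a single toy "data" point

    # With one data point and sigma_min = 0, the exact vector field is
    # u_t(x) = (x1 - x) / (1 - t); it stands in for a trained network here.
    def oracle(x: Vec, t: float) -> Vec:
        return [(d - xi) / (1.0 - t) for xi, d in zip(x, x1)]

    print(cfm_loss(oracle, [x1], rng))  # ~0: the oracle matches the CFM target
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    print(euler_sample(oracle, x0, steps=4))  # lands on x1, up to rounding
```

Because each conditional OT path is a straight line traversed at constant velocity, a plain Euler integrator follows it exactly in this toy case, even with only four steps. A real model learns the marginal field over many data points, which is not perfectly straight, but this is the intuition behind the paper's claim of more efficient sampling.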
Credit: Stability AI (generated with Stable Diffusion 3.0)
Enhancing typographic capabilities in Stable Diffusion
The improved typography in Stable Diffusion 3.0 is a result of several enhancements integrated into the new model.
“This enhancement is credited to both the transformer architecture and additional text encoders,” explained Mostaque. “The model now supports complete sentences and coherent style.”
While the initial showcase of Stable Diffusion 3.0 focuses on text-to-image generation, it lays the groundwork for broader applications. In recent months, Stability AI has also been developing capabilities for 3D image generation and video generation.
“We develop open models that are versatile and adaptable to various needs,” emphasized Mostaque. “This model series covers different sizes and will steer the advancement of our next-generation visual models, including video, 3D, and more.”