On Tuesday, Stability AI introduced Stable Diffusion XL Turbo (SDXL Turbo), an AI image-synthesis model designed to generate images from text prompts quickly enough to transform source images, such as live webcam feeds, in near real time.
The key innovation of SDXL Turbo is its ability to produce image output in a single step, down from the 20–50 steps its predecessor typically requires. Stability attributes the leap to a technique it calls Adversarial Diffusion Distillation (ADD), which combines score distillation with an adversarial loss: a discriminator trained to tell real images from generated ones pushes the model's single-step outputs toward greater realism.
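The two-part objective can be illustrated with a toy sketch. The function names, the hinge form of the adversarial term, and the weighting below are simplifications for illustration, not Stability's actual implementation:

```python
def distillation_loss(student_out, teacher_out):
    # Score-distillation term: mean squared error between the one-step
    # student's output and the multi-step teacher's denoised prediction.
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)

def adversarial_loss(disc_scores):
    # Hinge-style generator loss: the student is rewarded when the
    # discriminator rates its images as real (score >= 1).
    return sum(max(0.0, 1.0 - d) for d in disc_scores) / len(disc_scores)

def add_objective(student_out, teacher_out, disc_scores, lam=2.5):
    # ADD trains on both terms at once; lam balances fidelity to the
    # teacher against realism as judged by the discriminator.
    return adversarial_loss(disc_scores) + lam * distillation_loss(student_out, teacher_out)
```

The adversarial term is what lets the student skip the teacher's many denoising steps without collapsing into blur: realism is enforced directly rather than emerging from iterative refinement.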
Stability detailed how the model works in a research paper released the same day, emphasizing the role of the ADD technique. Notably, the approach gives SDXL Turbo some similarities to Generative Adversarial Networks (GANs), which likewise produce images in a single step.
Images generated by SDXL Turbo lack the fine detail of those SDXL produces at higher step counts, and Stability does not position the new model as a direct replacement for the old one. The time savings, however, are remarkable.
In an informal local test on an Nvidia RTX 3060 using the Automatic1111 web interface, SDXL Turbo generated a 1024×1024 image in about 4 seconds at 3 steps, compared with 26.4 seconds for a 20-step SDXL image of comparable detail. Smaller images render even faster (under one second for 512×768), and more powerful cards such as the RTX 3090 or 4090 speed things up further. In our testing, SDXL Turbo images showed the best detail at roughly 3–5 steps.
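For those running the model outside Automatic1111, Stability's model card on Hugging Face demonstrates text-to-image use through the Diffusers library. A minimal sketch based on that pattern follows; the device and dtype choices are assumptions, and the initial model download runs to several gigabytes:

```python
def load_sdxl_turbo(device="cuda"):
    # Imports are kept inside the function so the sketch loads without
    # torch/diffusers installed; both are required to actually run it.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    return pipe.to(device)

def generate(pipe, prompt, steps=1):
    # SDXL Turbo is trained for single-step sampling with guidance
    # disabled (guidance_scale=0.0), per the model card.
    return pipe(prompt=prompt, num_inference_steps=steps, guidance_scale=0.0).images[0]
```

Raising `steps` to 3–5 trades a little speed for the extra detail noted in our testing.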
The “real-time” label comes from sheer generation speed. Stability AI says that on an Nvidia A100 GPU, the model can produce a 512×512 image in 207 milliseconds, including encoding, a single denoising step, and decoding. Speeds like that open the door to real-time generative AI video filters or on-the-fly video game graphics, once coherency challenges, such as keeping a subject consistent across frames or generations, are solved.
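As a back-of-the-envelope check on the video use case, that quoted latency translates to roughly five frames per second if generations run back to back:

```python
# Stability's quoted A100 latency for a 512x512 image: encoding,
# one denoising step, and decoding combined.
LATENCY_MS = 207

def frames_per_second(latency_ms):
    # Idealized throughput, assuming no overhead between generations.
    return 1000.0 / latency_ms

print(round(frames_per_second(LATENCY_MS), 1))  # about 4.8 fps
```

That is well short of typical video frame rates, which is one reason real-time video filtering remains a potential rather than a shipped application.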
For now, SDXL Turbo is available under a non-commercial research license that limits it to personal, non-commercial use. The decision has drawn some backlash in the Stable Diffusion community, but Stability AI says it is open to commercial applications and invites interested parties to reach out for details.
Meanwhile, Stability AI has faced internal management turmoil, with an investor recently urging CEO Emad Mostaque to resign. Despite the turbulence, the company has kept up a rapid pace of releases: just last week it unveiled Stable Video Diffusion, a tool that turns still images into short video clips.
Those who want to try SDXL Turbo can find a beta demonstration on Stability AI's image-editing platform, Clipdrop, and an unofficial live demo is available for free on Hugging Face. The usual caveats apply, including unanswered questions about training data sources and the potential for misuse, but the relentless pace of advances in AI image synthesis shows a field that is still evolving rapidly.