According to recent research, a technique known as “distribution matching distillation” (DMD) condenses the usual multi-step generation process of roughly 100 steps into a single step, yielding a speed boost of up to 30 times for popular AI-powered image generators.
This method enables a new AI model to imitate established image generators like DALL·E 3, Midjourney, and Stable Diffusion, running far more efficiently while maintaining image quality. The details of this breakthrough were shared in a research paper published on December 5, 2023, on the preprint platform arXiv.
Tianwei Yin, a graduate student in electrical engineering and computer science at MIT and co-lead author of the study, highlighted that their work accelerates existing diffusion models like Stable Diffusion and DALL·E 3 by a factor of 30. This acceleration not only reduces computational time significantly but also preserves or even surpasses the aesthetic quality of the generated images.
Traditionally, diffusion models follow a multi-step process to create images, learning from training data such as image-caption pairs so they can understand and respond to textual prompts.
In practice, these models begin generation from pure random noise, the end point of the noising process known as “forward diffusion.” Then, through up to roughly 100 refinement steps called “reverse diffusion,” the model gradually removes that noise to produce a coherent image aligned with the input text.
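To make that iterative process concrete, here is a minimal Python sketch of a multi-step reverse-diffusion sampling loop. The `denoiser` callable, tensor shape, and update rule are simplified placeholders for illustration, not the actual Stable Diffusion or DALL·E 3 sampler:

```python
import torch

def sample_reverse_diffusion(denoiser, text_embedding, num_steps=100, shape=(1, 4, 64, 64)):
    """Illustrative multi-step reverse diffusion (simplified, not the paper's sampler).

    `denoiser` is assumed to predict the noise present in `x` at step `t`,
    conditioned on a text embedding, as in a typical text-to-image diffusion model.
    """
    x = torch.randn(shape)                      # start from pure Gaussian noise
    for t in reversed(range(num_steps)):        # ~100 gradual denoising steps
        predicted_noise = denoiser(x, t, text_embedding)
        x = x - predicted_noise / num_steps     # toy update: remove a fraction of the noise
    return x                                    # final tensor, later decoded into an image
```

Every pass through that loop is a full forward pass of a large neural network, which is why trimming the step count matters so much for generation speed.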
By introducing their new framework, the researchers have streamlined this process by condensing the “reverse diffusion” steps into a single step, significantly reducing image generation time. For instance, their model decreased the time taken to generate an image from 2.59 seconds using Stable Diffusion v1.5 to a mere 90 milliseconds — a remarkable 28.8-fold acceleration.
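For reference, the reported timings work out to the stated speedup; a quick check using the article's own numbers:

```python
# Quick arithmetic check of the figures quoted above.
baseline_seconds = 2.59   # Stable Diffusion v1.5, multi-step sampling
dmd_seconds = 0.090       # single-step DMD generation (90 milliseconds)

speedup = baseline_seconds / dmd_seconds
print(f"Speedup: {speedup:.1f}x")  # prints roughly 28.8x
```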
The key components of DMD, namely a “regression loss” and a “distribution matching loss,” work together to speed up learning: the regression term anchors the training process, while the distribution matching term pushes the generated images to occur with probabilities matching their real-world frequency, improving both the model’s efficiency and its output quality.
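A rough sense of how two such terms might be combined is sketched below. This is a simplified illustration under assumed inputs (the argument names, the MSE choice, and the `weight` value are placeholders), not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def combined_loss(student_image, teacher_image, real_score, fake_score, weight=0.25):
    """Simplified sketch of a two-term objective like the one described above.

    Assumed inputs (illustrative, not from the paper's code):
    - student_image: one-step generator output for a given noise/prompt pair
    - teacher_image: the original multi-step model's output for the same pair
    - real_score / fake_score: score estimates on real vs. generated images
    - weight: an assumed balance between the two terms
    """
    # Regression term: keep the one-step output close to the teacher's output.
    regression = F.mse_loss(student_image, teacher_image)

    # Distribution-matching term: a surrogate whose gradient nudges the generator
    # toward outputs that the real-image score model finds more plausible.
    distribution_matching = torch.mean((fake_score - real_score).detach() * student_image)

    return regression + weight * distribution_matching
```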
Fredo Durand, a professor of electrical engineering and computer science at MIT and co-lead author, emphasized that streamlining the iterative refinement process has long been a goal in diffusion models. The ability to generate images in a single step represents a significant advance that promises to reduce computational costs and accelerate image creation.
Yin further highlighted that the new method’s efficiency lies in its ability to achieve image generation in a single step, eliminating the need for the extensive iterative refinement stages present in traditional diffusion models. This breakthrough is poised to revolutionize content creation in sectors where rapid and efficient image generation is paramount.