NVIDIA announced that it’s acquiring Run:ai, an Israeli company that built a Kubernetes-based GPU orchestrator. While the price was not disclosed, reports value the deal at anywhere between $700 million and $1 billion.
The purchase of Run:ai shows Kubernetes’ growing significance in the age of generative AI. Kubernetes is now the de facto standard for controlling GPU-based accelerated computing systems.
Run:ai is a Tel Aviv, Israel-based AI infrastructure startup founded in 2018 by Omri Geller (CEO) and Dr. Ronen Dar (CTO). It has developed a resource-efficient pooling and sharing platform that was specifically designed for AI workloads running on GPUs. Tiger Global Management and Insight Partners led a $75 million Series C round in March 2022, bringing the company’s total funding to $118 million.
The Problem Run:ai Solves
Unlike CPUs, GPUs cannot be easily virtualized. Hypervisors such as VMware’s vSphere and KVM can emulate several virtual CPUs from a single physical machine, giving each workload the illusion that it is running on a dedicated CPU. GPUs, by contrast, cannot be effectively shared among multiple machine learning tasks, such as training and inference. This presents a significant challenge for businesses that run GPU-based workloads in the cloud or on-premises.
Containers and Kubernetes exacerbate the issue: Kubernetes allocates GPUs to containers in whole units, so if a container doesn’t fully utilize its GPU, the card effectively sits idle. The scarcity of GPUs and other AI accelerators only worsens the problem.
Run:ai saw an opportunity to solve this problem efficiently. Building on Kubernetes primitives and proven scheduling techniques, it created a layer that enables businesses to use only a fraction of a single GPU or to pool several GPUs together. The result is better GPU utilization and better efficiency.
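For context, the stock NVIDIA device plugin exposes `nvidia.com/gpu` as an integer resource, so the smallest request a pod can make is one whole card. Run:ai layers fractional requests on top of this. A minimal sketch of a pod spec is shown below; the `gpu-fraction` annotation follows Run:ai’s documented convention, but treat the exact annotation key and the image name as assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
  annotations:
    gpu-fraction: "0.5"   # Run:ai-style fractional request (assumed key)
spec:
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1   # stock Kubernetes: requests are whole GPUs only
```

With plain Kubernetes, only the integer `nvidia.com/gpu` limit is honored; Run:ai’s scheduler is what interprets the fractional annotation and packs multiple such pods onto one physical GPU.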
Below are five key features of Run:ai’s platform:
- Automation and virtualization layer accommodating AI workloads running on GPUs and other chipsets. This makes it possible to efficiently pool and share GPU compute resources.
- Integration with Kubernetes for pod automation. Run:ai’s system is built on Kubernetes and supports all Kubernetes distributions. It also integrates with third-party AI frameworks and platforms.
- Centralized software for managing shared compute resources. Users can manage clusters, pool GPUs, and allocate processing power for different tasks through Run:ai’s software.
- Dynamic scheduling, GPU pooling, and GPU fractioning for optimum efficiency. Run:ai’s technology enables partitioning of GPUs and dynamic allocation of them to maximize efficiency.
- Integration with NVIDIA’s AI stack, including DGX systems, Base Command, NGC containers, and AI Enterprise software. Run:ai has partnered closely with NVIDIA to offer a comprehensive solution.
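To make the dynamic scheduling and GPU-fractioning idea concrete, here is a deliberately simplified sketch in Python. It is not Run:ai’s actual algorithm (which is proprietary); it just illustrates the core concept of placing fractional GPU requests onto a shared pool using a greedy first-fit policy. All names (`GPU`, `schedule`, the job list) are illustrative:

```python
from dataclasses import dataclass

@dataclass
class GPU:
    """A physical GPU with fractional capacity (1.0 = one whole card)."""
    name: str
    free: float = 1.0

def schedule(workloads, pool):
    """Greedy first-fit: place each workload's fractional GPU request
    onto the first GPU in the pool with enough free capacity.
    Workloads that don't fit are left pending (mapped to None)."""
    placements = {}
    for job, fraction in workloads:
        for gpu in pool:
            if gpu.free >= fraction:
                gpu.free -= fraction
                placements[job] = gpu.name
                break
        else:
            placements[job] = None  # pending: no GPU has enough free capacity
    return placements

pool = [GPU("gpu-0"), GPU("gpu-1")]
jobs = [("train-a", 0.5), ("infer-b", 0.25), ("train-c", 0.75), ("infer-d", 0.5)]
print(schedule(jobs, pool))
# → {'train-a': 'gpu-0', 'infer-b': 'gpu-0', 'train-c': 'gpu-1', 'infer-d': None}
```

Note how two small jobs share `gpu-0` instead of each monopolizing a whole card; a real scheduler would add priorities, preemption, and fairness on top of this placement step.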
Importantly, Run:ai is not an open-source solution, even though it is built on Kubernetes. Customers deploy proprietary components into their Kubernetes clusters and manage them through a SaaS-based application.
Why did NVIDIA acquire Run:ai?
The acquisition strategically strengthens NVIDIA’s position as a leader in the AI and machine learning industries, particularly as GPU utilization becomes central to these technologies. The main reasons behind NVIDIA’s decision:
- Enhanced GPU Orchestration and Management: Run:ai’s advanced automation tools are critical for managing GPU resources more effectively. This ability is necessary as the demand for AI and machine learning solutions grows, necessitating more sophisticated management of hardware resources to ensure optimal performance and utilization.
- Integration with NVIDIA’s Existing AI Ecosystem: By acquiring Run:ai, NVIDIA can fold this technology into its existing suite of AI and machine learning offerings, giving customers who rely on NVIDIA’s ecosystem for their AI infrastructure a more comprehensive service. NVIDIA HGX, DGX, and DGX Cloud users may gain access to Run:ai’s capabilities for their AI workloads, particularly generative AI workloads.
- Expansion of Market Reach: Run:ai’s established relationships with key players in the AI space, including their previous integration with NVIDIA’s systems, provide NVIDIA with an expanded market reach and the potential to serve a broader range of customers. This is particularly important in industries that are rapidly adopting AI technologies but face challenges with resource management and scalability.
- Innovation and Research Development: The acquisition enables NVIDIA to leverage the innovative capabilities of Run:ai’s team, known for their pioneering work in GPU virtualization and management. This could help NVIDIA stay ahead of the curve as GPU technology and orchestration continue to advance.
Effective GPU management becomes a competitive advantage as businesses increase their investments in AI and machine learning. By acquiring Run:ai, NVIDIA can stay ahead of other tech giants entering the AI hardware and orchestration market. The acquisition enhances NVIDIA’s product capabilities and solidifies its position as the market leader in AI infrastructure, keeping it ahead of the curve on market demands and technological advancements.
What does this mean for Kubernetes and the cloud native ecosystem?
NVIDIA’s acquisition of Run:ai is significant for Kubernetes and cloud-native ecosystems for several reasons:
- Improved GPU Orchestration in Kubernetes: Integrating Run:ai’s advanced GPU management and virtualization capabilities into Kubernetes will enable more dynamic allocation and effective utilization of GPU resources across AI workloads. This aligns with Kubernetes’ capabilities in handling complex, resource-intensive applications, particularly in AI and machine learning, where efficient resource management is critical.
- Advancements in Cloud-Native AI Infrastructure: By leveraging Run:ai’s technology, NVIDIA can further enhance the Kubernetes ecosystem’s ability to support high-performance computing (HPC) and AI workloads. This synergy between NVIDIA’s GPU technology and Kubernetes will likely result in more robust solutions for deploying, managing, and scaling AI applications in cloud-native environments.
- Wider Adoption and Innovation: The acquisition could drive broader adoption of Kubernetes in industries that are increasingly reliant on AI, such as healthcare, automotive, and finance. In these industries, the ability to effectively manage GPU resources can accelerate the development and deployment cycles for AI models.
- Impact on Kubernetes Maturity: The integration of NVIDIA and Run:ai technologies with Kubernetes signals that the platform is mature enough to handle advanced AI workloads, cementing Kubernetes as the de facto platform for modern AI and ML deployments. This may also encourage more businesses to adopt Kubernetes for their AI infrastructure needs.
NVIDIA’s acquisition of Run:ai strengthens its position in AI and cloud computing and expands its ability to support the Kubernetes ecosystem in enabling the next generation of AI applications, benefiting a wide range of industries.