Today's attention centers largely on ChatGPT, Stable Diffusion, and DreamStudio, generative AI whose results keep improving and impressing. These intelligent assistants are reshaping how we approach tasks such as coding, web search, and content creation, revolutionizing the search, analysis, and execution processes.
The impact of Gen AI extends to how businesses deliver IT services and how consumers achieve their goals. While the benefits are immense, so are the risks involved. Developing and deploying efficient AI solutions can be costly and uncertain. Additionally, Gen AI and the large language models (LLMs) powering it are computationally intensive, consuming significant amounts of energy. According to Dr. Sajjad Moazeni from the University of Washington, training an LLM with over 175 billion parameters can consume energy equivalent to that used by 1,000 US households in a year. Answering 100 million or more generative AI queries daily can consume one gigawatt-hour of electricity, similar to the average daily energy consumption of 33,000 US households.
Even hyperscalers strain to manage such substantial energy requirements, and for typical enterprises the cost is prohibitive. The question arises: how can IT provide efficient and reliable AI solutions without incurring exorbitant power expenses and an environmental impact comparable to a small city's?
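A quick back-of-envelope check makes these figures concrete. The sketch below assumes, as a rough benchmark, that an average US household uses about 30 kWh per day; the other numbers come from the paragraph above:

```python
# Back-of-envelope check of the energy figures cited above.
daily_energy_wh = 1e9          # one gigawatt-hour, in watt-hours
queries_per_day = 100e6        # 100 million generative AI queries

wh_per_query = daily_energy_wh / queries_per_day
print(f"~{wh_per_query:.0f} Wh per query")            # ~10 Wh per query

household_wh_per_day = 30_000  # assumed ~30 kWh/day per average US household
households = daily_energy_wh / household_wh_per_day
print(f"~{households:,.0f} households' daily usage")  # ~33,333 households
```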
Six Recommendations for Cost-Effective and Low-Risk Deployment of Gen AI
Harnessing generative AI for business applications requires the ability to customize it for specific tasks. Retraining enables the development of specialized models that are more compact, precise, and easier to use. Does every organization need to establish a dedicated AI development team and infrastructure to train its own AI models? Not necessarily.
Here are six strategies for creating and leveraging AI solutions without overspending on high-end hardware or expert personnel.
1. Utilize Existing Models Instead of Starting from Scratch
While investing in proprietary models for unique applications is an option, many businesses and government entities lack the resources for supercomputing systems, high-performance computing (HPC) expertise, or data science talent. Starting the AI journey with a foundation model that has an active developer community and a wide range of applications, such as GPT from OpenAI or the open-source Llama 2 from Meta, can be highly advantageous. Platforms like Hugging Face offer a variety of open-source models and applications to explore.
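As a minimal sketch of this approach, the Hugging Face transformers library can pull down an open pre-trained model and serve predictions in a few lines. The model ID below is one example (the Llama 2 weights are gated and require accepting Meta's license); any open text-generation model on the Hub can be substituted:

```python
# Start from an existing pre-trained model rather than training from scratch.
from transformers import pipeline

# Downloads the weights on first use; runs locally thereafter.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # example; substitute any open model
)

result = generator("Three benefits of starting from a foundation model:",
                   max_new_tokens=100)
print(result[0]["generated_text"])
```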
2. Customize the Solution to Fit the Model
Selecting the right model at the start of a project, whether an open-source LLM specialized in medical literature like Med-BERT, a general-purpose model like GPT, or another domain-specific model, can accelerate the process and save months of training time. However, caution is advised. The training data for any model may contain biases, and generative AI models can produce misleading or false outputs. Opt for models trained on transparent, clean data with explicit governance and understandable decision-making processes for maximum reliability.
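One lightweight way to compare candidates before committing is to smoke-test each on a small labeled sample from your domain. The sketch below assumes the transformers library; the general-purpose model ID is real, while the domain candidate and the sample data are hypothetical placeholders:

```python
# Smoke-test candidate models on a tiny domain sample before committing.
from transformers import pipeline

candidates = [
    "distilbert-base-uncased-finetuned-sst-2-english",  # general-purpose baseline
    # "your-org/domain-tuned-model",  # hypothetical domain-specific candidate
]

# A few labeled examples from the target domain (illustrative).
samples = [
    ("The loan application was approved quickly.", "POSITIVE"),
    ("Hidden fees ruined the whole experience.", "NEGATIVE"),
]

for model_id in candidates:
    clf = pipeline("sentiment-analysis", model=model_id)
    correct = sum(clf(text)[0]["label"] == label for text, label in samples)
    print(f"{model_id}: {correct}/{len(samples)} correct on the sample")
```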
3. Retrain for Improved Precision and Efficiency
Retraining base models on domain-specific data offers numerous advantages. As the model becomes more accurate within a narrower domain, it sheds unnecessary parameters, resulting in a more compact model tailored for the application. For example, retraining an LLM on financial data might sacrifice a generic skill like creative writing for the ability to assist clients with mortgage applications. The refined financial assistant would possess a streamlined model capable of delivering precise, top-notch services while running on standard hardware.
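A condensed sketch of such retraining with the Hugging Face Trainer API follows. It assumes the transformers and datasets libraries; the CSV file names, label count, and hyperparameters are illustrative placeholders for a domain corpus such as mortgage-application text:

```python
# Retrain (fine-tune) a base model on domain-specific labeled text.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Hypothetical domain corpus: CSV files with "text" and "label" columns.
data = load_dataset("csv", data_files={"train": "mortgage_train.csv",
                                       "test": "mortgage_test.csv"})
data = data.map(lambda b: tokenizer(b["text"], truncation=True,
                                    padding="max_length"), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-assistant",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
```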
4. Optimize the Current Infrastructure
While building out clusters with thousands of GPUs is beyond the reach of most businesses, the majority of practical AI training, retraining, and inference tasks do not require extensive GPU arrays. Modern CPUs with integrated AI acceleration can handle training workloads of up to 10 billion parameters at competitive pricing and performance levels, and training such models can often be completed in minutes without a GPU. Smaller models can operate on standalone edge devices with integrated CPUs, available at affordable to mid-range prices. For models like the 13-billion-parameter Llama 2, CPUs can deliver rapid and accurate responses that rival GPUs in performance.
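A minimal sketch of CPU-only inference is below, assuming PyTorch and transformers. The small model ID is one example of a size that fits comfortably on a commodity server, and the thread count should match your machine's physical cores:

```python
# Run generative inference entirely on a CPU; no GPU required.
import torch
from transformers import pipeline

torch.set_num_threads(8)  # tune to the machine's physical core count

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example small open model
    device=-1,                   # -1 selects the CPU in the pipeline API
    torch_dtype=torch.bfloat16,  # bf16 is accelerated on recent Xeons (AMX)
)

out = generator("In one sentence, why run inference on CPUs?",
                max_new_tokens=60)
print(out[0]["generated_text"])
```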
5. Implement Hardware-Optimized Solutions
Adapting inference applications to specific hardware configurations and features can significantly enhance efficiency. Similar to model training, optimization involves balancing precision, model size, and processing efficiency to meet the requirements of a particular application. For instance, quantizing a 32-bit floating-point model to 8-bit integers, a 4x reduction in precision, can boost inference speed with little loss of accuracy. Tools such as the Intel® Distribution of OpenVINO™ toolkit aid in optimization and the development of hardware-aware inference engines, leveraging accelerators like integrated GPUs, Intel® Advanced Matrix Extensions (Intel® AMX), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
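As a rough sketch, the Hugging Face Optimum Intel integration wraps the OpenVINO toolkit so a PyTorch model can be exported and served with CPU-optimized kernels in a few lines. It assumes the optimum-intel and openvino packages are installed; the model choice is illustrative:

```python
# Export a PyTorch model to OpenVINO IR and run CPU-optimized inference.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch weights to OpenVINO's intermediate
# representation; AVX-512/AMX kernels are selected automatically at runtime.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

clf = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print(clf("Hardware-aware optimization pays off."))
```

Further INT8 quantization, for example with OpenVINO's NNCF-based tooling, can then shrink the model along the lines of the 4x reduction described above.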
6. Monitor Cloud Expenses
Delivering AI services through cloud-based APIs and solutions offers a rapid and reliable way to meet increasing demand. But while always-on AI services from a provider are convenient, the associated costs can escalate quickly: the more valuable users find a service, the more they use it, and the larger the bill. Some organizations that began their AI journey entirely in the cloud are now shifting workloads back to their existing on-premises or co-located infrastructure. Cloud-native enterprises with minimal on-premises infrastructure are exploring pay-as-you-go and as-a-service models to mitigate soaring cloud expenses.
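A back-of-envelope sketch like the one below helps such costs surface early. The per-token prices are hypothetical placeholders; substitute your provider's current rates:

```python
# Estimate daily and monthly API spend from per-token pricing.
PRICE_PER_1K_INPUT = 0.0005   # USD per 1,000 input tokens (hypothetical)
PRICE_PER_1K_OUTPUT = 0.0015  # USD per 1,000 output tokens (hypothetical)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call from its token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# One million daily queries at modest token counts adds up quickly:
daily = 1_000_000 * query_cost(input_tokens=500, output_tokens=250)
print(f"~${daily:,.2f} per day, ~${daily * 30:,.2f} per month")
```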
The realm of Gen AI presents numerous possibilities, and not only for the most financially endowed organizations. High-performance models, including LLMs for generative AI, are suitable for deployment on standard CPU-based data centers or edge devices. Tools for testing, prototyping, and deploying enterprise-grade generative AI solutions are evolving rapidly within open-source communities and cutting-edge platforms.
About Intel
Intel® hardware and software power AI training, inference, and applications in Dell systems, from data centers and high-performance servers to AI and IoT edge acceleration, and beyond. Learn more about Intel’s AI capabilities.
About Dell
Dell Technologies utilizes cutting-edge technologies, specialized services, and an extensive partner network to accelerate your AI journey from ideation to realization. Explore Dell’s offerings for AI solutions.