Enhancing HPC and AI Capabilities: Addressing the Power Challenge
At the recent SC23 supercomputing conference in Denver, Daniel Reed, a professor at the University of Utah, argued that reaching higher performance now requires more hardware. More hardware in turn means larger systems, higher energy consumption, and greater cooling demands.
The trend is toward ever larger clusters: the biggest systems on the Top500 list draw more than 20 megawatts of power. Data center campuses built for AI training and inference often exceed that figure. Projections suggest that by 2027, a single capability-class system could require 120 megawatts of electricity.
During a panel on carbon neutrality and sustainability in high-performance computing, experts from the University of Chicago, Schneider Electric, Los Alamos National Laboratory, Hewlett Packard Enterprise, and the Finnish IT Center for Science discussed these trends. They also shared insights on planning, deploying, and reporting on these facilities.
Water Costs and Power Efficiency
The panel underscored Power Usage Effectiveness (PUE) as a key metric for evaluating data center efficiency. PUE is the ratio of a facility's total power draw to the power consumed by its computing, storage, and networking equipment; the closer the value is to 1.0, the more efficient the facility. While PUE is a valuable tool for optimizing data center operations, the pursuit of a low figure has also led to some detrimental practices among hyperscalers and large data center operators.
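As a rough illustration of the ratio, the short Python sketch below computes PUE from two power readings. The function name and the 20 MW and 4 MW figures are hypothetical values chosen for the example, not numbers reported by the panel.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT equipment power")
    return total_facility_kw / it_equipment_kw


# Hypothetical readings: 20 MW of IT load plus 4 MW of cooling and power-delivery overhead.
it_kw = 20_000.0
overhead_kw = 4_000.0
example_pue = pue(it_kw + overhead_kw, it_kw)
print(f"PUE = {example_pue:.2f}")
```

Note that water never enters the calculation, which is why a facility can post a low PUE while still consuming large volumes of water for cooling.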
Some hyperscalers have built massive data centers in arid regions like Arizona and New Mexico, using evaporative cooling to achieve impressive PUE figures. Evaporative cooling needs little electricity, which flatters PUE, but it consumes substantial amounts of water in places where water is scarce.
Genna Waldvogel from Los Alamos highlighted innovative water management practices at their facility, emphasizing the recycling of water used in server cooling systems. This sustainable approach involves treating wastewater and reusing it in the cooling process, minimizing water wastage.
Strategic Site Selection and Environmental Impact
Nicolas Dubé from HPE emphasized the critical role of location in data center siting decisions. Building in regions with abundant clean energy, such as hydropower and wind, can significantly reduce the environmental footprint of AI infrastructure. Dubé cited the example of a data center in Quebec powered predominantly by renewable energy, showing how much site selection alone can mitigate environmental impact.
Furthermore, Dubé proposed harnessing waste heat generated by data centers for practical purposes, such as heating adjacent agricultural greenhouses. This integrated approach not only maximizes resource utilization but also contributes to sustainable practices within the community.
Dynamic Operation for Sustainability
The CERES Center for Unstoppable Computing at the University of Chicago, under Andrew Chien’s leadership, advocates for dynamic operation strategies to enhance data center sustainability. By adjusting the utilization of HPC clusters based on available power resources, operators can optimize efficiency and reduce carbon emissions, particularly during periods of peak renewable energy generation.
Chien’s vision for the “Fugaku Next” project in Japan anticipates substantial reductions in power costs and carbon emissions through dynamic operational practices. This adaptive approach aligns with the evolving landscape of green energy integration into global power grids.
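The dynamic operation idea described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Chien's group: the linear scaling policy, the function name `power_cap_mw`, the 8 MW and 20 MW limits, and the hourly renewable shares are all hypothetical.

```python
def power_cap_mw(renewable_share: float, min_mw: float, max_mw: float) -> float:
    """Scale a cluster's power cap linearly with the grid's renewable share (0.0 to 1.0).

    The linear policy is an assumption made for illustration only.
    """
    if not 0.0 <= renewable_share <= 1.0:
        raise ValueError("renewable_share must be between 0.0 and 1.0")
    if min_mw > max_mw:
        raise ValueError("min_mw must not exceed max_mw")
    return min_mw + renewable_share * (max_mw - min_mw)


# Hypothetical hourly renewable share of grid supply.
hourly_renewable_share = [0.2, 0.5, 0.9, 0.4]

# Run the cluster harder when more renewable power is available, and throttle otherwise.
caps = [power_cap_mw(s, min_mw=8.0, max_mw=20.0) for s in hourly_renewable_share]
for hour, cap in enumerate(caps):
    print(f"hour {hour}: cap {cap:.1f} MW")
```

A real scheduler would also have to account for job deadlines, checkpointing costs, and electricity prices, none of which are modeled here.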
Advancing Reporting Practices
Robert Bunger from Schneider Electric stressed the importance of comprehensive monitoring and reporting to mitigate the environmental impact of expanding HPC and AI clusters. He recommended that data center operators proactively track key sustainability metrics, including water use efficiency, renewable energy consumption, and overall power efficiency.
To streamline reporting, Schneider has proposed 28 indicators for data center operators to monitor, covering various aspects of sustainability and operational efficiency. Acknowledging that tracking all of them is challenging, Bunger advised starting with a manageable subset and expanding monitoring over time.
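A starter subset of the kind Bunger describes might look like the Python sketch below, which derives three of the metrics named above from a handful of meter readings. The class, field names, and sample values are hypothetical, and the formulas follow common industry usage (for example, water use effectiveness as liters of water per kWh of IT energy) rather than Schneider's own indicator definitions, which the article does not spell out.

```python
from dataclasses import dataclass


@dataclass
class SiteReadings:
    """Hypothetical meter totals for one reporting period."""
    it_energy_kwh: float         # energy used by compute, storage, and networking
    total_energy_kwh: float      # energy used by the whole facility
    renewable_energy_kwh: float  # portion of total energy from renewable sources
    water_liters: float          # water consumed on site


def starter_metrics(r: SiteReadings) -> dict[str, float]:
    """Compute a small starter set of sustainability metrics."""
    if r.it_energy_kwh <= 0 or r.total_energy_kwh <= 0:
        raise ValueError("energy readings must be positive")
    return {
        "pue": r.total_energy_kwh / r.it_energy_kwh,
        "wue_l_per_kwh": r.water_liters / r.it_energy_kwh,
        "renewable_fraction": r.renewable_energy_kwh / r.total_energy_kwh,
    }


# Hypothetical sample values for a single reporting period.
readings = SiteReadings(
    it_energy_kwh=1_000_000,
    total_energy_kwh=1_200_000,
    renewable_energy_kwh=900_000,
    water_liters=1_800_000,
)
metrics = starter_metrics(readings)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting the three figures side by side makes trade-offs visible, such as a low PUE bought with heavy water use.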