GPUs are in short supply as demand for generative AI, which relies heavily on them for training and inference, continues to grow. Nvidia's best-performing chips are reportedly sold out until 2024. The CEO of chipmaker TSMC has warned that the shortage of GPUs from Nvidia and its rivals could extend into 2025.
To reduce their dependence on GPUs, companies that can afford it, chiefly tech giants, are developing, and in some cases offering to customers, custom chips tailored to creating, refining and deploying AI models. Amazon is one of them: at its annual re:Invent conference, it unveiled the latest generation of its chips for model training and inference.
The first chip, AWS Trainium2, is engineered to deliver up to 4x better performance and 2x better energy efficiency than its predecessor, Trainium, which debuted in December 2020. It will be available in EC2 Trn2 instances in clusters of 16 chips in the AWS cloud, with the ability to scale up to 100,000 chips in AWS' EC2 UltraCluster product.
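For a rough sense of scale, here is a back-of-envelope sketch of what an UltraCluster at the stated maximum would look like, assuming it is assembled from the 16-chip Trn2 instances described above (the instance size comes from the announcement; the actual cluster topology is not specified):

```python
# Back-of-envelope UltraCluster composition, assuming 16-chip Trn2
# instances as the building block (topology details are not public).
chips_per_instance = 16
max_cluster_chips = 100_000

# Number of Trn2 instances needed to reach the stated chip count.
instances_needed = max_cluster_chips // chips_per_instance
print(instances_needed)  # → 6250
```

That is on the order of six thousand instances, which illustrates why Amazon frames UltraCluster as a distinct product rather than ordinary instance scaling.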
According to Amazon, a cluster of 100,000 Trainium2 chips can deliver 65 exaflops of compute, which works out to 650 teraflops per chip. That surpasses the performance of Google's custom AI training chips from 2017. Amazon says a cluster of this size can train a 300-billion-parameter AI language model in weeks rather than months.
The second chip Amazon unveiled, the Arm-based Graviton4, is intended for inferencing. It offers up to 30% better compute performance, 50% more cores and 75% more memory bandwidth than the previous-generation Graviton3 processor. Graviton4's physical hardware interfaces are also encrypted, which Amazon says better protects AI training workloads and data with heightened encryption requirements.
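To make the relative figures concrete, the core-count uplift can be checked against Graviton3's known 64-core configuration (the baseline core count is a documented Graviton3 spec, not from the announcement itself):

```python
# What "50% more cores" implies, taking Graviton3's 64 cores as baseline.
graviton3_cores = 64  # documented Graviton3 core count
graviton4_cores = int(graviton3_cores * 1.5)  # 50% more cores
print(graviton4_cores)  # → 96
```

The compute-performance and memory-bandwidth gains are quoted only as percentages, so absolute figures for those cannot be derived the same way.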
Both Trainium2 and Graviton4 signal Amazon's commitment to building out cloud infrastructure for its customers' evolving needs. Availability details for Trainium2 have yet to be disclosed, while Graviton4 will be offered in Amazon EC2 R8g instances, currently in preview, with general availability expected in the coming months.