For more than a year, expectations ran high that Amazon Web Services would unveil its Graviton4 processor for its homegrown servers at this year’s re:Invent conference, and it did. During his keynote address, CEO Adam Selipsky announced the fourth generation of the Graviton CPU line. Counting last year’s overclocked Graviton3E variant aimed at HPC workloads, it is the fifth chip in the family.
Unusually, Selipsky did not hold up a Graviton4 chip during the keynote, although the accompanying press release did include an image of it.
As predicted, Graviton4 is built on Arm Ltd’s “Demeter” Neoverse V2 core, which implements the Armv9 architecture and is also used in Nvidia’s “Grace” CG100 CPU. The V2 core delivers about 13 percent more instructions per clock (IPC) than its predecessor, the “Zeus” Neoverse V1 core used in the Graviton3 and Graviton3E.
Because that IPC gain is modest, AWS has leaned on a much higher core count. That implies a move from the 5 nanometer process used to make the Graviton3 and Graviton3E to a denser 4 nanometer process, presumably from Taiwan Semiconductor Manufacturing Co, which also uses 4 nanometer-class technology to make Nvidia’s Grace CPU and “Hopper” GH100 GPU, both in heavy demand for generative AI.
The Graviton4 package has 96 V2 cores, 50 percent more than the 64 cores in its predecessors. It also has twelve DDR5 memory controllers, up from eight, and memory speed rises 16.7 percent to 5,600 MT/sec (DDR5-5600). With 50 percent more channels each running 16.7 percent faster, Graviton4 delivers 537.6 GB/sec of peak memory bandwidth per socket, a 75 percent improvement over the prior models.
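The bandwidth figures follow from simple arithmetic. The Python sketch below assumes that each DDR5 controller drives one 64-bit (8-byte) channel and that the Graviton3 ran DDR5-4800, which is what a 16.7 percent bump to DDR5-5600 implies; the function name `peak_bandwidth_gbs` is purely illustrative. It computes theoretical peaks only and ignores refresh, ECC, and other real-world overheads.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/sec (decimal gigabytes).

    Assumes one 64-bit (8-byte) DDR5 channel per memory controller.
    MT/sec times bytes per transfer gives MB/sec; dividing by 1,000 gives GB/sec.
    """
    return channels * mt_per_s * bus_bytes / 1000


# Graviton3/3E: eight controllers; DDR5-4800 is inferred from the 16.7 percent speed bump
g3 = peak_bandwidth_gbs(channels=8, mt_per_s=4800)
# Graviton4: twelve controllers at DDR5-5600
g4 = peak_bandwidth_gbs(channels=12, mt_per_s=5600)

print(f"Graviton3: {g3:.1f} GB/sec")
print(f"Graviton4: {g4:.1f} GB/sec")
print(f"Uplift: {g4 / g3 - 1:.0%}")
```

Run as written, it prints 307.2 GB/sec for the Graviton3, 537.6 GB/sec for the Graviton4, and a 75% uplift: 1.5 times the channels at roughly 1.167 times the transfer rate multiplies out to 1.75.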
According to AWS, generic web applications run 30 percent faster on Graviton4 than on Graviton3, while databases and large Java applications run up to 40 percent and 45 percent faster, respectively. Part of that gain may come from doubling the L2 cache per core to 2 MB; it cannot come from simultaneous multithreading (SMT), which the V2 core does not support.
The new memory-optimized R8g instances based on Graviton4 are intriguing because AWS claims they will offer much larger instance sizes, with up to three times the vCPUs and memory of the current R7g instances. Without SMT, a single 96-core Graviton4 cannot supply that many vCPUs, which points to a two-socket Graviton4 server.
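A back-of-the-envelope check shows what tripling the vCPU count implies. The Python sketch below assumes that the largest R7g instance offers 64 vCPUs and 512 GiB of memory, and that each vCPU on a Graviton4 machine maps to one physical core because the Neoverse V2 core has no SMT. The constant names and the `sockets_needed` helper are hypothetical.

```python
import math

R7G_MAX_VCPUS = 64         # assumption: vCPUs on the largest R7g size
R7G_MAX_MEMORY_GIB = 512   # assumption: memory on the largest R7g size
GRAVITON4_CORES = 96       # cores per Graviton4 socket
THREADS_PER_CORE = 1       # no SMT on Neoverse V2, so one vCPU per core


def sockets_needed(vcpus: int, cores_per_socket: int, threads_per_core: int = 1) -> int:
    """Minimum number of sockets needed to expose `vcpus` vCPUs."""
    return math.ceil(vcpus / (cores_per_socket * threads_per_core))


# AWS claims up to three times the vCPUs and memory of R7g
r8g_vcpus = 3 * R7G_MAX_VCPUS
r8g_memory_gib = 3 * R7G_MAX_MEMORY_GIB
sockets = sockets_needed(r8g_vcpus, GRAVITON4_CORES, THREADS_PER_CORE)

print(f"{r8g_vcpus} vCPUs, {r8g_memory_gib} GiB, {sockets} sockets")
```

Under those assumptions it prints 192 vCPUs, 1536 GiB, and 2 sockets, so the largest R8g sizes would appear to need two Graviton4 chips in one machine.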
In short, Graviton4 is a significant step forward for the Graviton line, with more cores, more memory bandwidth, and higher application performance than its predecessors. A fuller picture of its capabilities should emerge as AWS discloses more details.