Comparison of the Ampere Altra Family to AWS Graviton 2 and 3 in AI Inference Models
With the launch of the first Cloud Native processors, Ampere has created a new CPU category in the cloud and enterprise-class server markets. These processors were designed from the ground up for cloud-native workloads. The design eliminates many of the legacy hardware features of the x86 architecture, which boosts performance while reducing both complexity and power consumption. Ampere® Altra® and Ampere Altra Max Cloud Native Processors are based on the ARM v8.2 instruction set. Major cloud service providers are increasingly adopting them, and they outperform legacy CPUs in both raw performance and performance per watt. Ampere also shows significant performance leadership on AI inference workloads. This holds not only against legacy x86 architectures but also against the ARM v8-based Graviton processor family introduced by AWS. In what follows, we examine and compare the performance of Altra and Altra Max with AWS’s Graviton 2 and Graviton 3.
Table 1 summarizes the clear performance advantages of Ampere Altra and Ampere Altra Max.
In addition, the Ampere-optimized AI frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide further speedups. They also take full advantage of the Ampere Altra family’s built-in hardware support for the fp16 half-precision data format. As a result, the Ampere Altra family delivers consistently superior performance compared to Graviton 2 and Graviton 3 in the majority of AI workloads. In this post we present benchmarks of computer vision and NLP models that illustrate this performance advantage over the Graviton family of processors.
For ResNet-50 v1.5 we measured the latency and throughput of Altra Max, Graviton 2, and Graviton 3 (see Figures 1.1 and 1.2). All benchmarks were run on 64 cores in a single-threaded configuration. Latency tests used a batch size of 1, and throughput tests used a batch size of 64. Altra Max is 7x faster than Graviton 2 in latency and delivers more than 2x its throughput. Graviton 3 improves significantly on Graviton 2 but still falls short: Altra Max remains more than 3x faster in latency and delivers more than 2x the throughput.
Figure 1.1: Altra Max latency vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (lower is better)
Figure 1.2: Altra Max throughput vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (higher is better)
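The measurement setup described above can be sketched as follows. It uses batch size 1 for latency and batch size 64 for throughput. This is a minimal illustration, not the harness Ampere used. The `infer` callable is a hypothetical stand-in for a real framework call on ResNet-50 v1.5. The warm-up runs and the use of the median are our own assumptions.

```python
import statistics
import time
from typing import Any, Callable, List

# Hypothetical stand-in for a model forward pass (e.g., a TensorFlow, PyTorch,
# or ONNX Runtime session running ResNet-50 v1.5). It receives a list of inputs.
InferFn = Callable[[List[Any]], Any]


def measure_latency(infer: InferFn, sample: Any,
                    runs: int = 50, warmup: int = 5) -> float:
    """Median wall-clock seconds for a single batch-size-1 inference."""
    for _ in range(warmup):            # warm caches before timing (assumption)
        infer([sample])
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer([sample])                # batch size 1
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def measure_throughput(infer: InferFn, sample: Any, batch_size: int = 64,
                       runs: int = 10, warmup: int = 2) -> float:
    """Samples processed per second when feeding batches of `batch_size`."""
    batch = [sample] * batch_size
    for _ in range(warmup):
        infer(batch)
    start = time.perf_counter()
    for _ in range(runs):
        infer(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return batch_size * runs / elapsed


if __name__ == "__main__":
    # Toy workload used only to exercise the harness.
    def toy_infer(batch: List[Any]) -> List[float]:
        return [sum(i * x for i in range(1_000)) for x in batch]

    print(f"latency:    {measure_latency(toy_infer, 1.0) * 1e3:.3f} ms")
    print(f"throughput: {measure_throughput(toy_infer, 1.0):.1f} samples/s")
```

Reporting latency as a median of many single-sample runs keeps occasional outliers from dominating. Throughput is instead taken over the whole timed window, because it measures sustained work rather than individual requests.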
On NLP workloads, Altra Max retains its advantage in both latency and throughput. Graviton 3 improves on Graviton 2 but still falls short of Altra Max’s performance. Figures 2.1 and 2.2 summarize the BERT_large_MLPERF_Squad benchmark results for the three processors. Altra Max’s latency is 2.4x better than Graviton 2’s and 1.7x better than Graviton 3’s. Its throughput is 1.7x higher than Graviton 2’s and 1.5x higher than Graviton 3’s.
The performance advantage described above is based on Altra running in fp32 mode. In fp16 mode, which has no impact on accuracy, the gap widens further in Altra’s favor, as can be observed in Figures 1 and 2.
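The sketch below illustrates what fp16 mode changes numerically. It rounds a few values to IEEE 754 half precision using the `e` format of Python’s standard `struct` module. Each fp16 value occupies 2 bytes instead of the 4 bytes of fp32, halving the data moved per value. The cost is a maximum relative rounding error of 2^-11 for normal values, roughly 3 significant decimal digits. The sample values are arbitrary. This shows only the rounding itself, not how Ampere’s frameworks convert models.

```python
import struct

FP16_UNIT_ROUNDOFF = 2.0 ** -11  # max relative rounding error for normal fp16 values


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    # Arbitrary example values, all inside fp16's normal range (~6.1e-5 to 65504).
    values = [0.1, -1.25, 3.14159, 0.0005]
    for v in values:
        h = to_fp16(v)
        rel_err = abs(h - v) / abs(v)
        print(f"{v:>9} -> {h:<20} relative error {rel_err:.2e}")

    # Storage per value: 2 bytes for fp16 ("e") vs. 4 bytes for fp32 ("f").
    print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4
```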
Figure 2.1: Altra Max relative latency vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (lower is better)
Figure 2.2: Altra Max relative throughput vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (higher is better)
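The relative figures quoted above depend on which direction counts as better. The sketch below assumes the usual convention, since the document does not spell out its formula. For latency, where lower is better, the advantage is the baseline’s latency divided by the candidate’s. For throughput, where higher is better, it is the candidate’s throughput divided by the baseline’s.

```python
def latency_advantage(baseline_latency: float, candidate_latency: float) -> float:
    """How many times faster the candidate is when lower latency is better."""
    if candidate_latency <= 0 or baseline_latency <= 0:
        raise ValueError("latencies must be positive")
    return baseline_latency / candidate_latency


def throughput_advantage(baseline_throughput: float, candidate_throughput: float) -> float:
    """How many times more work per second the candidate does (higher is better)."""
    if candidate_throughput <= 0 or baseline_throughput <= 0:
        raise ValueError("throughputs must be positive")
    return candidate_throughput / baseline_throughput


if __name__ == "__main__":
    # Illustrative numbers only, not measured results:
    # a baseline at 24 ms vs. a candidate at 10 ms is a 2.4x latency advantage.
    print(latency_advantage(24.0, 10.0))        # 2.4
    print(throughput_advantage(100.0, 170.0))   # 1.7
```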
Graviton instances are available only through AWS. Ampere Altra is available on OCI (Oracle Cloud Infrastructure), Microsoft Azure, Google Cloud, and Tencent Cloud, while Equinix and Hetzner offer bare-metal instances. OCI’s A1 instances cost about half as much as both Graviton 2 (c6g.xlarge) and Graviton 3 (c7g.xlarge) instances. For that price, an A1 instance offers 24GB of memory, against 16GB for c6g and only 8GB for c7g. Given the performance advantages that Ampere Altra and Altra Max deliver over Graviton 2 and Graviton 3, they are the obvious choice for AI inference workloads.
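Price and performance combine multiplicatively. The sketch below assumes price/performance is defined as performance per unit of hourly price; the document does not give its exact formula. Under that assumption, halving the price at equal performance doubles price/performance, and any performance advantage multiplies on top of that. The inputs in the example are illustrative ratios, not measured values.

```python
def price_performance_advantage(perf_ratio: float, price_ratio: float) -> float:
    """Price/performance of system A relative to system B.

    perf_ratio:  performance of A divided by performance of B (higher is better).
    price_ratio: hourly price of A divided by hourly price of B.
    """
    if perf_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / price_ratio


if __name__ == "__main__":
    # Illustrative only: half the price with equal performance ...
    print(price_performance_advantage(1.0, 0.5))  # 2.0
    # ... and half the price with twice the performance.
    print(price_performance_advantage(2.0, 0.5))  # 4.0
```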
Furthermore, Table 2 lists the pricing options for building the compute instances required to run the benchmarks, or equivalent workloads, at the performance levels shown in Figures 1 and 2.
Table 2: Cost of Altra, Graviton 2, and Graviton 3 instances in a 64 vCPU + 128GB configuration
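One way to cost a fixed 64 vCPU + 128GB configuration is to count how many whole instances of a given shape meet both the vCPU target and the memory target. That count is then multiplied by the hourly price. This is a sketch under that assumption, since the document does not describe how Table 2 was assembled. The shape names and prices below are hypothetical placeholders, not provider list prices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceShape:
    """A hypothetical instance shape; substitute real figures from a price list."""
    name: str
    vcpus: int
    memory_gb: int
    hourly_price: float


def instances_needed(shape: InstanceShape, vcpus: int = 64, memory_gb: int = 128) -> int:
    """Smallest instance count that satisfies both the vCPU and memory targets."""
    return max(math.ceil(vcpus / shape.vcpus), math.ceil(memory_gb / shape.memory_gb))


def configuration_cost(shape: InstanceShape, vcpus: int = 64, memory_gb: int = 128) -> float:
    """Hourly cost of building the target configuration from one shape."""
    return instances_needed(shape, vcpus, memory_gb) * shape.hourly_price


if __name__ == "__main__":
    compute_heavy = InstanceShape("hypothetical-4vcpu-8gb", 4, 8, 0.10)
    memory_heavy = InstanceShape("hypothetical-2vcpu-32gb", 2, 32, 0.20)
    for shape in (compute_heavy, memory_heavy):
        print(shape.name, instances_needed(shape), round(configuration_cost(shape), 2))
```

Taking the maximum of the two counts matters when a shape is unbalanced. A memory-rich shape with few vCPUs still has to be bought in enough units to reach 64 vCPUs.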
Finally, considering many different workloads and models, we compiled Altra’s average price/performance ratios over Graviton 2 and Graviton 3. They are shown in Table 3, with the benchmark details given in the end notes.
Table 3: Altra’s composite price/performance advantage over Graviton 2 and Graviton 3 (using the compute configurations shown in Table 2)
The composite price/performance numbers are based on benchmark results for 12 industry-standard computer vision and NLP models. Each was tested for single-stream latency and offline throughput on Ampere Altra, Graviton 2, and Graviton 3. The models used for the benchmarks are shown in Table 4:
Table 4: Models used in the Altra vs. Graviton benchmarks
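The document describes the composite as an average of per-model price/performance ratios but does not name the averaging method. A common choice for averaging ratios is the geometric mean, which treats a 2x gain and a 0.5x loss symmetrically. The sketch below assumes that choice. The model names and ratios in the example are placeholders, not the values behind Table 3.

```python
import statistics
from typing import Mapping


def composite_advantage(per_model_ratios: Mapping[str, float]) -> float:
    """Geometric mean of per-model advantage ratios (assumed averaging method).

    Each ratio is Altra's price/performance divided by the competitor's for one
    model and scenario (e.g., single-stream latency or offline throughput).
    """
    ratios = list(per_model_ratios.values())
    if not ratios:
        raise ValueError("need at least one model result")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return statistics.geometric_mean(ratios)


if __name__ == "__main__":
    # Placeholder ratios for illustration only.
    example = {"model_a/offline": 2.0, "model_b/single_stream": 8.0}
    print(round(composite_advantage(example), 6))  # 4.0
```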
The hardware platforms along with the TensorFlow versions used in the benchmarks are shown in Table 5:
Table 5: Hardware platforms and software versions used in the benchmarks
Because multiple versions of TensorFlow are available for Graviton 3, only the best-performing results were used in the final benchmark report.
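Selecting the best result across framework versions depends on the metric: the lowest value for latency and the highest for throughput. In the minimal sketch below, the version labels and numbers are hypothetical. As we read the note above, each metric may come from a different TensorFlow version.

```python
from typing import Mapping, Tuple


def best_result(results_by_version: Mapping[str, float],
                higher_is_better: bool) -> Tuple[str, float]:
    """Return (version, value) for the best measurement across framework versions."""
    if not results_by_version:
        raise ValueError("no results to choose from")
    pick = max if higher_is_better else min
    version = pick(results_by_version, key=results_by_version.__getitem__)
    return version, results_by_version[version]


if __name__ == "__main__":
    # Hypothetical latencies (ms) and throughputs (samples/s) per TensorFlow build.
    latency_ms = {"tf-build-a": 12.5, "tf-build-b": 11.0}
    throughput = {"tf-build-a": 410.0, "tf-build-b": 395.0}
    print(best_result(latency_ms, higher_is_better=False))   # ('tf-build-b', 11.0)
    print(best_result(throughput, higher_is_better=True))    # ('tf-build-a', 410.0)
```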
4655 Great America Parkway
Suite 601 Santa Clara, CA 95054