Benefits of running AI Inference on Ampere
In launching the first Cloud Native Processors, Ampere has created a new CPU category in the Cloud and enterprise-class server markets. These processors are designed from the ground up for cloud-native tasks, eliminating many legacy hardware features of the x86 architecture while boosting performance, reducing unnecessary complexity, and lowering power consumption. Ampere® Altra® and Ampere Altra Max Cloud Native Processors, based on the ARM v8.2 instruction set, are being increasingly adopted by major Cloud Service Providers and outperform legacy CPUs in both raw performance and performance per watt. Ampere also shows significant performance leadership on AI inference workloads. This advantage holds not only against legacy x86 architectures but also against the ARM v8 based processor family introduced by AWS. In what follows, we examine and compare the performance of Altra and Altra Max with AWS’s Graviton 2 and Graviton 3.
Technical Features Comparison
Device | Altra | Altra Max | Graviton 2 | Graviton 3 |
---|---|---|---|---|
Process Node | 7nm | 7nm | 7nm | 5nm |
CPU Cores | 80 | 128 | 64 | 64 |
Fmax | 3.3GHz | 3.0GHz | 2.5GHz | 2.6GHz |
Architecture | ARM v8.2 | ARM v8.2 | ARM v8.2 | ARM v8.5 |
Microarchitecture | Neoverse N1 | Neoverse N1 | Neoverse N1 | Neoverse N1 + 256b SVE |
L1 Cache | 64KB I - 64KB D | 64KB I - 64KB D | 64KB I - 64KB D | 64KB I - 64KB D |
L2 Cache | 1MB | 1MB | 1MB | 1MB |
L3 Cache | 32MB shared | 16MB shared | 32MB shared | 64MB shared |
Memory Channels | 8x DDR4-3200 | 8x DDR4-3200 | 8x DDR4-3200 | 8x DDR5-4800 |
Encryption | AES-256 | AES-256 | AES-256 | AES-256 |
PCIe | 128 x PCIe 4.0 | 128 x PCIe 4.0 | 64 x PCIe 4.0 | 32 x PCIe 5.0 |
Table 1: Ampere Altra and Altra Max vs. Graviton 2 and Graviton 3 key features
As Table 1 shows, Ampere Altra and Ampere Altra Max hold clear advantages over both Graviton generations in core count, maximum clock frequency, and number of PCIe lanes.
In addition, the Ampere-optimized AI frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide further speedup and take full advantage of the Ampere Altra family’s built-in hardware support for the fp16 half-precision data format. As a result, the Ampere Altra family delivers consistently superior performance compared to Graviton 2 and Graviton 3 in the majority of AI workloads. In this post, we discuss benchmarks of the Ampere Altra family on computer vision and NLP models that exemplify this performance advantage over the Graviton family of processors.
In the ResNet-50 v1.5 benchmarks, we measured the latency and throughput of Altra Max, Graviton 2, and Graviton 3 (see Figures 1.1 and 1.2). All benchmarks were run using 64 cores in a single-threaded configuration. The latency tests used a batch size of 1 and the throughput tests a batch size of 64. Altra Max is 7x faster than Graviton 2 in latency and delivers more than 2x its throughput. While Graviton 3 significantly improves on Graviton 2, it still falls short of Altra Max, which remains more than 3x faster in latency and more than 2x higher in throughput.
Figure 1.1: Altra Max latency vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In latency smaller is better)
Figure 1.2: Altra Max throughput vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In throughput larger is better)
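The batch-size setup described above (batch size 1 for latency, batch size 64 for throughput) can be illustrated with a minimal timing sketch. This is not Ampere's benchmark harness: `fake_model`, `measure`, the sample length of 1024, and the iteration counts are all hypothetical stand-ins chosen so the example runs anywhere without an AI framework installed.

```python
import statistics
import time


def fake_model(batch):
    """Hypothetical stand-in for an inference call: sums each sample."""
    return [sum(sample) for sample in batch]


def measure(model, batch_size, iterations=20, warmup=3):
    """Time `model` on a synthetic batch and report mean latency and throughput."""
    # Synthetic input: `batch_size` samples of 1024 floats each (assumed shape).
    batch = [[0.5] * 1024 for _ in range(batch_size)]

    # Warm-up runs are excluded from the timings.
    for _ in range(warmup):
        model(batch)

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        model(batch)
        timings.append(time.perf_counter() - start)

    mean_s = statistics.mean(timings)
    return {
        "latency_ms": mean_s * 1000.0,      # mean time per batch, in milliseconds
        "throughput": batch_size / mean_s,  # samples processed per second
    }


# Latency is measured with batch size 1, throughput with batch size 64.
latency_run = measure(fake_model, batch_size=1)
throughput_run = measure(fake_model, batch_size=64)

print(f"latency (batch=1):     {latency_run['latency_ms']:.4f} ms")
print(f"throughput (batch=64): {throughput_run['throughput']:.1f} samples/s")
```

In a real measurement, `fake_model` would be replaced by the framework's inference call, and the thread and core settings would be pinned to match the configuration under test.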
In NLP workloads, Altra Max retains its advantage in both latency and throughput. Graviton 3 improves on Graviton 2 but still falls short of Altra’s performance levels. Figures 2.1 and 2.2 summarize the BERT_large_MLPERF_Squad benchmark for the three devices. Altra Max’s latency is 2.4x better than Graviton 2’s and 1.7x better than Graviton 3’s, and its throughput is 1.7x higher than Graviton 2’s and 1.5x higher than Graviton 3’s.
The performance advantage described above is based on Altra’s fp32 mode. When Altra is used in fp16 mode, without any impact on accuracy, the performance gap widens further in Altra’s favor, as can be seen in Figures 1.1 to 2.2.
Figure 2.1: Altra MAX relative latency vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In latency smaller is better)
Figure 2.2: Altra MAX relative throughput vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In throughput larger is better)
Graviton instances are available only through AWS. Ampere Altra is available on OCI (Oracle Cloud Infrastructure), Microsoft Azure, Google Cloud, and Tencent Cloud, while Equinix and Hetzner offer bare-metal instances. The OCI A1 instance is priced at about half of both Graviton 2 (c6g.xlarge) and Graviton 3 (c7g.xlarge). For this price, the A1 instance offers 24GB of memory, against 16GB for c6g and only 8GB for c7g. Given the performance advantages that Ampere Altra and Altra Max deliver over Graviton 2 and Graviton 3, they represent the obvious choice for AI inference workloads.
Furthermore, Table 2 lists the pricing options for building the compute instances required to run the actual benchmarks, or equivalent workloads, at the performance levels shown in Figures 1.1 to 2.2.
Device | Cloud Service | Compute Instance | Configuration | $/CPU hour | Monthly Cost |
---|---|---|---|---|---|
Ampere Altra | OCI | A1 | 64 VCPUs 128 GB | $0.832 | $599.04 |
Graviton 2 | AWS | c6g.16xlarge | 64 VCPUs 128 GB | $2.176 | $1,000.76 |
Graviton 3 | AWS | c7g.16xlarge | 64 VCPUs 128 GB | $2.312 | $1,111.61 |
Table 2: Altra and Graviton 2 and Graviton 3 cost at 64 VCPU + 128GB configuration
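The cost gap in Table 2 can be worked through with a few lines of arithmetic. The sketch below takes the monthly cost column exactly as printed and derives the cost per VCPU and the cost of each AWS instance relative to the OCI A1 instance. The dictionary keys and variable names are illustrative only.

```python
# Monthly costs in USD, copied from Table 2 (64 VCPUs + 128 GB configuration).
monthly_cost = {
    "Ampere Altra (OCI A1)": 599.04,
    "Graviton 2 (AWS c6g.16xlarge)": 1000.76,
    "Graviton 3 (AWS c7g.16xlarge)": 1111.61,
}
VCPUS = 64  # every instance in Table 2 has 64 VCPUs

baseline = monthly_cost["Ampere Altra (OCI A1)"]

# Cost per VCPU per month, and cost relative to the Altra instance.
cost_per_vcpu = {name: cost / VCPUS for name, cost in monthly_cost.items()}
relative_cost = {name: cost / baseline for name, cost in monthly_cost.items()}

for name in monthly_cost:
    print(f"{name}: ${cost_per_vcpu[name]:.2f} per VCPU per month, "
          f"{relative_cost[name]:.2f}x the Altra cost")
```

On these figures, the Graviton 2 and Graviton 3 instances cost roughly 1.67x and 1.86x as much per month as the Altra instance.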
Finally, when many different workloads and models are considered, Altra’s average price/performance ratios over Graviton 2 and Graviton 3 are as compiled below.
Device | Altra fp16 | Altra fp32 |
---|---|---|
Over Graviton 2 | 10.2x | 6.5x |
Over Graviton 3 | 4.7x | 2.8x |
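Assuming the table above reports Altra's average price/performance advantage over each Graviton generation, the extra gain attributable to fp16 mode can be read off by dividing the fp16 column by the fp32 column. The sketch below does only that division. It does not reproduce how the averages themselves were computed, which the text does not specify.

```python
# Average price/performance ratios of Altra over each rival, from the table above.
price_perf = {
    "Graviton 2": {"fp16": 10.2, "fp32": 6.5},
    "Graviton 3": {"fp16": 4.7, "fp32": 2.8},
}

# Additional price/performance factor gained by switching Altra from fp32 to fp16.
fp16_gain = {
    rival: ratios["fp16"] / ratios["fp32"] for rival, ratios in price_perf.items()
}

for rival, gain in fp16_gain.items():
    print(f"vs. {rival}: fp16 adds a further {gain:.2f}x over fp32")
```

On these numbers, fp16 mode adds a further factor of about 1.57x against Graviton 2 and about 1.68x against Graviton 3.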
Ampere Altra and Altra Max CPUs lead AWS’s ARM-based Graviton 2 and Graviton 3 compute instances by a wide margin in both performance and price for Cloud instances.
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.