Solutions with Ampere Cloud Native Processors

AI - Ampere Altra Family Vs. Graviton

Benefits of running AI Inference on Ampere

Introduction

In launching the first Cloud Native Processors, Ampere has created a new CPU category in the cloud and enterprise-class server markets. These processors benefit from a ground-up design for cloud-native workloads, eliminating many of the legacy hardware features of the x86 architecture while boosting performance, reducing unnecessary complexity, and lowering power consumption. Ampere® Altra® and Ampere Altra Max Cloud Native Processors, based on the ARM v8.2 instruction set, are being increasingly adopted by major cloud service providers and outperform legacy CPUs in both raw performance and performance per watt. Ampere also shows significant performance leadership on AI inference workloads. This advantage holds not only against legacy x86 architectures but also against the ARM v8-based processor family introduced by AWS. In what follows, we examine and compare the performance of Altra and Altra Max with AWS’s Graviton 2 and Graviton 3.

Technical Features Comparison

Device	Altra	Altra Max	Graviton 2	Graviton 3
Process Node	7nm	7nm	7nm	5nm
CPU Cores	80	128	64	64
Fmax	3.3GHz	3.0GHz	2.5GHz	2.6GHz
Architecture	ARM v8.2	ARM v8.2	ARM v8.2	ARM v8.5
Micro - Architecture	Neoverse N1	Neoverse N1	Neoverse N1	Neoverse N1 + 256b SVE
L1 Cache	64KB I - 64KB D	64KB I - 64KB D	64KB I - 64KB D	64KB I - 64KB D
L2 Cache	1MB	1MB	1MB	1MB
L3 Cache	32MB shared	16MB shared	32MB shared	64MB shared
Memory Channels	8x DDR4-3200	8x DDR4-3200	8x DDR4-3200	8x DDR5-4800
Encryption	AES-256	AES-256	AES-256	AES-256
PCIe	128 x PCIe 4.0	128 x PCIe 4.0	64 x PCIe 4.0	32 x PCIe 5.0

Table 1: Ampere Altra and Altra Max vs. Graviton 2 and Graviton 3 key features

Ampere Altra and Ampere Altra Max show several clear advantages:

Up to 2x the number of CPU cores, giving up to 2x the compute capacity on a single device.
CPU core speeds up to 20% higher, with no visible power penalty.
Lower or similar silicon cost at twice the compute capacity.
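
The core-count and clock-speed ratios in this list can be rederived from Table 1. The following is a minimal Python sketch, assuming only the core counts and peak frequencies printed in the table; the `SPECS` layout and the `ratio` helper are illustrative names, not part of any Ampere tooling.

```python
# Core counts and peak clock (GHz) copied from Table 1.
SPECS = {
    "Altra": {"cores": 80, "fmax_ghz": 3.3},
    "Altra Max": {"cores": 128, "fmax_ghz": 3.0},
    "Graviton 2": {"cores": 64, "fmax_ghz": 2.5},
    "Graviton 3": {"cores": 64, "fmax_ghz": 2.6},
}


def ratio(device, baseline, key):
    """Return SPECS[device][key] / SPECS[baseline][key] (hypothetical helper)."""
    return SPECS[device][key] / SPECS[baseline][key]


if __name__ == "__main__":
    for ampere in ("Altra", "Altra Max"):
        for graviton in ("Graviton 2", "Graviton 3"):
            cores = ratio(ampere, graviton, "cores")
            clock = ratio(ampere, graviton, "fmax_ghz")
            print(f"{ampere} vs {graviton}: {cores:.2f}x cores, {clock:.2f}x clock")
```

With these inputs, the 2x core figure corresponds to Altra Max (128 cores) against either Graviton part (64 cores), and the 20% clock figure corresponds to Altra Max (3.0GHz) against Graviton 2 (2.5GHz).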

In addition, the Ampere-optimized AI frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide additional speed-up and take full advantage of the Ampere Altra family’s built-in hardware support for the fp16 half-precision data format. As a result, the Ampere Altra family delivers consistently superior performance compared to Graviton 2 and Graviton 3 in the majority of AI workloads. In this post we discuss benchmarks of the Ampere Altra family on computer vision and NLP models that exemplify this performance advantage over the Graviton family of processors.

In ResNet-50 v1.5 benchmarks, we measured the latency and throughput of Altra Max, Graviton 2, and Graviton 3 (see Figures 1.1 and 1.2). All benchmarks were run using 64 cores in a single-threaded configuration. Latency tests used a batch size of 1 and throughput tests a batch size of 64. Altra Max is 7x faster than Graviton 2 in latency and delivers more than 2x the throughput. While Graviton 3 significantly improves on Graviton 2, it still falls short of Altra Max, which remains more than 3x faster in latency and more than 2x higher in throughput.
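
The two measurement modes described above (batch size 1 for latency, batch size 64 for throughput) could be sketched as follows. This is a minimal, framework-agnostic harness and not Ampere’s benchmark code: `measure`, `dummy_model`, and `make_batch` are hypothetical names, the dummy model stands in for a real ResNet-50 v1.5 inference call, and the sketch does not reproduce the 64-core, single-threaded placement used in the tests.

```python
import statistics
import time


def measure(model, make_batch, batch_size, runs=20, warmup=3):
    """Time `model` on one batch of `batch_size` samples.

    Returns (median seconds per batch, samples per second).
    """
    batch = make_batch(batch_size)
    for _ in range(warmup):  # warm-up iterations are not timed
        model(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model(batch)
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    return median, batch_size / median


def dummy_model(batch):
    # Placeholder workload; replace with a real inference call.
    return [sum(sample) for sample in batch]


def make_batch(batch_size):
    # Placeholder input; replace with real preprocessed images.
    return [[0.5] * 4096 for _ in range(batch_size)]


if __name__ == "__main__":
    latency_s, _ = measure(dummy_model, make_batch, batch_size=1)
    _, samples_per_s = measure(dummy_model, make_batch, batch_size=64)
    print(f"latency (batch 1): {latency_s * 1e3:.3f} ms")
    print(f"throughput (batch 64): {samples_per_s:.1f} samples/s")
```

Reporting the median rather than the mean is a design choice of this sketch to reduce the effect of occasional slow runs; the source does not state which statistic its figures use.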

Figure 1.1: Altra Max latency vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In latency smaller is better)

Figure 1.2: Altra Max throughput vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In throughput larger is better)

In NLP workloads, Altra Max retains its advantage in both latency and throughput. Graviton 3 improves on Graviton 2 but still falls short of Altra Max’s performance levels. Figures 2.1 and 2.2 summarize the BERT_large_MLPERF_Squad benchmark for the three devices. Altra Max’s latency is 2.4x better than Graviton 2’s and 1.7x better than Graviton 3’s, and its throughput is 1.7x higher than Graviton 2’s and 1.5x higher than Graviton 3’s.

The performance advantage described above is based on Altra’s fp32 mode. When fp16 mode is used, without any impact on accuracy, the performance gap widens further in Altra’s favor, as can be observed in Figures 1.1 through 2.2.

Figure 2.1: Altra Max relative latency vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In latency smaller is better)

Figure 2.2: Altra Max relative throughput vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In throughput larger is better)

Cost/Performance Comparison

Graviton instances are available only through AWS. Ampere Altra is available on OCI (Oracle Cloud Infrastructure), Microsoft Azure, Google Cloud, and Tencent Cloud, while Equinix and Hetzner offer bare-metal instances. OCI’s A1 instances are priced at about half of both Graviton 2 (c6g.xlarge) and Graviton 3 (c7g.xlarge) instances. The A1 instance offers 24GB of memory for this price against 16GB for c6g and only 8GB for c7g. Given the performance advantages Ampere Altra and Altra Max deliver over Graviton 2 and Graviton 3, they represent the obvious choice for AI inference workloads.

Furthermore, Table 2 lists the pricing of the compute instances required to run the actual benchmarks, or equivalent workloads, at the performance levels shown in Figures 1.1 through 2.2.

Device	Cloud Service	Compute Instance	Configuration	$/hour	Monthly Cost
Ampere Altra	OCI	A1	64 VCPUs 128 GB	$0.832	$599.04
Graviton 2	AWS	c6g.16xlarge	64 VCPUs 128 GB	$2.176	$1,000.76
Graviton 3	AWS	c7g.16xlarge	64 VCPUs 128 GB	$2.312	$1,111.61

Table 2: Altra and Graviton 2 and Graviton 3 cost at 64 VCPU + 128GB configuration
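
The relationship between the hourly and monthly columns of Table 2 can be checked with a short Python sketch. The 720-hour month (30 days x 24 hours) is an assumption, not something the source states; it reproduces the OCI row exactly. The AWS monthly figures do not equal the listed hourly rate times 720 hours, so they appear to rest on a pricing basis the source does not spell out. The names `TABLE_2`, `monthly_from_hourly`, and `cost_ratio` are illustrative.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours

# (hourly rate in $, monthly cost in $) copied from Table 2.
TABLE_2 = {
    "Ampere Altra (OCI A1)": (0.832, 599.04),
    "Graviton 2 (c6g.16xlarge)": (2.176, 1000.76),
    "Graviton 3 (c7g.16xlarge)": (2.312, 1111.61),
}


def monthly_from_hourly(hourly_rate):
    """Monthly cost implied by an hourly rate under the 720-hour assumption."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)


def cost_ratio(device, baseline="Ampere Altra (OCI A1)"):
    """Listed monthly cost of `device` divided by that of `baseline`."""
    return TABLE_2[device][1] / TABLE_2[baseline][1]


if __name__ == "__main__":
    for name, (hourly, monthly) in TABLE_2.items():
        implied = monthly_from_hourly(hourly)
        print(f"{name}: listed ${monthly:,.2f}, rate x 720h = ${implied:,.2f}, "
              f"{cost_ratio(name):.2f}x the OCI A1 monthly cost")
```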

Finally, considering many different workloads and models, Altra’s average price/performance advantages over Graviton 2 and Graviton 3 are compiled in Table 3.

Device	Altra fp16	Altra fp32
Over Graviton 2	10.2x	6.5x
Over Graviton 3	4.7x	2.8x

Table 3: Altra’s composite price/performance advantage over Graviton 2 and Graviton 3 (using the compute configurations shown in Table 2)
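
A price/performance advantage of this kind is conventionally the product of a performance ratio and a cost ratio. The sketch below illustrates that arithmetic for a single workload; it is an assumption about how such a figure is formed, not the method behind Table 3, which averages many workloads and models that are not listed here. The inputs are the BERT throughput ratios quoted earlier (measured on Altra Max) and the monthly costs from Table 2 (priced for an Altra instance), so the result is illustrative only. The function name is hypothetical.

```python
def price_performance_advantage(perf_ratio, own_cost, other_cost):
    """Performance per dollar relative to a competitor.

    perf_ratio: own performance / competitor performance (higher is better)
    own_cost, other_cost: cost over the same period, in the same currency
    """
    return perf_ratio * (other_cost / own_cost)


if __name__ == "__main__":
    altra_monthly = 599.04  # Table 2, OCI A1
    examples = {
        # name: (BERT throughput ratio from the text, Table 2 monthly cost)
        "Graviton 2": (1.7, 1000.76),
        "Graviton 3": (1.5, 1111.61),
    }
    for name, (perf_ratio, monthly) in examples.items():
        advantage = price_performance_advantage(perf_ratio, altra_monthly, monthly)
        print(f"Illustrative advantage over {name}: {advantage:.2f}x")
```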

Conclusions

Ampere Altra and Altra Max CPUs lead AWS’s ARM-based Graviton 2 and Graviton 3 compute instances by a wide margin in both performance and price for cloud instances.

Disclaimer

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Created At : November 8th 2022, 10:20:36 am

Last Updated At : December 18th 2024, 6:14:03 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054
