AI Inference on Azure Solution Brief
Dpsv5 Virtual Machines Powered by Ampere Altra Processors
Ampere® Altra® processors are designed to deliver exceptional performance for Cloud Native applications such as AI inference. With an innovative architecture that delivers predictable high performance, linear scaling, and high energy efficiency, Ampere Altra allows workloads to run predictably, with minimal variance under increasing loads. This enables industry-leading performance per watt and a smaller carbon footprint. You can now run AI inference workloads with both industry-leading performance and energy efficiency.
Microsoft offers a comprehensive line of Azure Virtual Machines featuring the Ampere Altra Cloud Native processor that can run a diverse and broad set of scale-out workloads such as web servers, open-source databases, in-memory applications, big data analytics, gaming, media, and more. The Dpsv5 VMs are general-purpose VMs that provide 2 GB of memory per vCPU and a combination of vCPUs, memory, and local storage to cost-effectively run workloads that do not require larger amounts of RAM per vCPU. The Epsv5 VMs are memory-optimized VMs that provide 4 GB of memory per vCPU, which can benefit memory-intensive workloads, including open-source databases, in-memory caching applications, gaming, and data analytics engines.
MLPerf™ Inference is a benchmark suite consisting of carefully selected AI architectures that represent the forefront of today's artificial intelligence. It is a comprehensive test of how well a given system performs on a variety of representative machine learning tasks, including natural language processing, computer vision, recommendation engines, and more. It is the result of a consensus on best benchmarking practices, forged by experts in computer architecture, systems, and machine learning.
Ampere Altra VMs offer strong performance on a variety of AI workloads, including the models in the MLPerf Inference benchmark. ResNet-50 v1.5 is a popular neural network architecture used primarily in computer vision. This model, trained for the ImageNet image classification task, is part of the MLPerf Inference suite. We ran an MLPerf-like benchmarking script that measures model inference performance without any internal conversion to proprietary formats. This allows an unbiased comparison of performance across architectures running the same neural network.
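The two scenarios such a script measures can be sketched with a small timing harness. This is a minimal illustration, not Ampere's script and not the official MLCommons LoadGen harness: `dummy_infer`, `run_single_stream`, and `run_offline` are hypothetical names, and `dummy_infer` merely stands in for a real framework call such as a ResNet-50 forward pass.

```python
import time
from typing import Callable, List, Sequence

Sample = List[float]
InferFn = Callable[[Sequence[Sample]], List[float]]

def dummy_infer(batch: Sequence[Sample]) -> List[float]:
    """Hypothetical stand-in for a model forward pass (e.g. ResNet-50 in a real run)."""
    return [sum(sample) for sample in batch]

def run_single_stream(infer: InferFn, samples: Sequence[Sample]) -> List[float]:
    """Single-stream: issue one query at a time and record each latency in seconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        infer([sample])  # the next query is issued only after this one completes
        latencies.append(time.perf_counter() - start)
    return latencies

def run_offline(infer: InferFn, samples: Sequence[Sample], batch_size: int) -> float:
    """Offline: push every sample through in batches, no latency bound; return samples/s."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        infer(samples[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed

# Usage with synthetic inputs: 256 fake "images" of 224 values each.
samples = [[float(i)] * 224 for i in range(256)]
latencies = run_single_stream(dummy_infer, samples)
throughput = run_offline(dummy_infer, samples, batch_size=32)
print(f"{len(latencies)} single-stream queries, offline throughput {throughput:.0f} samples/s")
```

Because the model is called directly rather than through a converted or proprietary format, the same harness can time the same network on each VM; a real run would typically also discard warm-up iterations before recording.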
Ampere Altra-based Dpsv5 VMs are the only cloud CPU instances on Azure that natively support vectorized FP16 computation. FP16 can deliver up to a 2x performance gain over FP32 without sacrificing model accuracy. Ampere-optimized TensorFlow takes full advantage of FP16 to deliver the best performance and price-performance relative to legacy x86 VMs.
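The arithmetic behind the "up to 2x" figure can be sketched as follows, assuming 128-bit Arm Neon vector registers as on Ampere Altra's Neoverse N1 cores: halving the element width doubles the number of elements each vector instruction processes. The second half shows, on a few hypothetical weight values, the per-value rounding error that FP16 introduces. This only bounds the error of individual values; end-to-end accuracy still has to be checked on the model itself, and memory bandwidth and kernel quality usually keep real speedups below the peak ratio.

```python
import struct

# Assumption: 128-bit Neon SIMD registers (Ampere Altra / Arm Neoverse N1).
VECTOR_BITS = 128

def lanes_per_vector(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Number of elements of the given width that fit in one SIMD register."""
    return vector_bits // element_bits

def round_to_fp16(x: float) -> float:
    """Round a float to IEEE 754 binary16 and back ('e' is struct's half-precision code)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

fp32_lanes = lanes_per_vector(32)       # 4 elements per vector
fp16_lanes = lanes_per_vector(16)       # 8 elements per vector
peak_ratio = fp16_lanes / fp32_lanes    # 2.0x elements per instruction

# Hypothetical values, chosen only to illustrate FP16 rounding error.
weights = [0.0123, -0.75, 1.5, 3.14159, -2.71828]
max_rel_err = max(abs(round_to_fp16(w) - w) / abs(w) for w in weights)

print(fp32_lanes, fp16_lanes, peak_ratio)  # 4 8 2.0
print(f"max relative rounding error: {max_rel_err:.2e}")
```

For values in FP16's normal range, the relative rounding error is at most 2^-11 (about 4.9e-4), which is why many inference workloads tolerate the conversion.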
In the single-stream scenario, which measures the 99th-percentile latency of processing a single input image, the Dpsv5 VM performed 36% better than the Intel Ice Lake-based Dsv5 VM and 2.6x better than the AMD Milan-based Dasv5 VM, as shown in Figure 1. On price-performance, the Ampere Altra-based Dpsv5 VM had a 68% advantage over the Dsv5 VM and a 2.9x advantage over the Dasv5 VM, as shown in Figure 2.
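A 99th-percentile latency and a relative comparison between two VMs can be computed as in the sketch below. The brief does not state its exact percentile method or how "x% better" is derived, so two assumptions are made here and should be treated as such: the nearest-rank percentile definition, and "better" meaning the ratio of baseline latency to candidate latency (1.36 reads as "36% better"). The function names are hypothetical.

```python
import math
from typing import Sequence

def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need non-empty values and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def latency_advantage(baseline_latency: float, candidate_latency: float) -> float:
    """Speedup of the candidate; lower latency is better, so baseline / candidate."""
    return baseline_latency / candidate_latency

# Synthetic latencies of 1..100 ms, for illustration only.
synthetic_ms = list(range(1, 101))
print(percentile_nearest_rank(synthetic_ms, 99))  # 99
```

Using a tail percentile rather than the mean captures the occasional slow query that a user would actually notice in single-stream serving.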
In the offline scenario, which measures the maximum throughput of the system (the number of inputs processed per unit of time) without latency constraints, the Ampere Altra-based Dpsv5 VM came out on top, delivering 11% more throughput than the Dsv5 VM and 2.1x the throughput of the Dasv5 VM, as shown in Figure 3.
On price-performance, as shown in Figure 4, the Ampere Altra-based Dpsv5 VM was 39% more cost-efficient than the Dsv5 VM and 2.3x as cost-efficient as the Dasv5 VM.
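Price-performance can be expressed as inferences served per dollar of on-demand VM time. The brief does not give its exact formula, so this definition is an assumption, and the throughputs and hourly prices below are hypothetical placeholders, not measured results or Azure list prices. The sketch also shows that the price-performance ratio factors into the performance ratio times the inverse price ratio, which is why a VM with a modest performance lead and a lower hourly price can post a larger price-performance lead.

```python
import math

SECONDS_PER_HOUR = 3600

def inferences_per_dollar(throughput_per_s: float, usd_per_hour: float) -> float:
    """Price-performance as inferences served per dollar of on-demand VM time."""
    return throughput_per_s * SECONDS_PER_HOUR / usd_per_hour

# Hypothetical inputs for illustration only.
candidate = inferences_per_dollar(throughput_per_s=200.0, usd_per_hour=0.80)
baseline = inferences_per_dollar(throughput_per_s=100.0, usd_per_hour=1.00)
price_perf_ratio = candidate / baseline

# Same ratio: (performance ratio) x (baseline price / candidate price).
factored = (200.0 / 100.0) * (1.00 / 0.80)
assert math.isclose(price_perf_ratio, factored)
print(f"price-performance advantage: {price_perf_ratio:.2f}x")  # 2.50x
```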
The results in this workload brief are based on measurements with the Ampere Model Library (AML) for D16ps v5, D16s v5, and D16as v5 VMs.
Price-performance data is based on Azure on-demand pricing in the Iowa region as of July 12, 2022.
AI inference is a rapidly growing cloud workload. Ampere-optimized frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide best-in-class inference performance for a variety of AI models in areas such as computer vision, natural language processing, and recommendation engines. Popular computer vision models such as ResNet-50 have been studied on several Azure VMs. In our tests, Microsoft Azure Dpsv5 VMs powered by Ampere Altra Cloud Native processors and running Ampere-optimized TensorFlow delivered substantially better inference performance and price-performance than legacy x86 VMs. The result is strong performance and compelling price-performance, while also reducing your carbon footprint. For more information about Azure Virtual Machines with Ampere Altra Arm-based processors, visit the Azure blog.
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
Price performance was calculated using Microsoft's Virtual Machines Pricing, in September of 2022. Refer to individual tests for more information.
©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com