Ampere AI Test Notes

Benchmarks were performed with Ampere’s internal testing software based on the Ampere Model Library. This software is written entirely in Python and follows the MLCommons Inference (a.k.a. MLPerf) methodology for calculating latency and throughput. It utilizes the frameworks’ APIs in standard, common ways while replicating usage in real-life applications.

For latency benchmarks, a single system process was executed at a time for each configuration listed below. Each process, following a warm-up run, ran workloads with a batch size of 1 in a loop for a minimum of 60 seconds. The final latency value was then calculated from the collected net inference times of each pass through the network.

  • Intel Xeon 8380 “Ice Lake” - number of threads: 1, 4, 16, 32, 64, 80

  • AMD Epyc 7763 “Milan” - number of threads: 1, 4, 16, 32, 64, 128

  • Ampere Altra Max M128-80 – number of threads: 1, 4, 16, 32, 64, 128
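The latency procedure above can be sketched as follows. This is a hypothetical illustration, not Ampere’s actual code: `run_single_pass`, the warm-up count, and the timing window are assumptions standing in for a real framework inference call.

```python
import time

MIN_RUN_SECONDS = 60  # minimum benchmark duration per configuration


def run_single_pass():
    """Placeholder for one batch-size-1 inference call through the framework API."""
    time.sleep(0.001)  # stands in for actual network execution


def benchmark_latency(min_seconds=MIN_RUN_SECONDS, warmup_runs=5):
    # Warm-up runs are executed first and excluded from timing.
    for _ in range(warmup_runs):
        run_single_pass()

    latencies = []
    start = time.time()
    while time.time() - start < min_seconds:
        t0 = time.time()
        run_single_pass()
        latencies.append(time.time() - t0)  # net inference time of this pass

    # Final latency: computed from the collected per-pass times.
    return sum(latencies) / len(latencies)
```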

For the multi-process throughput benchmarks, a search space of different batch sizes and numbers of threads per process was covered. Final throughput values were estimated from the median (50th percentile) latencies observed during 60-second multi-process runs. All systems were benchmarked running workloads of the following batch sizes per each of n parallel processes: [1, 4, 16, 32, 64, 128, 256]. The combinations of threads per process to total number of processes were, respectively:

  • Intel Xeon 8380 “Ice Lake” - 1x80, 2x40, 4x20, 16x5, 32x2, 64x1, 80x1

  • AMD Epyc 7763 “Milan” - 1x128, 2x64, 4x32, 16x8, 32x4, 64x2, 128x1

  • Ampere Altra Max M128-80 – 1x128, 2x64, 4x32, 16x8, 32x4, 64x2, 128x1
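Under this scheme, the throughput estimate for a given configuration can be derived from per-process median latencies, as in the following sketch. The function name and data layout are illustrative assumptions, not the actual tooling.

```python
import statistics


def estimate_throughput(per_process_latencies, batch_size):
    """Estimate total throughput (samples/second) for n parallel processes.

    per_process_latencies: one list of latency samples (seconds) per process,
    e.g. [[0.1, 0.09, ...], [0.11, 0.1, ...], ...].
    """
    total = 0.0
    for latencies in per_process_latencies:
        median_latency = statistics.median(latencies)  # 50th percentile
        total += batch_size / median_latency  # samples/second for this process
    return total


# Example: 2 parallel processes, batch size 16, median latency 0.1 s each.
throughput = estimate_throughput([[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]], batch_size=16)
```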

Benchmarks of all platforms were run with the same scripting, the same datasets, and the same model representations. All platforms ran the same workloads, applying identical pre- and post-processing and making uniform inference calls. In the case of the fp16 Altra data, values were obtained with the same scripting, while the AI model representations differed from their fp32 counterparts only in the precision of the weights – the quantization process involved nothing more than casting to a lower float precision.
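A minimal illustration of that quantization step, using NumPy rather than Ampere’s actual tooling: fp16 conversion here is a plain element-wise cast of fp32 weight arrays to a lower float precision, with no calibration or integer quantization involved.

```python
import numpy as np


def cast_weights_to_fp16(weights):
    """weights: dict mapping layer name -> fp32 ndarray; returns fp16 copies."""
    return {name: w.astype(np.float16) for name, w in weights.items()}


# Illustrative fp32 "model": a single dense kernel.
fp32_weights = {"dense/kernel": np.ones((4, 4), dtype=np.float32)}
fp16_weights = cast_weights_to_fp16(fp32_weights)
```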

Across all systems that were put to the test, the TensorFlow library was used in the best-known variant available for a given platform:

  • Intel CPUs – TF 2.7 DNNL, available as the Docker Hub image intel/intel-optimized-tensorflow:2.7.0

  • AMD CPUs – TF 2.7 ZenDNN, available at ZenDNN - AMD

  • AWS Graviton 2nd gen – TF 2.7 (native aarch64 build), available here

  • Ampere Altra Max – TF 2.7 Ampere Optimized, available at Ampere® AI as AIO for Tensorflow

All benchmarks were run with Python 3.8 in Linux-based environments of the following flavors:

  • Intel Xeon 8380 “Ice Lake” - Ubuntu 20.04, kernel: 5.11

  • AMD Epyc 7763 “Milan” - CentOS 8, kernel: 4.18.0-305.3.1.el8.x86_64

  • Ampere Altra Max M128-80 – Fedora 35, kernel: 5.16.9-200.THP_NO_FIE.fc35.aarch64

Created At : April 14th 2022, 4:09:38 pm
Last Updated At : March 20th 2023, 6:50:33 pm
