Benchmarks were performed with Ampere’s internal testing software, which is based on the Ampere Model Library. This software is written entirely in Python and follows the MLCommons Inference (a.k.a. MLPerf) methodology for calculating throughput. It uses the frameworks' APIs in standard, common ways, replicating how they are used in real-world applications.
The benchmarks covered a search space of batch sizes and numbers of threads per process. Final throughput values were estimated from the median (50th percentile) latencies observed during 60-second multi-process runs. For computer vision and NLP workloads, all systems were benchmarked with the following batch sizes for each of n parallel processes: [1, 4, 16, 32, 64, 128, 256]. For recommender engine workloads, the batch sizes were [1024, 2048, 4096, 8192, 16384, 32768]. The number of threads per process relative to the total number of processes was, respectively:
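The throughput estimate described above can be sketched as follows. This is a minimal illustration, not Ampere's internal software. It assumes that each process contributes `batch_size / p50_latency` samples per second and that concurrently running processes simply add up. The function names and the `(threads_per_process, num_processes)` pairs passed to `search_space` are hypothetical. The code uses `typing` generics so it also runs on Python 3.8, the version the benchmarks used.

```python
from itertools import product
from statistics import median
from typing import Iterator, List, Tuple

# Batch sizes listed in the methodology.
CV_NLP_BATCH_SIZES = [1, 4, 16, 32, 64, 128, 256]
RECSYS_BATCH_SIZES = [1024, 2048, 4096, 8192, 16384, 32768]


def estimate_throughput(batch_size: int, latencies_per_process: List[List[float]]) -> float:
    """Estimate total throughput (samples/s) from per-process latencies in seconds.

    Assumption: every process runs concurrently and contributes
    batch_size / median_latency samples per second to the total.
    """
    total = 0.0
    for latencies in latencies_per_process:
        p50 = median(latencies)  # 50th percentile latency of this process
        total += batch_size / p50
    return total


def search_space(
    batch_sizes: List[int], thread_configs: List[Tuple[int, int]]
) -> Iterator[Tuple[int, int, int]]:
    """Yield every (batch_size, threads_per_process, num_processes) combination.

    thread_configs is a hypothetical list of (threads_per_process, num_processes) pairs.
    """
    for bs, (threads, procs) in product(batch_sizes, thread_configs):
        yield bs, threads, procs


if __name__ == "__main__":
    # Two processes at batch size 16 with made-up latencies in seconds:
    # the estimate is 16/0.11 + 16/0.20 samples per second.
    print(estimate_throughput(16, [[0.10, 0.12, 0.11], [0.20, 0.19, 0.21]]))
```

In a real run, each configuration yielded by `search_space` would be measured for 60 seconds, and the best resulting throughput would be reported.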
All platforms were benchmarked with the same scripts, the same datasets, and the same model representations. All platforms ran the same workloads, applied identical pre- and post-processing, and made uniform inference calls. The fp16 Altra values were also obtained with the same scripts. Their model representations differed from the fp32 counterparts only in the precision of the weights, because the quantization process consisted solely of casting to a lower floating-point precision.
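To illustrate what "casting only" means, the stdlib sketch below rounds a list of weights to IEEE 754 half precision using `struct`'s `e` (binary16) format. This is only an illustration. Frameworks perform this conversion on whole tensors, for example with PyTorch's `Tensor.half()`. The helper name `cast_to_fp16` is hypothetical.

```python
import struct
from typing import List


def cast_to_fp16(weights: List[float]) -> List[float]:
    """Round weights to the nearest IEEE 754 half-precision value.

    Only the numeric precision changes; the number and order of the
    weights (the model structure) stay the same. Values beyond the fp16
    range (about +/-65504) make struct.pack raise OverflowError.
    """
    packed = struct.pack(f"<{len(weights)}e", *weights)  # 'e' = binary16
    return list(struct.unpack(f"<{len(weights)}e", packed))


if __name__ == "__main__":
    w = [0.1, -1.5, 3.14159, 1000.0]
    print(cast_to_fp16(w))  # same length; 0.1 and 3.14159 are slightly rounded
```

Values that fp16 represents exactly, such as -1.5 and 1000.0, come back unchanged. All other values lose precision, and nothing else about the model changes.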
On every system tested, TensorFlow and PyTorch were used in the best-known variant available for the given platform:
Intel CPUs – intel-optimized-tensorflow:2.12.0-pip-base and intel-optimized-pytorch:2.0.0-pip
Ampere Altra Max – amperecomputingai/tensorflow:1.6.0 (TensorFlow 2.11) and amperecomputingai/pytorch:1.6.0 (PyTorch 2.0.0)
All benchmarks were run with Python 3.8 in Linux-based environments of the following flavors:
Intel Xeon 8380 “Ice Lake” – Ubuntu 20.04, kernel: 5.11
AMD EPYC 7763 “Milan” – CentOS 8, kernel: 4.18.0-305.3.1.el8.x86_64
Ampere Altra Max M128-80 – Fedora 35, kernel: 5.16.9-200.THP_NO_FIE.fc35.aarch64
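When reproducing these runs, it may help to record the same environment details alongside each result. The sketch below uses only Python's `platform` module. The helper name `describe_environment` is hypothetical. The distribution name (Ubuntu, CentOS, Fedora) is not collected here, because `platform.freedesktop_os_release()` only exists from Python 3.10 onward, and the benchmarks used Python 3.8.

```python
import platform
from typing import Dict


def describe_environment() -> Dict[str, str]:
    """Collect the interpreter and kernel details listed in the methodology."""
    return {
        "python": platform.python_version(),  # e.g. "3.8.x" for these benchmarks
        "system": platform.system(),          # "Linux" on all benchmarked hosts
        "kernel": platform.release(),         # e.g. "4.18.0-305.3.1.el8.x86_64"
        "machine": platform.machine(),        # e.g. "x86_64" or "aarch64"
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```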