Recommender Engine AI Inference on AmpereOne

Maximize Performance and Reduce Energy Consumption in AI Deployments

Overview

This workload brief explores the performance of AmpereOne® Cloud-Native Processors on AI inference tasks, specifically the deep learning recommendation model benchmark DLRM_torchbench. We show how AmpereOne compares to AMD EPYC 9754 (Bergamo) and 9654 (Genoa).

DLRM_torchbench is a benchmarking suite for evaluating the performance of deep learning recommendation models (DLRM) in PyTorch. It focuses on critical metrics like throughput, making it essential for testing hardware performance in AI-driven recommendation systems, which power services like streaming, e-commerce, and social media platforms.
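To make the throughput metric concrete, the following is a minimal sketch of how samples per second might be measured around any inference callable. It is not the DLRM_torchbench harness: `measure_throughput`, `stand_in_model`, the warm-up count, and the batch shape are all illustrative assumptions, and a real run would pass a PyTorch model's forward call instead of the stand-in.

```python
import time
from typing import Callable, List

Batch = List[List[float]]


def measure_throughput(run_batch: Callable[[Batch], object],
                       batch: Batch,
                       warmup: int = 3,
                       iters: int = 10) -> float:
    """Return throughput in samples per second for a batch-inference callable."""
    # Warm-up runs are excluded from timing so one-time setup costs
    # do not distort the measurement.
    for _ in range(warmup):
        run_batch(batch)

    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch)
    elapsed = time.perf_counter() - start

    if elapsed <= 0:
        raise RuntimeError("timed region was too short to measure")
    return (iters * len(batch)) / elapsed


def stand_in_model(batch: Batch) -> List[float]:
    """Hypothetical placeholder for a recommendation model's forward pass."""
    return [sum(row) for row in batch]


# Synthetic batch: 256 samples with 16 features each (arbitrary sizes).
batch = [[float(i + j) for j in range(16)] for i in range(256)]
samples_per_sec = measure_throughput(stand_in_model, batch)
print(f"throughput: {samples_per_sec:.0f} samples/s")
```

Timing only the steady-state iterations is a common design choice for inference benchmarks, since warm-up effects such as cache population would otherwise lower the reported figure.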

Results and Key Findings

Figures 1 and 2 illustrate, respectively, the socket-level performance and efficiency of the AmpereOne A192-26X processor compared to AMD EPYC 9754 (Bergamo) and 9654 (Genoa) when running DLRM_torchbench. The charts report two metrics, performance and performance per watt, each normalized to the AMD EPYC 9654 as the baseline of 1.00.

Fig.1: DLRM_torchbench Socket-level Performance

Fig.1: Performance: AmpereOne A192-26X outperforms both AMD processors, scoring 1.15 against the AMD EPYC 9654 (Genoa) baseline of 1.00.

Fig.2: Socket-level Efficiency

Fig.2: Performance/Watt: AmpereOne is more energy efficient than both AMD processors, achieving a performance per watt score of 1.73 against the AMD EPYC 9654 (Genoa) baseline of 1.00.
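The relationship between the two normalized scores can be worked through in a few lines. The sketch below assumes that performance per watt is simply performance divided by average socket power, which this brief does not state explicitly, so the implied power ratio is an inference from the published scores and not an Ampere measurement. The function names and the raw throughput values passed to `normalize` are hypothetical placeholders.

```python
def normalize(value: float, baseline: float) -> float:
    """Express a raw measurement relative to a baseline measurement."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


def implied_relative_power(rel_perf: float, rel_perf_per_watt: float) -> float:
    """Relative power implied by two baseline-normalized scores.

    Assumes perf_per_watt = performance / power, which gives
    rel_power = rel_perf / rel_perf_per_watt.
    """
    if rel_perf_per_watt <= 0:
        raise ValueError("relative performance per watt must be positive")
    return rel_perf / rel_perf_per_watt


# Hypothetical raw throughputs, chosen only to show how a 1.15 score arises.
example_score = normalize(230.0, 200.0)

# Published relative scores from Figures 1 and 2 (AMD EPYC 9654 = 1.00).
ampereone_rel_perf = 1.15
ampereone_rel_perf_per_watt = 1.73

ampereone_rel_power = implied_relative_power(
    ampereone_rel_perf, ampereone_rel_perf_per_watt
)
print(f"implied relative power: {ampereone_rel_power:.2f}")  # prints 0.66
```

Under that assumption, the two scores together imply that the AmpereOne socket drew roughly two thirds of the baseline's power while delivering 15% more throughput.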

Conclusion

This comparison underscores AmpereOne’s capability to deliver better AI inference performance with higher energy efficiency, making it a compelling choice for raw performance, cost savings, and ESG goals.

About Ampere and AmpereOne

Ampere Computing focuses on delivering high-performance, power-efficient processors for Cloud-Native applications. The AmpereOne processor, with its innovative Arm architecture and up to 192 cores, is designed to meet the demands of modern AI workloads. The integration of Ampere Optimized AI Frameworks (AIO) and Ampere Model Library (AML) further enhances AmpereOne’s AI inference capabilities and eases the transition from legacy x86 architectures.

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2024 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com

Created At : September 17th 2024, 4:49:16 pm

Last Updated At : February 11th 2025, 11:48:54 pm

