
Roadmap 2024 Footnotes

 

Footnotes:

Global Data Center Energy Consumption figures are taken from the International Energy Agency (IEA) “Electricity 2024 – Analysis and forecast to 2026” report and the corresponding Data Center Frontier article “IEA Study Sees AI, Cryptocurrency Doubling Data Center Energy Consumption by 2026”.

  • Full Data Center Frontier article accessible here
  • Full IEA report accessible here

Claims around the percentage of coal, oil and natural gas as sources for the global energy supply are based on the IEA “World Energy Outlook 2023”.


Claims around the percentage make-up of AI compute cycles are based on the “AI Semis Market Landscape” study by D2D Advisory Inc.


Meta Llama 3 Performance per Dollar is based on Ampere® testing of Llama 3 8B Q2 (pp128, batch size = 1). The Ampere performance test was completed on an Oracle A1 Flex instance. VM configuration details: 80 vCPUs, 3.0GHz, 512 GiB DDR4 3200 MHz, Linux kernel 5.15.0-1051-oracle. Instance cost for the OCI A1 instance is $1.568/hr. Ampere performance: 85.6 tokens per second (using Ampere-optimized llama.cpp release 1.2.0). NVIDIA® A10 performance testing was completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Instance cost for the OCI NVIDIA A10 instance (VM.GPU.A10.1) is $2.00/hr. NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM).
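The performance-per-dollar comparison can be reproduced from the figures in this footnote. The following is a minimal sketch, assuming the metric is tokens per second divided by the hourly instance cost and that both quoted costs are hourly; the footnote does not state the formula, and the function name `perf_per_dollar` is illustrative only.

```python
def perf_per_dollar(tokens_per_second: float, cost_per_hour: float) -> float:
    """Tokens per second delivered per dollar of hourly instance cost (assumed metric)."""
    return tokens_per_second / cost_per_hour


# Throughput and cost figures quoted in the footnote above.
ampere_a1 = perf_per_dollar(85.6, 1.568)   # OCI A1 Flex instance
nvidia_a10 = perf_per_dollar(78.5, 2.00)   # OCI VM.GPU.A10.1 instance
ratio = ampere_a1 / nvidia_a10

print(f"Ampere A1:  {ampere_a1:.2f} tokens/s per $/hr")
print(f"NVIDIA A10: {nvidia_a10:.2f} tokens/s per $/hr")
print(f"Ratio:      {ratio:.2f}x")
```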


Llama 3 Performance per Watt and Power are based on Ampere Computing testing of Llama 3 8B Q2 (pp128, batch size = 1). The Ampere performance test was completed on a bare metal Ampere® Altra® Max powered server: 1 x M128-30, 512 GiB DDR4 3200 MHz, Linux kernel 6.4.13-200.fc38.aarch64. Ampere performance: 78.9 tokens per second (Ampere-optimized llama.cpp release 1.2.0). Processor power draw under load for the Ampere Altra Max M128-30 was measured at 140W. Calculated performance per watt = 0.564 tokens per second per watt. NVIDIA A10 (150W TDP) performance testing was completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Processor power calculations assume an Intel® Xeon® Scalable 6430 (270W TDP) as the CPU host. Performance per watt for the NVIDIA A10 is calculated by combining the TDP of 1 x GPU and 1 x CPU (420W total). NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM). Calculated performance per watt = 0.187 tokens per second per watt.
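The two performance-per-watt values above follow directly from the quoted throughput and power figures. A short sketch of that arithmetic is shown here; the function and variable names are illustrative and not part of the original test setup.

```python
def perf_per_watt(tokens_per_second: float, watts: float) -> float:
    """Throughput divided by processor power."""
    return tokens_per_second / watts


# Ampere Altra Max M128-30: measured processor power under load.
ampere_ppw = perf_per_watt(78.9, 140)

# NVIDIA A10: GPU TDP plus the assumed host CPU TDP, as described in the footnote.
a10_total_w = 150 + 270
nvidia_ppw = perf_per_watt(78.5, a10_total_w)

print(f"Ampere Altra Max: {ampere_ppw:.3f} tokens/s per W")
print(f"NVIDIA A10 + host: {nvidia_ppw:.3f} tokens/s per W")
```

Note that the comparison mixes a measured power draw (Ampere) with TDP values (NVIDIA A10 and host CPU), exactly as the footnote describes.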

NVIDIA DGX estimated price of $411,228 is based on the average of prices published by 4 NVIDIA partners as of 3/11/2024. Max utilization power of 10.2kW for DGX H100 is from the NVIDIA spec sheet located here. Performance is estimated based on internal testing conducted by Ampere Computing in Q4 of 2023 using OpenAI PyTorch whisper-medium Machine Learning speech to text inferences/second. Max utilization power of 341W for Ampere powered systems is based on max utilization power for a single system with (1) Ampere Altra Max M128-30 configured with 8x16GiB DIMMs, redundant power, and 2 HD obtained from OEM(s) power. Performance claims comparing Ampere to NVIDIA DGX are based on single system performance scaled out to the rack level and then the data center level. Rack size 40U, rack power budget of 15kW. Data center size 60 racks, data center power budget 1MW, data center PUE 1.5. Given rack and data center constraints, up to 2,400 Ampere Altra servers can be deployed in the data center. Given rack and data center constraints, up to 60 NVIDIA DGX H100 systems can be deployed in the data center. Single system Whisper performance: 375,518 offline tps per Ampere Altra Max M128-30 single socket node and 3,164,020 per DGX H100 with 8x NVIDIA H100 80GB, assuming optimum linear performance scaling from 1 to 8 NVIDIA H100 GPUs.
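The scale-out from a single system to the data center can be sketched as follows. This is an illustration under stated assumptions, not the exact method used: it assumes a 1U chassis for the Ampere server and an 8U chassis for the DGX H100 (neither height is given in the footnote), and it applies only the rack-level space and power limits, since the footnote does not say how the 1MW budget and PUE of 1.5 enter the calculation. Under those assumptions the rack limits alone reproduce the quoted counts of 2,400 and 60 systems.

```python
RACK_UNITS = 40          # rack size from the footnote
RACK_POWER_W = 15_000    # rack power budget from the footnote
RACKS = 60               # data center size from the footnote


def systems_per_rack(height_u: int, power_w: float) -> int:
    """Systems that fit in one rack before space or power runs out."""
    by_space = RACK_UNITS // height_u
    by_power = int(RACK_POWER_W // power_w)
    return min(by_space, by_power)


# name: (chassis height in U [assumed], max power in W, Whisper throughput per system)
SYSTEMS = {
    "Ampere Altra Max M128-30": (1, 341, 375_518),
    "NVIDIA DGX H100": (8, 10_200, 3_164_020),
}

totals = {}
for name, (height_u, power_w, throughput) in SYSTEMS.items():
    count = systems_per_rack(height_u, power_w) * RACKS
    totals[name] = (count, count * throughput)
    print(f"{name}: {count} systems, {count * throughput:,} aggregate throughput")
```

The Ampere rack is limited by space (40 servers) and the DGX rack by power (1 system), so the result is sensitive to the assumed chassis heights only on the Ampere side.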


Performance per Rack: Rack is based on 42U rack with 12.5kW power budget. 2U and 1.0kW allocated as buffer for networking, management and PDU. Total performance per rack calculated by multiplying the performance per server with the maximum number of servers that fit in a rack (until space or power constraints are reached).


Hardware Configurations:

Hardware | OS | Kernel
1 x Ampere Altra Max M128-30 (128c/128t, 3.0GHz), 8 x 64GiB DDR4 3200 MHz | CentOS 8.0.1905 | 6.3.13-200.fc38.aarch64
1 x AmpereOne A192-32X (192c/192t, 3.2GHz) or AmpereOne A192-26X (192c/192t, 2.6GHz), 8 x 64 GiB DDR5 5200 MHz | Fedora 38 | 6.4.13-200.fc38.aarch64
1 x AMD EPYC 9654 (96c/192t, 2.4/3.55GHz), 12 x 64 GiB DDR5 4800 MHz | Fedora 38 | 6.4.13-200.fc38.x86_64
1 x AMD EPYC 9754 (128c/256t, 2.25/3.1GHz), 12 x 64 GiB DDR5 4800 MHz | Fedora 38 | 6.4.13-200.fc38.x86_64

Server Usage Power: All CPU power draw figures are based on Ampere-performed lab tests under load (for each referenced application). In order to calculate server usage power, platform power draw is added on top of CPU power draw. Platform power assumptions informed by three leading OEM server power calculator tools.



Platform Power Assumptions:

Component | Description | Total Power Draw
Storage | 4 x NVMe (10W ea) | 40W
Networking | 1 x 1GbE OCP NIC, 1 x 10/25GbE NIC, 1 x 100GbE NIC | 40W
Other | Motherboard, Fans, Misc | 96W
Memory | 8 ch DDR4 | 56W
Memory | 8 ch DDR5 | 80W
Memory | 12 ch DDR5 | 120W
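Server usage power, as described above, is the CPU power draw plus the platform power for the relevant memory configuration. A minimal sketch of that sum is given below; the dictionary and function names are illustrative, and the mapping of each platform to a memory row (8 ch DDR5 for AmpereOne, 12 ch DDR5 for AMD EPYC) is an assumption inferred from the DIMM counts in the hardware configurations.

```python
# Platform power assumptions from the table above, in watts.
FIXED_PLATFORM_W = {"storage": 40, "networking": 40, "other": 96}
MEMORY_W = {"8ch DDR4": 56, "8ch DDR5": 80, "12ch DDR5": 120}


def server_usage_power(cpu_w: float, memory_config: str) -> float:
    """CPU power under load plus platform power for the given memory configuration."""
    return cpu_w + sum(FIXED_PLATFORM_W.values()) + MEMORY_W[memory_config]


# Example using the SPECrate CPU usage power figures listed later in this document.
ampereone_w = server_usage_power(274, "8ch DDR5")   # AmpereOne A192-32X
genoa_w = server_usage_power(379, "12ch DDR5")      # AMD Genoa 9654

print(f"AmpereOne A192-32X server: {ampereone_w} W")
print(f"AMD Genoa 9654 server:     {genoa_w} W")
```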

SPECint 2017 performance: All SPECrate®2017_int_base performance estimates for AMD and Ampere platforms are based on community GCC (version 10.2 or 13.2, as listed per processor). See the table below for details. Rack level estimates are based on 1U server height.

Processors Under Test | SPECrate®2017_int_base score (estimated) | CPU Usage Power (W) | Performance / Watt (calculated) | Compiler
Ampere Altra Max M128-30 | 359 | 178 | 2.02 | Community GCC 10.2
AmpereOne A192-26X | 616 | 212 | 2.91 | Community GCC 13.2
AmpereOne A192-32X | 694 | 274 | 2.53 | Community GCC 13.2
AMD Genoa 9654 | 638 | 379 | 1.68 | Community GCC 13.2
AMD Bergamo 9754 | 733 | 333 | 2.20 | Community GCC 13.2

SPECint 2017 Rack Level Calculations: SPECrate®2017_int_base

Hardware | # Servers | Power Draw (W) | Rack Performance
AmpereOne A192-32X | 21 | 11,133W | 14,574
AMD Genoa 9654 | 17 | 11,478W | 10,846
AMD Bergamo 9754 | 18 | 11,325W | 13,194
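The rack-level rows above can be approximated from the per-server figures. The sketch below assumes 1U servers, a usable rack of 40U and 11.5kW (42U and 12.5kW less the 2U and 1.0kW buffer), and server power equal to CPU usage power plus platform power with 8 ch DDR5 for AmpereOne and 12 ch DDR5 for AMD; the memory mapping and all names in the code are assumptions. It reproduces the published server counts and rack performance, while the computed power draw comes out about 3W below each published value, presumably because the published CPU power figures are rounded.

```python
USABLE_UNITS = 42 - 2             # rack height less networking/management buffer
USABLE_POWER_W = 12_500 - 1_000   # rack power budget less buffer
BASE_PLATFORM_W = 40 + 40 + 96    # storage + networking + other


def servers_per_rack(server_w: float, height_u: int = 1) -> int:
    """Servers that fit until space or power is exhausted."""
    return min(USABLE_UNITS // height_u, int(USABLE_POWER_W // server_w))


# name: (estimated SPECrate score, CPU usage power in W, memory power in W)
PROCESSORS = {
    "AmpereOne A192-32X": (694, 274, 80),
    "AMD Genoa 9654": (638, 379, 120),
    "AMD Bergamo 9754": (733, 333, 120),
}

rack = {}
for name, (score, cpu_w, mem_w) in PROCESSORS.items():
    server_w = cpu_w + BASE_PLATFORM_W + mem_w
    count = servers_per_rack(server_w)
    rack[name] = (count, count * server_w, count * score)
    print(f"{name}: {count} servers, {count * server_w:,} W, rack score {count * score:,}")
```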

Containerized Web Service: Rack claims based on equal weight per application: 1 full rack of AmpereOne servers. AmpereOne performance per rack calculated by multiplying the performance per single server with the maximum number of servers that fit in 1 full rack (until space or power constraints are reached). 1U and 2U servers used for A192-26 platforms, whereas 2U servers are used for A192-32 and AMD Genoa and Bergamo in this calculation.

Processor Under Test | NGINX Performance | Redis Performance | Memcached Performance | MySQL Performance | Compiler
AmpereOne A192-26X | 139,989 | n/a | n/a | n/a | Community GCC 13.2
AmpereOne A192-32X | 173,929 | 174,560,342 | 94,338,373 | 405,568 | Community GCC 13.2
AMD Genoa 9654 | 136,298 | 140,356,506 | 89,174,901 | 320,436 | Community GCC 13.2
AMD Bergamo 9754 | 128,504 | 164,139,942 | 99,545,761 | 315,934 | Community GCC 13.2

AmpereOne application rack performance is used as the baseline to calculate the number of AMD Genoa and AMD Bergamo systems, and the power, required to match AmpereOne rack performance. Total power draw (all applications combined) is used to calculate the total required rack count for AMD Genoa and AMD Bergamo.


CPU Usage Power (W) Web Services Applications | NGINX | Redis | Memcached | MySQL
AmpereOne A192-26X | 285 | n/a | n/a | n/a
AmpereOne A192-32X | 385 | 312 | 266 | 265
AMD Genoa 9654 | 410 | 410 | 410 | 401
AMD Bergamo 9754 | 369 | 386 | 376 | 322
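The baseline-matching step described above can be illustrated for a single application. The sketch below covers NGINX only and is one possible reading of the method, not the published composite: it assumes 2U servers, a usable rack of 40U and 11.5kW, platform power of 176W plus 80W (8 ch DDR5) for AmpereOne or 120W (12 ch DDR5) for AMD, and rounding up to whole servers. It does not reproduce the equal-weight combination across the four applications, and all names in the code are illustrative.

```python
import math

USABLE_UNITS = 42 - 2
USABLE_POWER_W = 12_500 - 1_000
SERVER_HEIGHT_U = 2               # 2U servers for A192-32X and AMD, per the text above
BASE_PLATFORM_W = 40 + 40 + 96    # storage + networking + other


def server_power(cpu_w: float, memory_w: float) -> float:
    """CPU usage power plus platform power."""
    return cpu_w + BASE_PLATFORM_W + memory_w


def servers_per_rack(server_w: float) -> int:
    """Servers that fit until space or power is exhausted."""
    return min(USABLE_UNITS // SERVER_HEIGHT_U, int(USABLE_POWER_W // server_w))


# Baseline: one full rack of AmpereOne A192-32X running NGINX.
ampere_w = server_power(385, 80)
ampere_servers = servers_per_rack(ampere_w)
baseline_perf = ampere_servers * 173_929

# name: (NGINX performance per server, server power in W)
AMD = {
    "AMD Genoa 9654": (136_298, server_power(410, 120)),
    "AMD Bergamo 9754": (128_504, server_power(369, 120)),
}

needed = {}
for name, (perf, watts) in AMD.items():
    count = math.ceil(baseline_perf / perf)   # whole servers needed to match the baseline
    needed[name] = (count, count * watts)
    print(f"{name}: {count} servers, {count * watts:,} W to match {baseline_perf:,}")
```

Under these assumptions the AmpereOne rack is power-limited rather than space-limited, so the baseline depends directly on the per-application CPU power figure.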

Web Service Composite

All server-level performance and power draw claims are based on Ampere Computing LLC internal lab testing. 

Disclaimer

All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere® Computing LLC, and its affiliates (“Ampere®”), is under no obligation to update or otherwise correct this information. Ampere® makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere® is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.

This document is not an offer or a binding commitment by Ampere®. Use of the products and services contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

This document is not to be used, copied, or reproduced in its entirety, or presented to others without the express written permission of Ampere®.

The technical data contained herein may be subject to U.S. and international export, re-export, or transfer laws, including “deemed export” laws. Use of these materials contrary to U.S. and international law is strictly prohibited.

© 2024 Ampere® Computing LLC. All rights reserved. Ampere®, Ampere® Computing, Altra®, AmpereOne® and the Ampere® logo are all trademarks of Ampere® Computing LLC or its affiliates. SPEC and SPECInt are registered trademarks of the Standard Performance Evaluation Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Created At : May 8th 2024, 7:19:10 pm
Last Updated At : July 19th 2024, 9:43:58 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054
