Roadmap 2024 Footnotes
Global Data Center Energy Consumption figures in accordance with the International Energy Agency (IEA) “Electricity 2024 – Analysis and forecast to 2026” report and the corresponding Data Center Frontier article “IEA Study Sees AI, Cryptocurrency Doubling Data Center Energy Consumption by 2026”.
Claims around the percentage of coal, oil and natural gas as sources for the global energy supply are based on the IEA “World Energy Outlook 2023”.
Claims around the percentage make-up of AI compute cycles based on the “AI Semis Market Landscape” study by D2D Advisory Inc.
Meta Llama 3 Performance per Dollar based on Ampere® testing of Llama 3 8B Q2 (pp128, batch size = 1). Ampere performance test completed on an Oracle A1 Flex instance. VM configuration details: 80 vCPUs, 3.0GHz, 512 GiB DDR4 3200 MHz, Linux kernel 5.15.0-1051-oracle. Instance cost for the OCI A1 instance is $1.568/hr. Ampere performance: 85.6 tokens per second (using Ampere-optimized llama.cpp release 1.2.0). NVIDIA® A10 performance testing completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Instance cost for the OCI NVIDIA A10 instance (VM.GPU.A10.1) is $2.00/hr. NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM).
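The performance-per-dollar comparison follows from the throughput and instance-cost figures quoted above. The sketch below is illustrative only: the function name is hypothetical, and it assumes both instance costs are hourly rates in US dollars.

```python
# Figures quoted in the footnote above.
AMPERE_TOKENS_PER_SEC = 85.6   # OCI A1 Flex, 80 vCPUs
AMPERE_COST_PER_HOUR = 1.568   # USD, assumed to be an hourly rate
A10_TOKENS_PER_SEC = 78.5      # OCI VM.GPU.A10.1
A10_COST_PER_HOUR = 2.00       # USD per hour


def tokens_per_dollar(tokens_per_sec, cost_per_hour):
    """Tokens generated per dollar of instance time (3600 seconds per hour)."""
    return tokens_per_sec * 3600 / cost_per_hour


ampere_tpd = tokens_per_dollar(AMPERE_TOKENS_PER_SEC, AMPERE_COST_PER_HOUR)
a10_tpd = tokens_per_dollar(A10_TOKENS_PER_SEC, A10_COST_PER_HOUR)
advantage = ampere_tpd / a10_tpd

print(f"Ampere A1: {ampere_tpd:,.0f} tokens per dollar")
print(f"NVIDIA A10: {a10_tpd:,.0f} tokens per dollar")
print(f"Ratio (Ampere / A10): {advantage:.2f}x")
```

Under these assumptions the ratio depends only on the two throughput figures and the two prices, so it can be recomputed directly if instance pricing changes.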
Llama 3 Performance per Watt and Power based on Ampere Computing testing of Llama 3 8B Q2 (pp128, batch size = 1). Ampere performance test completed on a bare metal Ampere® Altra® Max powered server: 1 x M128-30, 512 GiB DDR4 3200 MHz, Linux kernel 6.4.13-200.fc38.aarch64. Ampere performance: 78.9 tokens per second (Ampere-optimized llama.cpp release 1.2.0). Processor power draw under load for Ampere Altra Max M128-30 measured at 140W. Calculated performance per watt = 0.564. NVIDIA A10 (150W TDP) performance testing completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Processor power calculations assume Intel® Xeon® Scalable 6430 (270W TDP) as CPU host. Performance per watt calculations for NVIDIA A10 made by combining 1 x GPU and 1 x CPU TDP (420W total). NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM). Calculated performance per watt = 0.187.
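The two performance-per-watt values above can be reproduced from the quoted throughput and power figures. This is a minimal sketch with a hypothetical function name; the inputs are taken directly from the footnote.

```python
def perf_per_watt(tokens_per_sec, watts):
    """Throughput divided by power, in tokens per second per watt."""
    return tokens_per_sec / watts


# Ampere Altra Max M128-30: measured processor power under load.
ampere_ppw = perf_per_watt(78.9, 140)

# NVIDIA A10: GPU TDP (150W) plus assumed host CPU TDP (270W).
a10_ppw = perf_per_watt(78.5, 150 + 270)

print(f"Ampere Altra Max: {ampere_ppw:.3f} tokens/s per watt")
print(f"NVIDIA A10 + host CPU: {a10_ppw:.3f} tokens/s per watt")
```

Note that the Ampere figure uses measured power while the NVIDIA figure uses TDP, as the footnote states, so the comparison inherits that difference in method.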
NVIDIA DGX estimated price of $411,228 based on average pricing obtained from 4 NVIDIA partner published prices 3/11/2024. Max utilization power of 10.2kW for DGX H100 from the NVIDIA spec sheet. Performance estimated based on internal testing conducted by Ampere Computing in Q4 of 2023 using OpenAI PyTorch whisper-medium Machine Learning speech to text inferences/second. Max utilization power of 341W for Ampere powered systems based on max utilization power for single system (1) Ampere Altra Max M128-30 configured with 8x16GiB DIMMs, redundant power, and 2 HD obtained from OEM(s) power. Performance claims comparing Ampere to NVIDIA DGX are based on single system performance scaled out to the rack level and then to the data center level. Rack size 40U, rack power budget of 15kW. Data center size 60 racks, data center power budget 1MW, data center PUE 1.5. Given rack and data center constraints, up to 2,400 Ampere Altra servers can be deployed in the data center. Given rack and data center constraints, up to 60 NVIDIA DGX H100 systems can be deployed in the data center. Single system Whisper performance: 375,518 offline tps per Ampere Altra Max M128-30 single socket node and 3,164,020 per DGX H100 with 8x NVIDIA H100 80GB assuming optimum linear performance scaling from 1 to 8 NVIDIA H100 GPUs.
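One way to reproduce the deployment counts above is to take the smaller of the space limit and the power limit at each level. The sketch below is not Ampere's published method and rests on several assumptions that the footnote does not state: a 1U chassis for the Altra server, an 8U chassis for the DGX H100, 341 read as watts, and the 1MW budget applied to IT load (the PUE of 1.5 is not applied to the server count here). Function names are hypothetical.

```python
def systems_per_rack(height_u, power_w, rack_u=40, rack_power_w=15_000):
    """Systems that fit in one rack, limited by space or by power."""
    return min(rack_u // height_u, rack_power_w // power_w)


def systems_per_data_center(per_rack, power_w, racks=60, dc_power_w=1_000_000):
    """Systems that fit in the data center, limited by racks or by power."""
    return min(per_rack * racks, dc_power_w // power_w)


# Assumed chassis heights: 1U Altra server, 8U DGX H100.
altra_per_rack = systems_per_rack(height_u=1, power_w=341)
dgx_per_rack = systems_per_rack(height_u=8, power_w=10_200)

altra_total = systems_per_data_center(altra_per_rack, 341)
dgx_total = systems_per_data_center(dgx_per_rack, 10_200)

# Single-system Whisper throughput from the footnote (offline tps).
altra_throughput = altra_total * 375_518
dgx_throughput = dgx_total * 3_164_020

print(altra_total, dgx_total)
print(f"Throughput ratio (Ampere / DGX): {altra_throughput / dgx_throughput:.2f}")
```

Under these assumptions the Altra rack is limited by space and the DGX rack by power, and the counts match the 2,400 and 60 systems quoted above.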
Performance per Rack: Rack is based on a 42U rack with a 12.5kW power budget. 2U and 1.0kW are allocated as buffer for networking, management and PDU. Total performance per rack is calculated by multiplying the performance per server by the maximum number of servers that fit in a rack (until space or power constraints are reached).
Hardware | OS | Kernel |
---|---|---|
1 x Ampere Altra Max M128-30 (128c/128t, 3.0GHz), 8 x 64GiB DDR4 3200 MHz | CentOS 8.0.1905 | 6.3.13-200.fc38.aarch64 |
1 x AmpereOne A192-32X (192c/192t, 3.2GHz) or AmpereOne A192-26X (192c/192t, 2.6GHz), 8 x 64 GiB DDR5 5200 MHz | Fedora 38 | 6.4.13-200.fc38.aarch64 |
1 x AMD EPYC 9654 (96c/192t, 2.4/3.55GHz), 12 x 64 GiB DDR5 4800 MHz | Fedora 38 | 6.4.13-200.fc38.x86_64 |
1 x AMD EPYC 9754 (128c/256t, 2.25/3.1GHz), 12 x 64 GiB DDR5 4800 MHz | Fedora 38 | 6.4.13-200.fc38.x86_64 |
Server Usage Power: All CPU power draw figures are based on Ampere-performed lab tests under load (for each referenced application). In order to calculate server usage power, platform power draw is added on top of CPU power draw. Platform power assumptions informed by three leading OEM server power calculator tools.
Component | Description | Total Power Draw |
---|---|---|
Storage | 4 x NVMe (10W ea) | 40W |
Networking | 1 x 1GbE OCP NIC, 1 x 10/25GbE NIC, 1 x 100GbE NIC | 40W |
Other | Motherboard, Fans, Misc | 96W |
Memory | 8 ch DDR4 | 56W |
Memory | 8 ch DDR5 | 80W |
Memory | 12 ch DDR5 | 120W |
SPECint 2017 performance: All SPECrate®2017_int_base performance estimates for AMD and Ampere platforms are based on the GCC compiler (versions 10 and 13). See the table below for details. Rack level estimates are based on 1U server height.
Processors Under Test | SPECrate®2017_int_base score (estimated) | CPU Usage Power (W) | Performance / Watt (calculated) | Compiler |
---|---|---|---|---|
Ampere Altra Max M128-30 | 359 | 178 | 2.02 | Community GCC 10.2 |
AmpereOne A192-26X | 616 | 212 | 2.91 | Community GCC 13.2 |
AmpereOne A192-32X | 694 | 274 | 2.53 | Community GCC 13.2 |
AMD Genoa 9654 | 638 | 379 | 1.68 | Community GCC 13.2 |
AMD Bergamo 9754 | 733 | 333 | 2.20 | Community GCC 13.2 |
Hardware | # Servers | Rack Power Draw (W) | Rack Performance (estimated SPECrate®2017_int_base) |
---|---|---|---|
AmpereOne A192-32X | 21 | 11,133 | 14,574 |
AMD Genoa 9654 | 17 | 11,478 | 10,846 |
AMD Bergamo 9754 | 18 | 11,325 | 13,194 |
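The rack-level figures can be approximated from the tables above: server usage power is CPU usage power plus platform power, and the server count is capped by the space and power left after the buffer. This is a sketch with hypothetical names, not Ampere's own tooling. It assumes 8-channel DDR5 for AmpereOne and 12-channel DDR5 for the AMD platforms, inferred from the DIMM counts in the hardware table.

```python
# Platform power assumptions from the Server Usage Power table (watts).
BASE_PLATFORM_W = 40 + 40 + 96  # storage + networking + other
MEMORY_W = {"8ch DDR4": 56, "8ch DDR5": 80, "12ch DDR5": 120}


def server_power(cpu_w, memory):
    """Server usage power: CPU usage power plus platform power."""
    return cpu_w + BASE_PLATFORM_W + MEMORY_W[memory]


def rack_estimate(score, cpu_w, memory, server_u=1,
                  rack_u=42, rack_w=12_500, buffer_u=2, buffer_w=1_000):
    """Return (servers, rack power in W, rack performance) for one rack."""
    per_server_w = server_power(cpu_w, memory)
    by_space = (rack_u - buffer_u) // server_u
    by_power = (rack_w - buffer_w) // per_server_w
    servers = min(by_space, by_power)
    return servers, servers * per_server_w, servers * score


# (estimated SPECrate2017_int_base, CPU usage power, memory configuration)
ampereone = rack_estimate(694, 274, "8ch DDR5")
genoa = rack_estimate(638, 379, "12ch DDR5")
bergamo = rack_estimate(733, 333, "12ch DDR5")

for name, result in [("AmpereOne A192-32X", ampereone),
                     ("AMD Genoa 9654", genoa),
                     ("AMD Bergamo 9754", bergamo)]:
    print(name, result)
```

With these inputs the server counts and rack performance match the table. The computed rack power lands about 3W below the tabulated values, possibly because the published CPU power figures are rounded.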
Containerized Web Service: Rack claims are based on equal weight per application (1 full rack of AmpereOne servers). AmpereOne performance per rack is calculated by multiplying the performance of a single server by the maximum number of servers that fit in 1 full rack (until space or power constraints are reached). 1U and 2U servers are used for A192-26X platforms, whereas 2U servers are used for A192-32X, AMD Genoa and AMD Bergamo in this calculation.
Processor Under Test | NGINX Performance | Redis Performance | Memcached Performance | MySQL Performance | Compiler |
---|---|---|---|---|---|
AmpereOne A192-26X | 139,989 | n/a | n/a | n/a | Community GCC 13.2 |
AmpereOne A192-32X | 173,929 | 174,560,342 | 94,338,373 | 405,568 | Community GCC 13.2 |
AMD Genoa 9654 | 136,298 | 140,356,506 | 89,174,901 | 320,436 | Community GCC 13.2 |
AMD Bergamo 9754 | 128,504 | 164,139,942 | 99,545,761 | 315,934 | Community GCC 13.2 |
AmpereOne application rack performance is used as the baseline to calculate the number of AMD Genoa and AMD Bergamo systems, and the power, required to match AmpereOne rack performance. Total power draw (all applications combined) is used to calculate the total required rack count for AMD Genoa and AMD Bergamo.
CPU Usage Power (W) Web Services Applications | NGINX | Redis | Memcached | MySQL |
---|---|---|---|---|
AmpereOne A192-26X | 285 | n/a | n/a | n/a |
AmpereOne A192-32X | 385 | 312 | 266 | 265 |
AMD Genoa 9654 | 410 | 410 | 410 | 401 |
AMD Bergamo 9754 | 369 | 386 | 376 | 322 |
All server-level performance and power draw claims are based on Ampere Computing LLC internal lab testing.
All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere® Computing LLC, and its affiliates (“Ampere®”), is under no obligation to update or otherwise correct this information. Ampere® makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere® is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.
This document is not an offer or a binding commitment by Ampere®. Use of the products and services contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
This document is not to be used, copied, or reproduced in its entirety, or presented to others without the express written permission of Ampere®.
The technical data contained herein may be subject to U.S. and international export, re-export, or transfer laws, including “deemed export” laws. Use of these materials contrary to U.S. and international law is strictly prohibited.
© 2024 Ampere® Computing LLC. All rights reserved. Ampere®, Ampere® Computing, Altra®, AmpereOne® and the Ampere® logo are all trademarks of Ampere® Computing LLC or its affiliates. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.