Roadmap 2024 Footnotes

Footnotes:

Global data center energy consumption figures are taken from the International Energy Agency (IEA) report “Electricity 2024 – Analysis and forecast to 2026” and the corresponding Data Center Frontier article “IEA Study Sees AI, Cryptocurrency Doubling Data Center Energy Consumption by 2026”.

Full Data Center Frontier article accessible here
Full IEA report accessible here

Claims about the percentages of coal, oil and natural gas in the global energy supply are based on the IEA “World Energy Outlook 2023”.

Claims around the percentage make-up of AI compute cycles based on the “AI Semis Market Landscape” study by D2D Advisory Inc.

Meta Llama 3 Performance per Dollar based on Ampere® testing of Llama 3 8B Q2 (pp128, batch size = 1). Ampere performance test completed on an Oracle A1 Flex instance. VM configuration details: 80 vCPUs, 3.0GHz, 512 GiB DDR4 3200 MHz, Linux kernel 5.15.0-1051-oracle. Instance cost for the OCI A1 instance is $1.568/hr. Ampere performance: 85.6 tokens per second (using Ampere-optimized llama.cpp release 1.2.0). NVIDIA® A10 performance testing completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Instance cost for the OCI NVIDIA A10 instance (VM.GPU.A10.1) is $2.00/hr. NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM).
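The performance-per-dollar comparison above can be reproduced with a short calculation. This is a minimal sketch, assuming both instance costs are hourly rates and that performance per dollar means tokens per second divided by hourly cost; the helper name is illustrative and not part of Ampere's methodology.

```python
# Figures quoted in the footnote above; the helper name is hypothetical.
def tokens_per_second_per_dollar_hour(tokens_per_second: float,
                                      cost_per_hour: float) -> float:
    """Throughput obtained per dollar of hourly instance cost."""
    return tokens_per_second / cost_per_hour


# Assumption: both instance costs are expressed in USD per hour.
ampere_a1 = tokens_per_second_per_dollar_hour(85.6, 1.568)  # OCI A1 Flex
nvidia_a10 = tokens_per_second_per_dollar_hour(78.5, 2.00)  # OCI VM.GPU.A10.1

print(f"Ampere A1 : {ampere_a1:.2f} tokens/s per $/hr")
print(f"NVIDIA A10: {nvidia_a10:.2f} tokens/s per $/hr")
print(f"Ratio     : {ampere_a1 / nvidia_a10:.2f}x")
```

Note that the two results come from different software stacks (llama.cpp versus TensorRT-LLM), so the ratio compares complete platforms rather than the processors alone.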

Llama 3 Performance per Watt and Power based on Ampere Computing testing of Llama 3 8B Q2 (pp128, batch size = 1). Ampere performance test completed on a bare metal Ampere® Altra® Max powered server: 1 x M128-30, 512 GiB DDR4 3200 MHz, Linux kernel 6.4.13-200.fc38.aarch64. Ampere performance: 78.9 tokens per second (Ampere-optimized llama.cpp release 1.2.0). Processor power draw under load for Ampere Altra Max M128-30 measured at 140W. Calculated performance per watt = 0.564. NVIDIA A10 (150W TDP) performance testing completed on Oracle Cloud (1 instance of VM.GPU.A10.1, Linux kernel 5.15.0-1045-oracle, CUDA=12.2, driver=535.146.02). Processor power calculations assume Intel® Xeon® Scalable 6430 (270W TDP) as CPU host. Performance per watt calculations for NVIDIA A10 made by combining 1 x GPU and 1 x CPU TDP (420W total). NVIDIA A10 performance: 78.5 tokens per second (using TensorRT-LLM). Calculated performance per watt = 0.187.
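The two performance-per-watt values quoted above follow directly from the stated throughput and power figures. The sketch below only restates that arithmetic; the variable names are illustrative. The Ampere figure uses measured processor power, while the NVIDIA figure uses the sum of GPU and host CPU TDPs.

```python
# Tokens per second divided by watts, using the figures in the footnote above.
AMPERE_TOKENS_PER_S = 78.9
AMPERE_CPU_POWER_W = 140      # measured processor power under load

A10_TOKENS_PER_S = 78.5
A10_GPU_TDP_W = 150           # NVIDIA A10 TDP
HOST_CPU_TDP_W = 270          # assumed Intel Xeon Scalable 6430 host

a10_total_power_w = A10_GPU_TDP_W + HOST_CPU_TDP_W

ampere_perf_per_watt = AMPERE_TOKENS_PER_S / AMPERE_CPU_POWER_W
a10_perf_per_watt = A10_TOKENS_PER_S / a10_total_power_w

print(f"Ampere Altra Max: {ampere_perf_per_watt:.3f} tokens/s per W")
print(f"NVIDIA A10 + host: {a10_perf_per_watt:.3f} tokens/s per W")
```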

NVIDIA DGX estimated price of $411,228 based on average pricing obtained from 4 NVIDIA partner published prices 3/11/2024. Max utilization power of 10.2kW for DGX H100 from NVIDIA spec sheet located here. Performance estimated based on internal testing conducted by Ampere Computing in Q4 of 2023 using OpenAI PyTorch whisper-medium Machine Learning speech to text inferences/second. Max utilization power of 341W for Ampere powered systems based on max utilization power for a single system: (1) Ampere Altra Max M128-30 configured with 8x16GiB DIMMs, redundant power, and 2 HD obtained from OEM(s) power. Performance claims comparing Ampere to NVIDIA DGX are based on single system performance scaled out to the rack level and then the data center level. Rack size 40U, rack power budget of 15kW. Data center size 60 racks, data center power budget 1MW, data center PUE 1.5. Given rack and data center constraints, up to 2,400 Ampere Altra servers can be deployed in the data center. Given rack and data center constraints, up to 60 NVIDIA DGX H100 systems can be deployed in the data center. Single system Whisper performance: 375,518 offline tps per Ampere Altra Max M128-30 single socket node and 3,164,020 per DGX H100 with 8x NVIDIA H100 80GB, assuming optimum linear performance scaling from 1 to 8 NVIDIA H100 GPUs.
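The scale-out step described above amounts to multiplying single-system throughput by the number of deployable systems. The sketch below takes the deployable counts (2,400 and 60) as stated in the footnote rather than re-deriving them from the rack and data center budgets; the variable names are illustrative.

```python
# Data-center-level Whisper throughput, scaled linearly from one system.
AMPERE_SERVERS = 2_400        # deployable count as stated in the footnote
DGX_SYSTEMS = 60              # deployable count as stated in the footnote
RACKS = 60

AMPERE_TPS_PER_SERVER = 375_518    # Altra Max M128-30, single socket
DGX_TPS_PER_SYSTEM = 3_164_020     # DGX H100, 8 GPUs, linear scaling assumed

ampere_total_tps = AMPERE_SERVERS * AMPERE_TPS_PER_SERVER
dgx_total_tps = DGX_SYSTEMS * DGX_TPS_PER_SYSTEM

print(f"Ampere servers per rack: {AMPERE_SERVERS // RACKS}")
print(f"Ampere data center total: {ampere_total_tps:,} tps")
print(f"DGX data center total   : {dgx_total_tps:,} tps")
print(f"Ratio                   : {ampere_total_tps / dgx_total_tps:.2f}x")
```

Linear scaling is an assumption inherited from the footnote; real deployments may lose throughput to networking or scheduling overhead.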

Performance per Rack: Rack is based on 42U rack with 12.5kW power budget. 2U and 1.0kW allocated as buffer for networking, management and PDU. Total performance per rack calculated by multiplying the performance per server with the maximum number of servers that fit in a rack (until space or power constraints are reached).

Hardware Configurations:

Hardware	OS	Kernel
1 x Ampere Altra Max M128-30 (128c/128t, 3.0GHz), 8 x 64 GiB DDR4 3200 MHz	CentOS 8.0.1905	6.3.13-200.fc38.aarch64
1 x AmpereOne A192-32X (192c/192t, 3.2GHz) or AmpereOne A192-26X (192c/192t, 2.6GHz), 8 x 64 GiB DDR5 5200 MHz	Fedora 38	6.4.13-200.fc38.aarch64
1 x AMD EPYC 9654 (96c/192t, 2.4/3.55GHz), 12 x 64 GiB DDR5 4800 MHz	Fedora 38	6.4.13-200.fc38.x86_64
1 x AMD EPYC 9754 (128c/256t, 2.25/3.1GHz), 12 x 64 GiB DDR5 4800 MHz	Fedora 38	6.4.13-200.fc38.x86_64

Server Usage Power: All CPU power draw figures are based on Ampere-performed lab tests under load (for each referenced application). In order to calculate server usage power, platform power draw is added on top of CPU power draw. Platform power assumptions informed by three leading OEM server power calculator tools.

Platform Power Assumptions:

Component	Description	Total Power Draw
Storage	4 x NVMe (10W ea)	40W
Networking	1 x 1GbE OCP NIC, 1 x 10/25GbE NIC, 1 x 100GbE NIC	40W
Other	Motherboard, Fans, Misc	96W
Memory	8 ch DDR4	56W
	8 ch DDR5	80W
	12 ch DDR5	120W
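Server usage power, as described above, is the CPU power under load plus the platform components in this table. A minimal sketch of that sum follows; the dictionary and function names are illustrative, and the 274 W input is the CPU usage power reported for AmpereOne A192-32X in the SPECint table.

```python
# Platform power assumptions from the table above, in watts.
FIXED_PLATFORM_W = {"storage": 40, "networking": 40, "other": 96}
MEMORY_W = {"8ch DDR4": 56, "8ch DDR5": 80, "12ch DDR5": 120}


def server_usage_power(cpu_power_w: int, memory_config: str) -> int:
    """CPU power under load plus fixed platform power plus memory power."""
    return cpu_power_w + sum(FIXED_PLATFORM_W.values()) + MEMORY_W[memory_config]


# Example: AmpereOne A192-32X (8 channels of DDR5) at 274 W CPU power.
ampereone_server_w = server_usage_power(274, "8ch DDR5")
print(f"AmpereOne A192-32X server usage power: {ampereone_server_w} W")
```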

SPECint 2017 performance: All SPECrate®2017_int_base performance estimates for AMD and Ampere platforms are based on the GCC compiler (version 10 or 13). See the table below for details. Rack level estimates are based on 1U server height.

Processors Under Test	SPECrate®2017_int_base score (estimated)	CPU Usage Power (W)	Performance / Watt (calculated)	Compiler
Ampere Altra Max M128-30	359	178	2.02	Community GCC 10.2
AmpereOne A192-26X	616	212	2.91	Community GCC 13.2
AmpereOne A192-32X	694	274	2.53	Community GCC 13.2
AMD Genoa 9654	638	379	1.68	Community GCC 13.2
AMD Bergamo 9754	733	333	2.20	Community GCC 13.2

SPECint 2017 Rack Level Calculations: SPECrate®2017_int_base

Hardware	# Servers	Power Draw (W)	Rack Performance
AmpereOne A192-32X	21	11,133W	14,574
AMD Genoa 9654	17	11,478W	10,846
AMD Bergamo 9754	18	11,325W	13,194
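The rack-level figures in this table can be approximated from the Performance per Rack rule, the platform power assumptions, and the per-processor SPECint results. The sketch below is one reading of that method, not Ampere's own tooling: it assumes 40U and 11.5kW remain after the buffer, 1U servers, 8-channel DDR5 for AmpereOne and 12-channel DDR5 for the AMD systems.

```python
# Usable rack budget after the 2U / 1.0kW buffer.
USABLE_U = 42 - 2
USABLE_W = 12_500 - 1_000
FIXED_PLATFORM_W = 40 + 40 + 96   # storage + networking + other

# (estimated SPECrate score, CPU usage power in W, memory power in W)
SYSTEMS = {
    "AmpereOne A192-32X": (694, 274, 80),
    "AMD Genoa 9654": (638, 379, 120),
    "AMD Bergamo 9754": (733, 333, 120),
}


def rack_estimate(score: int, cpu_w: int, mem_w: int, height_u: int = 1):
    """Return (servers, rack power in W, rack performance)."""
    server_w = cpu_w + FIXED_PLATFORM_W + mem_w
    # Fill the rack until either space or power runs out.
    servers = min(USABLE_U // height_u, USABLE_W // server_w)
    return servers, servers * server_w, servers * score


rack = {name: rack_estimate(*values) for name, values in SYSTEMS.items()}
for name, (servers, power_w, perf) in rack.items():
    print(f"{name}: {servers} servers, {power_w:,} W, {perf:,}")
```

Under these assumptions the server counts and rack performance match the table. The power totals come out about 3W lower than the table in each row, which may reflect rounding of the published CPU power values.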

Containerized Web Service: Rack claims based on equal weight per application: 1 full rack of AmpereOne servers. AmpereOne performance per rack calculated by multiplying the performance per single server with the maximum number of servers that fit in 1 full rack (until space or power constraints are reached). 1U and 2U servers used for A192-26 platforms, whereas 2U servers are used for A192-32 and AMD Genoa and Bergamo in this calculation.

Processor Under Test	NGINX Performance	Redis Performance	Memcached Performance	MySQL Performance	Compiler
AmpereOne A192-26X	139,989	n/a	n/a	n/a	Community GCC 13.2
AmpereOne A192-32X	173,929	174,560,342	94,338,373	405,568	Community GCC 13.2
AMD Genoa 9654	136,298	140,356,506	89,174,901	320,436	Community GCC 13.2
AMD Bergamo 9754	128,504	164,139,942	99,545,761	315,934	Community GCC 13.2

AmpereOne application rack performance is used as the baseline to calculate the number of AMD Genoa and AMD Bergamo systems, and the power, required to match AmpereOne rack performance. Total power draw (all applications combined) is used to calculate the total required rack count for AMD Genoa and AMD Bergamo.

CPU Usage Power (W) Web Services Applications	NGINX	Redis	Memcached	MySQL
AmpereOne A192-26X	285	n/a	n/a	n/a
AmpereOne A192-32X	385	312	266	265
AMD Genoa 9654	410	410	410	401
AMD Bergamo 9754	369	386	376	322
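The matching step described above can be sketched for a single application. The example below uses NGINX only and is an illustration under stated assumptions, not a reproduction of the published composite: 2U servers, a 40U and 11.5kW usable rack, platform power from the Platform Power Assumptions table, and rounding up to whole servers. The resulting counts are not taken from the source.

```python
import math

USABLE_U, USABLE_W = 40, 11_500      # rack budget after the buffer
SERVER_HEIGHT_U = 2                  # 2U servers, per the note above
FIXED_PLATFORM_W = 40 + 40 + 96      # storage + networking + other


def server_power(cpu_w: int, mem_w: int) -> int:
    """Server usage power: CPU plus platform plus memory."""
    return cpu_w + FIXED_PLATFORM_W + mem_w


# Baseline: one full rack of AmpereOne A192-32X running NGINX.
baseline_server_w = server_power(385, 80)          # 8 ch DDR5
baseline_servers = min(USABLE_U // SERVER_HEIGHT_U,
                       USABLE_W // baseline_server_w)
baseline_rack_perf = baseline_servers * 173_929

# AMD systems needed to match that rack (assumption: round up).
amd = {"AMD Genoa 9654": (136_298, 410), "AMD Bergamo 9754": (128_504, 369)}
matched = {}
for name, (nginx_perf, cpu_w) in amd.items():
    servers = math.ceil(baseline_rack_perf / nginx_perf)
    matched[name] = (servers, servers * server_power(cpu_w, 120))  # 12 ch DDR5

print(f"Baseline: {baseline_servers} servers, {baseline_rack_perf:,}")
for name, (servers, power_w) in matched.items():
    print(f"{name}: {servers} servers, {power_w:,} W to match")
```

Repeating this for Redis, Memcached and MySQL and summing the power draw would give the combined figure from which a rack count could be derived, though the exact weighting used for the published composite is not specified here.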

Web Service Composite

All server-level performance and power draw claims are based on Ampere Computing LLC internal lab testing.

Disclaimer

All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere® Computing LLC, and its affiliates (“Ampere®”), is under no obligation to update or otherwise correct this information. Ampere® makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere® is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.

This document is not an offer or a binding commitment by Ampere®. Use of the products and services contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

This document is not to be used, copied, or reproduced in its entirety, or presented to others without the express written permission of Ampere®.

The technical data contained herein may be subject to U.S. and international export, re-export, or transfer laws, including “deemed export” laws. Use of these materials contrary to U.S. and international law is strictly prohibited.

© 2024 Ampere® Computing LLC. All rights reserved. Ampere®, Ampere® Computing, Altra®, AmpereOne® and the Ampere® logo are all trademarks of Ampere® Computing LLC or its affiliates. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Created At : May 8th 2024, 7:19:10 pm

Last Updated At : July 19th 2024, 9:43:58 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054


This site runs on Ampere Processors.