Footnotes:
Data Center Efficiency: Data for the Efficiency claims and Carbon equivalency analysis in the roadmap video (05/18/2023) is based on a composite Web Service study used in the Ampere Efficiency campaign and based on single node performance comparisons measured and published by Ampere Computing. Performance data and the test configurations used to gather the data for each application is published on our web site. The following table shows the composition of a modeled web service based on performance data to determine scale-out behavior through projections and calculations at both Rack and Data center level. Total data center power consumption is based this Web Services study and scaled to 100,000 ft2 data center. Total power difference is then used to complete the Carbon equivalencies. The primary applications used in this analysis are:
Rack-level evaluation is based on the total performance required to scale out to one rack of power budget for Ampere® Altra® Max processors under the weighted load of the stated application composition above. The rack is a standard 42U rack with a total power budget of ~14 kW, including a ~10% overhead buffer for networking, management, and PDU. Per-server power is socket-level power measured during fully loaded operation for each architecture, combined with an equivalent system-level overhead typical of motherboard, peripheral, and memory power draws. All socket power figures were measured by Ampere during live stress testing. The relative power efficiency ratings can be found at the links provided in the table above.
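The rack-level packing math can be expressed compactly. The sketch below is a minimal illustration (not Ampere's actual tooling): the ~14 kW budget and ~10% overhead figures come from this section, and the 534 W example per-server power is one of the measured figures quoted later in these footnotes.

```python
# Minimal sketch of the rack-level sizing described above.
# Assumes a 42U rack, ~14 kW total budget, and ~10% of the budget
# reserved for networking, management, and PDU overhead.

RACK_BUDGET_W = 14_000
OVERHEAD_FRACTION = 0.10

def servers_per_rack(server_power_w: float) -> int:
    """Number of 1U single-socket servers that fit in the usable rack power."""
    usable_w = RACK_BUDGET_W * (1 - OVERHEAD_FRACTION)
    return int(usable_w // server_power_w)

# Example: a server drawing 534 W (socket power plus typical
# system-level overhead, as measured above).
print(servers_per_rack(534))  # -> 23
```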
Data center level analysis is calculated from the rack-level analysis and scaled linearly to fit a medium-sized data center specification based approximately on publicly available data for the NSA facility in Bluffdale, UT.(1) The modeled data center is 100k ft2, with 65% of the space reserved for the server room built on an 8-tile pitch. The total power capacity is roughly 66 MW based on a PUE assumption of 1.2. More information on data center rack pitch densities can be found through a variety of publicly available analyses.(2)
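As a rough illustration of the scaling step (a back-of-the-envelope sketch, not the exact model): with a 1.2 PUE, the ~66 MW facility supplies about 66 / 1.2 = 55 MW of IT power, and dividing by the ~14 kW rack budget gives the approximate rack count used for linear scaling.

```python
# Back-of-the-envelope version of the data-center scaling described above.
# All figures come from this section; the calculation is only a sketch of
# the stated "scale linearly from rack level" approach.

TOTAL_FACILITY_POWER_MW = 66.0   # stated total power capacity
PUE = 1.2                        # stated PUE assumption
RACK_BUDGET_KW = 14.0            # rack power budget from the rack-level analysis

it_power_mw = TOTAL_FACILITY_POWER_MW / PUE        # power available to IT gear
racks = int(it_power_mw * 1000 // RACK_BUDGET_KW)  # racks the facility can power

print(f"IT power: {it_power_mw:.1f} MW, ~{racks} racks")  # ~55.0 MW, ~3928 racks
```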
Carbon equivalencies were calculated using the EPA equivalency calculator.(3)
(1) https://www.npr.org/sections/alltechconsidered/2013/09/23/225381596/booting-up-new-nsa-data-farm-takes-root-in-utah
(2) https://www.racksolutions.com/news/blog/how-many-servers-does-a-data-center-have/
(3) https://www.epa.gov/energy/greenhouse-gases-equivalencies-calculator-calculations-and-references
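For readers who want to reproduce the final step, the sketch below shows the shape of the conversion from a sustained power difference to an annual CO2 equivalency. The emission factor shown is a placeholder for illustration only; the authoritative, current factors are those in the EPA calculator linked above (3).

```python
# Sketch of converting a data-center power delta into a CO2 equivalency.
# NOTE: the emission factor below is a placeholder for illustration only;
# use the current factors from the EPA calculator (reference 3) instead.

HOURS_PER_YEAR = 8760
TONS_CO2_PER_KWH = 7.09e-4   # placeholder value; check the EPA references

def annual_co2_tons(power_delta_kw: float) -> float:
    """Metric tons of CO2 per year for a sustained power difference."""
    return power_delta_kw * HOURS_PER_YEAR * TONS_CO2_PER_KWH

print(annual_co2_tons(1000.0))  # e.g. a 1 MW saving -> ~6211 tons/year
```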
VMs/Rack: The number of VMs per rack was calculated based on a 42U, 16.5 kW rack. The load applied is SPECrate 2017 Integer Estimated (SIR) for each system architecture compared. The SIR load is used only to drive each server to its maximum power draw in order to calculate the number of servers possible within the 16.5 kW rack budget. For each architecture, the total number of servers in the rack is calculated based on single-socket 1U servers. The total number of VMs per server is based on the physical core count for each processor, summed to obtain the total number of VMs possible per rack. A VM is assumed to own all available threads present for each core. The raw data is shown in the table below:
| Architecture | Cores/Server | System Power/Server | Servers/Rack | Cores (VMs)/Rack |
|---|---|---|---|---|
| AmpereOne | 192 | 434 W | 38 | 7296 |
| Intel SPR 8480 | 56 | 534 W | 30 | 1680 |
| AMD Genoa 9654 | 96 | 624 W | 26 | 2688 |
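The per-row arithmetic can be reproduced with a short script. This is a sketch under the stated assumptions (16.5 kW budget, 1U single-socket servers, one VM per physical core); the published figures may incorporate additional modeling detail, so treat it as illustrative.

```python
# Sketch of the VMs/rack calculation described above: servers are packed
# into a 16.5 kW rack budget, and each physical core hosts one VM.

RACK_BUDGET_W = 16_500

def vms_per_rack(cores_per_server: int, server_power_w: float) -> tuple[int, int]:
    """(servers per rack, VMs per rack) under the 16.5 kW budget."""
    servers = int(RACK_BUDGET_W // server_power_w)  # 1U single-socket servers
    return servers, servers * cores_per_server      # one VM per physical core

print(vms_per_rack(192, 434))  # AmpereOne row -> (38, 7296)
print(vms_per_rack(56, 534))   # Intel SPR 8480 row -> (30, 1680)
```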
Recommendations/Rack: The number of recommendations (queries) per rack was calculated based on a 42U, 14 kW rack. The load applied is PyTorch running the DLRM recommendation model for each system architecture compared. The AI load applied to each server yields a maximum throughput for the DLRM model. Total power for each server was measured at the socket level and combined with a typical power draw for system components; the rack power budget was then divided by this per-server power to obtain the number of servers possible within the 14 kW budget, with 10% overhead applied for networking, management, and PDU. For each architecture, the total number of servers is based on single-socket 1U servers. The total performance per rack is a simple sum: performance per server * total servers/rack. The raw data is shown in the table below:
| Architecture | Cores/Server | Performance/Server | System Power/Server | Servers/Rack | Performance/Rack |
|---|---|---|---|---|---|
| AmpereOne | 160 | 819,750 queries/s | 534 W | 23 | 18.85 M queries/s |
| AMD Genoa 9654 | 96 | 356,388 queries/s | 512 W | 25 | 8.91 M queries/s |
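The rack totals follow directly from the per-server figures. The sketch below simply multiplies the table values (performance per server times servers per rack):

```python
# Sketch of the rack-level aggregation: performance/rack is simply
# performance/server multiplied by the number of servers in the rack.

rows = {
    # name: (queries/s per server, servers per rack from the table above)
    "AmpereOne":      (819_750, 23),
    "AMD Genoa 9654": (356_388, 25),
}

for name, (qps, servers) in rows.items():
    print(f"{name}: {servers * qps / 1e6:.2f} M queries/s per rack")
# AmpereOne: 18.85 M queries/s per rack
# AMD Genoa 9654: 8.91 M queries/s per rack
```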
AMD 9654 (Genoa) configuration:
- HW: AMD 9654 (96c, 192t, 1P, 256GB mem)
- OS: Ubuntu 22.04
- Linux kernel: 5.18.11-200.fc36.x86_64
- AI SW: AMD ZenDNN PyTorch 1.12.1 - release 4.0.0 - python 3.10/pytorch:1.5.2 release docker image
- Data Format: FP32

AmpereOne configuration:
- HW: 160c, 1P, 512GB mem
- OS: Ubuntu 20.04
- Linux kernel: 6.1.10-amp01.4k (400W system)/5.18.19-200.fc36.aarch64
- AI SW: Ampere Computing AI
- Data Format: FP16

DLRM model details:
- PyTorch implementation based on official facebookresearch/dlrm
- https://github.com/AmpereComputingAI/dlrm/tree/karol/torchscript
- Model hyperparameters:
  - arch_sparse_feature_size = 64
  - arch_mlp_bot = "512-512-64"
  - arch_mlp_top = "1024-1024-1024-1"
  - mini_batch_size = 4032
  - num_batches = 1
  - num_indicies_per_lookup = 100
- ~514M parameters
- Intra threads set to 4 for each parallel process (24 processes on Genoa, 40 on Siryn)
Stable Diffusion Perf/Rack: The number of frames/s per rack was calculated based on a 42U, 14 kW rack. The load applied is PyTorch running the Stable Diffusion V2 model for each system architecture compared. The AI load applied to each server yields a maximum throughput for the Stable Diffusion V2 model. Total power for each server was measured at the socket level and combined with a typical power draw for system components; the rack power budget was then divided by this per-server power to obtain the number of servers possible within the 14 kW budget, with 10% overhead applied for networking, management, and PDU. For each architecture, the total number of servers is based on single-socket 1U servers. The total performance per rack is a simple sum: performance per server * total servers/rack. The raw data is shown in the table below:
| Architecture | Cores/Server | Performance/Server | System Power/Server | Servers/Rack | Performance/Rack |
|---|---|---|---|---|---|
| AmpereOne | 160 | 0.036 frames/s | 534 W | 23 | 0.828 frames/s |
| AMD Genoa 9654 | 96 | 0.014 frames/s | 624 W | 26 | 0.364 frames/s |
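The same aggregation applies here (illustrative, using the table values directly):

```python
# Same rack aggregation applied to the Stable Diffusion results above.
rows = {"AmpereOne": (0.036, 23), "AMD Genoa 9654": (0.014, 26)}
for name, (fps, servers) in rows.items():
    print(f"{name}: {fps * servers:.3f} frames/s per rack")
# AmpereOne: 0.828 frames/s per rack
# AMD Genoa 9654: 0.364 frames/s per rack
```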
AMD 9654 (Genoa) configuration:
- HW: AMD 9654 (96c, 192t, 1P, 256GB mem)
- OS: Ubuntu 22.04
- Linux kernel: 5.18.11-200.fc36.x86_64
- AI SW: AMD ZenDNN PyTorch 1.12.1 - release 4.0.0 - python 3.10/pytorch:1.5.2 release docker image
- Data Format: FP32

AmpereOne configuration:
- HW: 160c, 1P, 512GB mem
- OS: Ubuntu 20.04
- Linux kernel: 6.1.10-amp01.4k (400W system)/5.18.19-200.fc36.aarch64
- AI SW: Ampere Computing AI
- Data Format: FP16

Stable Diffusion model details:
- Graph torch JIT scripted
- V2.1 base variant used - 512 ema pruned weights
- fp32 precision, ~1.3 billion parameters
- txt2img mode
- 50 sampling steps, batch size of 3, generated image resolution 512x512
- num_batches = 1
- Intra threads set to 16 for each parallel process (6 processes on Genoa, 10 on AmpereOne)