Footnotes:

Data Center Efficiency: Data for the efficiency claims and carbon equivalency analysis in the roadmap video (05/18/2023) is based on a composite web service study used in the Ampere Efficiency campaign, which in turn is based on single-node performance comparisons measured and published by Ampere Computing. Performance data and the test configurations used to gather the data for each application are published on our web site. The following table shows the composition of the modeled web service; the per-application performance data is used to project scale-out behavior at both the rack and data center level (a sketch of the blending follows the table). Total data center power consumption is based on this web services study, scaled to a 100,000 ft2 data center, and the total power difference is then used to compute the carbon equivalencies. The primary applications used in this analysis are:

Web Tier | Application | Composition Weight | Comparative Performance Info Reference
Front End | NGINX | 39% | For Altra Max M128-26
Caching Tier | Memcached | 8% | For Altra Max M128-26
Key Value Store | Redis | 39% | For Altra Max M128-30
Back End | MySQL | 14% | For Altra Max M128-30
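
The document does not spell out exactly how the four per-application results are blended, so the following is a minimal sketch, assuming a simple weight-blended average of relative performance; the relative_perf values are hypothetical placeholders, not Ampere data:

    # Illustrative only: blend per-application relative performance into a
    # composite score using the composition weights in the table above.
    weights = {"NGINX": 0.39, "Memcached": 0.08, "Redis": 0.39, "MySQL": 0.14}
    relative_perf = {"NGINX": 1.0, "Memcached": 1.0, "Redis": 1.0, "MySQL": 1.0}  # placeholders
    composite = sum(weights[app] * relative_perf[app] for app in weights)
    print(f"Weighted composite relative performance: {composite:.2f}")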

Rack level evaluation is based on the total performance required to scale out to one rack's power budget for the Ampere® Altra® Max processors under the weighted load of the application composition stated above. The rack is a standard 42U rack with a total power budget of ~14 kW, including a ~10% overhead buffer for networking, management, and PDU. Per-server power is the socket-level power measured during fully loaded operation for each architecture, combined with an equivalent system-level overhead typical of motherboard, peripheral, and memory power draws. All socket power figures were measured by Ampere during live stress testing. The relative power efficiency ratings can be found at the links provided in the table above.
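
A minimal sketch of the servers-per-rack arithmetic described above, assuming the ~14 kW budget and ~10% overhead buffer; the per-server power figure below is a hypothetical placeholder rather than a measured value:

    # Illustrative only: derive servers per rack from a rack power budget.
    RACK_BUDGET_W = 14_000            # ~14 kW total rack budget
    OVERHEAD = 0.10                   # networking, management, PDU buffer
    server_power_w = 500              # placeholder fully loaded system power
    usable_w = RACK_BUDGET_W * (1 - OVERHEAD)
    servers_per_rack = int(usable_w // server_power_w)
    print(servers_per_rack)           # 25 with these placeholder numbers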

Data Center level analysis is calculated from the rack level analysis and scaled linearly to fit a medium-sized data center specification based approximately on publicly available data for the NSA facility in Bluffdale, UT.(1) The data center modeled is 100k ft2, where 65% of the space is reserved for the server room, built on an 8-tile pitch. The total power capacity is roughly 66 MW based on a PUE assumption of 1.2. More information on data center rack pitch densities can be found through a variety of publicly available analyses.(2)
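
A minimal sketch of the data center scaling step, assuming the rack count is bounded by IT power (total capacity divided by PUE) rather than by floor space; all constants come from the paragraph above:

    # Illustrative only: scale the rack-level result to the modeled data center.
    TOTAL_CAPACITY_W = 66_000_000         # ~66 MW total facility capacity
    PUE = 1.2                             # power usage effectiveness assumption
    RACK_POWER_W = 14_000                 # per-rack budget from the rack analysis
    it_power_w = TOTAL_CAPACITY_W / PUE   # power available to IT equipment
    racks = int(it_power_w // RACK_POWER_W)
    print(racks)                          # 3928 racks with these inputs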

Carbon equivalencies were calculated using the EPA equivalency calculator.(3)
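
For orientation, the conversion behind such an equivalency is straightforward, as in the minimal sketch below; the grid emission factor shown is an illustrative placeholder, not the EPA calculator's value, so reference (3) remains the authoritative source:

    # Illustrative only: convert a data center power difference to CO2e per year.
    power_delta_w = 10_000_000                  # placeholder savings (10 MW)
    kwh_per_year = power_delta_w / 1000 * 8760  # watts -> kWh over one year
    KG_CO2_PER_KWH = 0.39                       # placeholder emission factor
    tonnes_co2e = kwh_per_year * KG_CO2_PER_KWH / 1000
    print(f"{tonnes_co2e:,.0f} metric tons CO2e per year")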

(1) https://www.npr.org/sections/alltechconsidered/2013/09/23/225381596/booting-up-new-nsa-data-farm-takes-root-in-utah
(2) https://www.racksolutions.com/news/blog/how-many-servers-does-a-data-center-have/
(3) https://www.epa.gov/energy/greenhouse-gases-equivalencies-calculator-calculations-and-references

VMs/Rack: The number of VMs per rack was calculated based on a 42U, 16.5 kW rack. The load applied is SPECrate 2017 Integer, estimated (SIR), for each system architecture compared; the SIR load is used only to drive each server to its maximum power draw in order to calculate the number of servers possible within the 16.5 kW rack budget. For each architecture, the total number of servers in the rack is calculated based on single-socket 1U servers. The number of VMs in each server is based on the physical core count of each processor, summed to obtain the total number of VMs possible per rack; a VM is assumed to own all available threads present for each core. The raw data is shown in the table below, followed by a sketch of the arithmetic:

Architecture | Cores/Server | System Power/Server | Servers/Rack | Cores (VMs)/Rack
AmpereOne | 192 | 434 W | 38 | 7296
Intel SPR 8480 | 56 | 534 W | 30 | 1680
AMD Genoa 9654 | 96 | 624 W | 26 | 2688
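
A minimal sketch reproducing the AmpereOne row above, assuming servers/rack is the rack budget divided by per-server power, rounded down, and VMs/rack is cores multiplied by servers:

    # Illustrative only: reproduce the VMs/rack calculation for one row.
    RACK_BUDGET_W = 16_500
    cores, server_power_w = 192, 434              # AmpereOne row above
    servers_per_rack = int(RACK_BUDGET_W // server_power_w)
    vms_per_rack = cores * servers_per_rack
    print(servers_per_rack, vms_per_rack)         # 38 servers, 7296 VMs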

Recommendations/Rack: The number of recommendations (queries) per rack was calculated based on a 42U, 14 kW rack. The load applied is PyTorch running the DLRM recommendation model for each system architecture compared; the AI load applied to each server yields the maximum throughput for the DLRM model. Total power drawn by each server was measured at the socket level, combined with a typical power figure for system components, and divided into the rack power budget to obtain the number of servers possible within 14 kW, with 10% overhead applied for networking, management, and PDU. For each architecture the total number of servers is based on single-socket 1U servers. The total performance per rack is simply performance per server multiplied by total servers per rack. The raw data is shown in the table below, followed by a verification sketch:

Architecture | Cores/Server | Performance/Server | System Power/Server | Servers/Rack | Performance/Rack
AmpereOne | 160 | 819,750 queries/s | 534 W | 23 | 18.85M queries/s
AMD Genoa 9654 | 96 | 356,388 queries/s | 512 W | 25 | 8.91M queries/s
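
A minimal sketch of the performance-per-rack roll-up using the AmpereOne row and the 14 kW, 10%-overhead budget described above:

    # Illustrative only: queries/s per rack = per-server throughput x servers.
    RACK_BUDGET_W, OVERHEAD = 14_000, 0.10
    qps_per_server, server_power_w = 819_750, 534   # AmpereOne row above
    servers = int(RACK_BUDGET_W * (1 - OVERHEAD) // server_power_w)
    print(servers, servers * qps_per_server)        # 23 servers, ~18.85M queries/s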


DLRM test configurations:

AMD 9654 (Genoa):
  • HW: AMD 9654 (96c, 192t, 1P, 256GB mem)
  • OS: Ubuntu 22.04
  • Linux kernel: 5.18.11-200.fc36.x86_64
  • AI SW: AMD ZenDNN PyTorch 1.12.1 - release 4.0.0 - python 3.10/pytorch:1.5.2 release docker image
  • Data Format: FP32

AmpereOne:
  • HW: 160c, 1P, 512GB mem
  • OS: Ubuntu 20.04
  • Linux kernel: 6.1.10-amp01.4k (400W system)/5.18.19-200.fc36.aarch64
  • AI SW: Ampere Computing AI
  • Data Format: FP16

Common DLRM workload details:
  • PyTorch implementation based on the official facebookresearch/dlrm repository
    • https://github.com/AmpereComputingAI/dlrm/tree/karol/torchscript
  • Model hyperparameters:
    • arch_sparse_feature_size = 64
    • arch_mlp_bot = "512-512-64"
    • arch_mlp_top = "1024-1024-1024-1"
    • mini_batch_size = 4032
    • num_batches = 1
    • num_indices_per_lookup = 100
  • ~514M parameters
  • Intra-op threads set to 4 for each parallel process (24 processes on Genoa, 40 on AmpereOne)
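
The exact benchmark invocation is not given here; as a rough illustration, the hyperparameters above map onto the command-line flags of the upstream facebookresearch/dlrm script (dlrm_s_pytorch.py), so a run might look like the following sketch (the Ampere fork may use different flags):

    # Hypothetical invocation only: flag names follow the upstream
    # facebookresearch/dlrm benchmark; the Ampere fork may differ.
    import subprocess
    subprocess.run([
        "python", "dlrm_s_pytorch.py",
        "--arch-sparse-feature-size=64",
        "--arch-mlp-bot=512-512-64",
        "--arch-mlp-top=1024-1024-1024-1",
        "--mini-batch-size=4032",
        "--num-batches=1",
        "--num-indices-per-lookup=100",
        "--inference-only",
    ], check=True)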


Stable Diffusion Perf/Rack: The number of frames/s per rack was calculated based on a 42U, 14 kW rack. The load applied is PyTorch running the Stable Diffusion V2 model for each system architecture compared; the AI load applied to each server yields the maximum throughput for the Stable Diffusion V2 model. Total power drawn by each server was measured at the socket level, combined with a typical power figure for system components, and divided into the rack power budget to obtain the number of servers possible within 14 kW, with 10% overhead applied for networking, management, and PDU. For each architecture the total number of servers is based on single-socket 1U servers. The total performance per rack is simply performance per server multiplied by total servers per rack. The raw data is shown in the table below, followed by a short latency conversion:

Architecture | Cores/Server | Performance/Server | System Power/Server | Servers/Rack | Performance/Rack
AmpereOne | 160 | 0.036 frames/s | 534 W | 23 | 0.828 frames/s
AMD Genoa 9654 | 96 | 0.014 frames/s | 624 W | 26 | 0.364 frames/s
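
As a quick sanity check on what the per-server figure means in wall-clock terms (straight arithmetic on the published 0.036 frames/s number):

    # Illustrative only: translate throughput into seconds per 512x512 image.
    frames_per_s = 0.036                  # AmpereOne row above
    print(1 / frames_per_s)               # ~27.8 s per generated image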


Stable Diffusion test configurations:

AMD 9654 (Genoa):
  • HW: AMD 9654 (96c, 192t, 1P, 256GB mem)
  • OS: Ubuntu 22.04
  • Linux kernel: 5.18.11-200.fc36.x86_64
  • AI SW: AMD ZenDNN PyTorch 1.12.1 - release 4.0.0 - python 3.10/pytorch:1.5.2 release docker image
  • Data Format: FP32

AmpereOne:
  • HW: 160c, 1P, 512GB mem
  • OS: Ubuntu 20.04
  • Linux kernel: 6.1.10-amp01.4k (400W system)/5.18.19-200.fc36.aarch64
  • AI SW: Ampere Computing AI
  • Data Format: FP16

Common Stable Diffusion workload details:
  • Graph torch JIT scripted
  • V2.1 base variant used - 512 ema pruned weights
  • fp32 precision, ~1.3 billion parameters
  • txt2img mode
  • 50 sampling steps, batch size of 3, generated image resolution 512x512
  • num_batches = 1
  • Intra-op threads set to 16 for each parallel process (6 processes on Genoa, 10 on AmpereOne)

Disclaimer:

All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere® Computing LLC, and its affiliates (“Ampere®”), is under no obligation to update or otherwise correct this information. Ampere® makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere® is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.

This document is not an offer or a binding commitment by Ampere®. Use of the products and services contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

This document is not to be used, copied, or reproduced in its entirety, or presented to others without the express written permission of Ampere®.

The technical data contained herein may be subject to U.S. and international export, re-export, or transfer laws, including “deemed export” laws. Use of these materials contrary to U.S. and international law is strictly prohibited.

© 2023 Ampere® Computing LLC. All rights reserved. Ampere®, Ampere® Computing, Altra and the Ampere® logo are all trademarks of Ampere® Computing LLC or its affiliates. SPEC and SPECInt are registered trademarks of the Standard Performance Evaluation Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
